Transformers >= 4.23.1. This model relies on a custom modeling file; you need to add trust_remote_code=True. See #13467
LSG ArXiv paper. The GitHub/conversion script is available at this link.
This model is adapted from the XLM-RoBERTa-base model without additional pretraining yet. It uses the same number of parameters/layers and the same tokenizer.
This model can handle long sequences and is faster and more efficient than Longformer or BigBird (from Transformers); it relies on Local + Sparse + Global attention (LSG).
The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad to a multiple of the block size (pad_to_multiple_of=...).
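As a minimal, illustrative sketch of this recommendation (the block size of 128 used for padding is an assumption; check the model's config.json for the actual value):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-xlm-roberta-base-4096")

text = "Some very long input text. " * 500
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,         # truncate to the model max length
    padding=True,            # required for pad_to_multiple_of to take effect
    pad_to_multiple_of=128,  # assumed block size; adjust to the value in config.json
)
```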
Encoder-decoder models are supported, but I did not test them extensively. Implemented in PyTorch.
The model relies on a custom modeling file; you need to add trust_remote_code=True to use it.
```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code is required to load the custom LSG modeling file
model = AutoModel.from_pretrained("ccdv/lsg-xlm-roberta-base-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-xlm-roberta-base-4096")
```
You can change various parameters (see the example below). Default parameters work well in practice. If you are short on memory, reduce the block sizes, increase the sparsity factor, and remove the dropout in the attention score matrix.
```python
from transformers import AutoModel

model = AutoModel.from_pretrained("ccdv/lsg-xlm-roberta-base-4096",
    trust_remote_code=True,
    num_global_tokens=16,
    block_size=64,
    sparse_block_size=64,
    attention_probs_dropout_prob=0.0,
    sparsity_factor=4,
    sparsity_type="none",
    mask_first_token=True
)
```
There are 5 different sparse selection patterns; the best type is task dependent. Note that for sequences with length < 2*block_size, the sparsity type has no effect.
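For illustration, a pattern can be selected when loading the model via sparsity_type; the "pooling" value below is only an example of one such pattern, and the full list is documented in the conversion script repository:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("ccdv/lsg-xlm-roberta-base-4096",
    trust_remote_code=True,
    sparsity_type="pooling",  # example pattern; the best choice is task dependent
    sparsity_factor=4,
)
```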
Fill mask example:
```python
from transformers import FillMaskPipeline, AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("ccdv/lsg-xlm-roberta-base-4096", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-xlm-roberta-base-4096")

SENTENCES = ["Paris is the <mask> of France."]
pipeline = FillMaskPipeline(model, tokenizer)
output = pipeline(SENTENCES, top_k=1)

output = [o[0]["sequence"] for o in output]
# ['Paris is the capital of France.']
```
Classification example:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("ccdv/lsg-xlm-roberta-base-4096",
    trust_remote_code=True,
    pool_with_global=True,  # pool with a global token instead of the first token
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-xlm-roberta-base-4096")

SENTENCE = "This is a test for sequence classification. " * 300
token_ids = tokenizer(
    SENTENCE,
    return_tensors="pt",
    # pad_to_multiple_of=...,  # optional
    truncation=True
)
output = model(**token_ids)

# SequenceClassifierOutput(loss=None, logits=tensor([[-0.3051, -0.1762]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
```
To train global tokens and the classification head only:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("ccdv/lsg-xlm-roberta-base-4096",
    trust_remote_code=True,
    pool_with_global=True,  # pool with a global token instead of the first token
    num_global_tokens=16
)
tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-xlm-roberta-base-4096")

# keep the global token embeddings and the classification head trainable, freeze everything else
for name, param in model.named_parameters():
    if "global_embeddings" in name or "classifier" in name:
        param.requires_grad = True
    else:
        param.requires_grad = False
```
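A possible follow-up (a sketch, not part of the original snippet): pass only the parameters left trainable to the optimizer; the learning rate is an arbitrary example value.

```python
import torch

# collect the parameters kept trainable above (global embeddings + classification head)
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)  # lr chosen arbitrarily for illustration
```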
XLM-RoBERTa
```
@article{DBLP:journals/corr/abs-2105-00572,
  author     = {Naman Goyal and
                Jingfei Du and
                Myle Ott and
                Giri Anantharaman and
                Alexis Conneau},
  title      = {Larger-Scale Transformers for Multilingual Masked Language Modeling},
  journal    = {CoRR},
  volume     = {abs/2105.00572},
  year       = {2021},
  url        = {https://arxiv.org/abs/2105.00572},
  eprinttype = {arXiv},
  eprint     = {2105.00572},
  timestamp  = {Wed, 12 May 2021 15:54:31 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2105-00572.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```