Transformers >= 4.23.1. This model relies on a custom modeling file; you need to add trust_remote_code=True to use it. See #13467.
LSG ArXiv paper. The Github repository and conversion script are available at this link.
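This model is adapted from BART-base for encoder-decoder tasks, without additional pretraining. It uses the same number of parameters/layers and the same tokenizer.
This model can handle long sequences, and does so faster and more efficiently than Longformer (LED) or BigBird (Pegasus). It relies on Local + Sparse + Global attention (LSG).
The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in the config). It is nevertheless recommended to let the tokenizer truncate the inputs (truncation=True) and, optionally, pad them to a multiple of the block size (pad_to_multiple_of=...).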
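As a rough illustration of the length constraint above, the sketch below rounds a sequence length up to the next multiple of the block size and pads a list of token ids accordingly. It is a minimal sketch, not the model's own padding code: the helper names `padded_length` and `pad_to_block_multiple` are hypothetical, and the default `pad_token_id=1` is an assumption (in practice, read it from `tokenizer.pad_token_id`).

```python
from typing import List, Tuple


def padded_length(seq_len: int, block_size: int) -> int:
    """Return the smallest multiple of block_size that is >= seq_len."""
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    # Ceiling division without floats, then scale back up.
    return -(-seq_len // block_size) * block_size


def pad_to_block_multiple(
    input_ids: List[int],
    block_size: int,
    pad_token_id: int = 1,  # assumption: take the real value from the tokenizer
) -> Tuple[List[int], List[int]]:
    """Pad token ids on the right and build the matching attention mask."""
    n_pad = padded_length(len(input_ids), block_size) - len(input_ids)
    padded_ids = input_ids + [pad_token_id] * n_pad
    attention_mask = [1] * len(input_ids) + [0] * n_pad
    return padded_ids, attention_mask


if __name__ == "__main__":
    ids, mask = pad_to_block_multiple(list(range(100)), block_size=64)
    print(len(ids), sum(mask))
```

With the tokenizer, the same effect is normally obtained by passing pad_to_multiple_of together with a padding option, so a helper like this is only useful for reasoning about the lengths involved.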
Implemented in PyTorch.
The model relies on a custom modeling file; you need to add trust_remote_code=True to use it.

    from transformers import AutoModel, AutoTokenizer

    # trust_remote_code=True is required to load the custom LSG modeling file
    model = AutoModel.from_pretrained("ccdv/lsg-bart-base-16384", trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-16384")
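You can change various parameters.
The default parameters work well in practice. If you are short on memory, reduce the block sizes, increase the sparsity factor, and remove dropout in the attention score matrix.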

    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        "ccdv/lsg-bart-base-16384",
        trust_remote_code=True,
        num_global_tokens=16,
        block_size=64,
        sparse_block_size=64,
        attention_probs_dropout_prob=0.0,  # no dropout on the attention scores
        sparsity_factor=4,
        sparsity_type="none",
        mask_first_token=True,
    )
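There are 5 different sparse selection patterns. The best type depends on the task. Note that for sequences with length < 2*block_size, the type has no effect.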
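The length condition in the note above can be written as a one-line check. This is only a sketch of the stated rule; the function name `sparsity_type_has_effect` is hypothetical and is not part of the model's code.

```python
def sparsity_type_has_effect(seq_len: int, block_size: int) -> bool:
    """Return False when seq_len < 2 * block_size, where the type has no effect."""
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    return seq_len >= 2 * block_size


if __name__ == "__main__":
    # With block_size=64, sequences shorter than 128 tokens are unaffected.
    print(sparsity_type_has_effect(100, 64))
    print(sparsity_type_has_effect(4096, 64))
```

A check like this can help decide whether comparing sparsity types is worthwhile for a given dataset: if most inputs fall under the threshold, the choice will make no difference for them.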
Seq2Seq example for summarization:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model = AutoModelForSeq2SeqLM.from_pretrained(
        "ccdv/lsg-bart-base-16384",
        trust_remote_code=True,
        pass_global_tokens_to_decoder=True,  # Pass encoder global tokens to decoder
    )
    tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-16384")

    SENTENCE = "This is a test sequence to test the model. " * 300
    token_ids = tokenizer(
        SENTENCE,
        return_tensors="pt",
        padding="max_length",  # Optional but recommended
        truncation=True,  # Optional but recommended
    )
    output = model(**token_ids)
Classification example:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model = AutoModelForSequenceClassification.from_pretrained(
        "ccdv/lsg-bart-base-16384",
        trust_remote_code=True,
        pass_global_tokens_to_decoder=True,  # Pass encoder global tokens to decoder
    )
    tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-16384")

    SENTENCE = "This is a test sequence to test the model. " * 300
    token_ids = tokenizer(
        SENTENCE,
        return_tensors="pt",
        # pad_to_multiple_of=...  # Optional
        truncation=True,
    )
    output = model(**token_ids)

    # > SequenceClassifierOutput(loss=None, logits=tensor([[-0.3051, -0.1762]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
BART

    @article{DBLP:journals/corr/abs-1910-13461,
      author     = {Mike Lewis and Yinhan Liu and Naman Goyal and
                    Marjan Ghazvininejad and Abdelrahman Mohamed and
                    Omer Levy and Veselin Stoyanov and Luke Zettlemoyer},
      title      = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural
                    Language Generation, Translation, and Comprehension},
      journal    = {CoRR},
      volume     = {abs/1910.13461},
      year       = {2019},
      url        = {http://arxiv.org/abs/1910.13461},
      eprinttype = {arXiv},
      eprint     = {1910.13461},
      timestamp  = {Thu, 31 Oct 2019 14:02:26 +0100},
      biburl     = {https://dblp.org/rec/journals/corr/abs-1910-13461.bib},
      bibsource  = {dblp computer science bibliography, https://dblp.org}
    }