
BigBirdPegasus model (large)

BigBird is a sparse-attention-based Transformer that extends Transformer-based models, such as BERT, to much longer sequences. In addition, BigBird comes with a theoretical understanding of the capabilities of a complete Transformer that the sparse model can handle.

BigBird was introduced in this paper and first released in this repository.

Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

BigBird relies on block sparse attention instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower computational cost than BERT. It has achieved SOTA results on various tasks involving very long sequences, such as long document summarization and question answering with long contexts.

How to use

Here is how to use this model to get the features of a given text in PyTorch:

from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")

# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# decoder attention type can't be changed & will be "original_full"
# you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
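
For documents longer than the model's 4096-token limit mentioned above, the input can be truncated at tokenization time. A minimal sketch of such a call; the max_length and skip_special_tokens settings are illustrative assumptions and not part of the original card:

# truncate very long documents to the 4096-token limit of the block-sparse encoder
inputs = tokenizer(text, return_tensors='pt', max_length=4096, truncation=True)
prediction = model.generate(**inputs)
print(tokenizer.batch_decode(prediction, skip_special_tokens=True)[0])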

Training Procedure

This checkpoint is obtained after fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the pubmed dataset of scientific_papers.
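
A minimal sketch of how such fine-tuning could be reproduced with the scientific_papers/pubmed dataset from the datasets library. The hyperparameters, output directory, and target summary length below are illustrative assumptions, not the values used to train this checkpoint:

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BigBirdPegasusForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# the pubmed config of scientific_papers provides "article"/"abstract" pairs
dataset = load_dataset("scientific_papers", "pubmed")

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

def preprocess(batch):
    # truncate articles to the 4096-token limit of the block-sparse encoder
    model_inputs = tokenizer(batch["article"], max_length=4096, truncation=True)
    # target length of 256 is an assumption for illustration only
    model_inputs["labels"] = tokenizer(batch["abstract"], max_length=256, truncation=True)["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)

# hypothetical training settings, chosen only to keep the example small
args = Seq2SeqTrainingArguments(
    output_dir="bigbird-pegasus-pubmed-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()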

BibTeX entry and citation info

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}