BigBirdPegasus model (large)

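BigBird is a sparse-attention-based Transformer that extends Transformer-based models such as BERT to much longer sequences. BigBird also comes with a theoretical analysis of which capabilities of a full Transformer the sparse model retains: it is shown to be a universal approximator of sequence functions and Turing complete, like the quadratic full-attention model.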

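BigBird was introduced in the paper "Big Bird: Transformers for Longer Sequences" (arXiv:2007.14062) and first released in the accompanying repository.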

Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

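BigBird relies on block sparse attention instead of normal attention (i.e. BERT's full attention). It can handle sequences up to 4096 tokens long at a much lower compute cost than BERT, because its attention cost grows linearly rather than quadratically with sequence length. It has achieved state-of-the-art (SOTA) results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.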
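To make the block sparse idea concrete, here is a minimal NumPy sketch of a BigBird-style block attention layout. It assumes a simplified pattern: the first and last blocks are global (they attend to, and are attended by, every block), every other query block attends to its left neighbour, itself and its right neighbour (sliding window), plus `num_random_blocks` randomly chosen blocks. The helper `bigbird_block_layout` is hypothetical and is not the Transformers implementation; it only illustrates why each query sees a small, fixed number of keys instead of the whole sequence.

```python
import numpy as np

def bigbird_block_layout(num_blocks, num_random_blocks=3, seed=0):
    """Hypothetical helper: return a (num_blocks, num_blocks) boolean matrix
    where entry [q, k] is True if query block q attends to key block k."""
    rng = np.random.default_rng(seed)
    layout = np.zeros((num_blocks, num_blocks), dtype=bool)
    # Global blocks: the first and last block see everything and are seen by everything.
    for g in (0, num_blocks - 1):
        layout[g, :] = True
        layout[:, g] = True
    for q in range(1, num_blocks - 1):
        # Sliding window: left neighbour, the block itself, right neighbour.
        layout[q, q - 1:q + 2] = True
        # Random blocks, drawn from blocks this query does not attend to yet.
        candidates = np.flatnonzero(~layout[q])
        chosen = rng.choice(candidates, size=min(num_random_blocks, len(candidates)), replace=False)
        layout[q, chosen] = True
    return layout

seq_len, block_size = 4096, 64
num_blocks = seq_len // block_size  # 64 blocks of 64 tokens
layout = bigbird_block_layout(num_blocks, num_random_blocks=3)

# Keys seen by one token in each query block (blocks attended * tokens per block).
keys_per_query_block = layout.sum(axis=1) * block_size
interior = keys_per_query_block[2:-2]  # blocks away from the sequence edges
print(interior.min(), interior.max(), seq_len)  # prints: 512 512 4096
print(f"{layout.mean():.3f}")  # fraction of the full attention matrix computed; prints: 0.152
```

Under these assumptions an interior token attends to 8 blocks (3 window + 2 global + 3 random), i.e. 512 keys instead of 4096, and this per-token budget stays constant as the sequence grows, which is where the linear cost comes from.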

How to use

Here is how to use this model to summarize a given text in PyTorch:

from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

# decoder attention type can't be changed & will be "original_full"
# you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", block_size=16, num_random_blocks=2)

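text = "Replace me by any text you'd like."
# truncate long documents to the 4096-token limit the model was designed for
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=4096)
summary_ids = model.generate(**inputs)
summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)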
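The `block_size` and `num_random_blocks` options trade attention context for compute. The sketch below estimates that budget with a hypothetical helper, `block_sparse_budget`, assuming a simplified pattern in which each interior query block attends to 3 sliding-window blocks, 2 global blocks and `num_random_blocks` random blocks; the exact pattern in Transformers may differ at the sequence edges. It also rounds the input length up to a multiple of `block_size`, mirroring the padding Transformers applies before block sparse attention. As a further hedge, for very short inputs such as the placeholder text above, Transformers is expected to fall back to full attention in the encoder (with a warning), since block sparse attention brings no benefit there.

```python
import math

def block_sparse_budget(seq_len, block_size=64, num_random_blocks=3):
    """Hypothetical helper: return (padded length, number of blocks,
    keys attended by one interior query token) under a simplified
    BigBird pattern of 3 window + 2 global + random blocks."""
    padded_len = math.ceil(seq_len / block_size) * block_size  # pad to a block multiple
    num_blocks = padded_len // block_size
    keys_per_query = (3 + 2 + num_random_blocks) * block_size
    return padded_len, num_blocks, keys_per_query

# Default config vs. the smaller config used in the example above, for a 4000-token input.
for block_size, num_random_blocks in [(64, 3), (16, 2)]:
    padded, blocks, keys = block_sparse_budget(4000, block_size, num_random_blocks)
    print(block_size, num_random_blocks, padded, blocks, keys)
# prints:
# 64 3 4032 63 512
# 16 2 4000 250 112
```

With `block_size=16, num_random_blocks=2`, each interior token sees roughly 112 keys instead of 512, which is cheaper but gives each token less context per layer than the default setting.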

Training procedure

This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the arxiv subset of scientific_papers.

BibTeX entry and citation info

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Author:

Google AI

Dataset size:

2.15 GB