Model:

google/bigbird-pegasus-large-bigpatent

English

BigBirdPegasus model (large)

BigBird is a sparse-attention based transformer that extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes with a theoretical understanding of the capabilities of a full transformer that the sparse model can handle.

BigBird was introduced in this paper and first released in this repository.

Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

BigBird relies on block sparse attention instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. It has achieved SOTA results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.

How to use

Here is how to use this model to generate a summary of a given text in PyTorch:

from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# decoder attention type can't be changed & will be "original_full"
# you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)

Training Procedure

This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the big_patent dataset.
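
As a rough sketch of trying the checkpoint on data from its fine-tuning domain, the snippet below loads one big_patent example and compares the generated summary to the reference abstract. The dataset config name ("a") and the column names ("description", "abstract") are assumptions about the dataset layout and may need adjusting:

from datasets import load_dataset
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-bigpatent")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# assumed layout: config "a" with "description" / "abstract" columns
sample = load_dataset("big_patent", "a", split="validation[:1]")[0]

inputs = tokenizer(sample["description"], return_tensors="pt", truncation=True, max_length=4096)
summary_ids = model.generate(**inputs)

print("generated:", tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
print("reference:", sample["abstract"])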

BibTeX entry and citation info

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}