Model:

google/bigbird-base-trivia-itc

English

BigBird base trivia question-answering model

This model is fine-tuned from bigbird-roberta-base, with BigBirdForQuestionAnsweringHead on top.

Check out this to see how well google/bigbird-base-trivia-itc performs on question answering.

How to use

Here is how to use this model for question answering in PyTorch:

from transformers import BigBirdForQuestionAnswering, BigBirdTokenizer

# by default it's in `block_sparse` mode with num_random_blocks=3, block_size=64
model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc")

# you can change `attention_type` to full attention like this:
model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-base-trivia-itc", block_size=16, num_random_blocks=2)

# load the matching tokenizer for this checkpoint
tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-base-trivia-itc")

question = "Replace me by any text you'd like."
context = "Put some context for answering"
encoded_input = tokenizer(question, context, return_tensors='pt')
output = model(**encoded_input)
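
The returned output carries start_logits and end_logits over the input tokens. Below is a minimal sketch of turning those into an answer string, continuing from the snippet above; the greedy argmax decoding is an illustrative assumption, not something prescribed by this card:

import torch

# most likely start/end token positions from the QA head
# (slice to the input length in case the model padded internally)
seq_len = encoded_input["input_ids"].shape[1]
start_idx = int(torch.argmax(output.start_logits[0, :seq_len]))
end_idx = int(torch.argmax(output.end_logits[0, :seq_len]))

# map the token span back to text (naive decoding; assumes start_idx <= end_idx)
answer_tokens = encoded_input["input_ids"][0, start_idx : end_idx + 1]
print(tokenizer.decode(answer_tokens))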

Fine-tuning config & hyper-parameters

  • No. of global tokens = 128
  • Window length = 192
  • No. of random tokens = 192
  • Max. sequence length = 4096
  • No. of heads = 12
  • No. of hidden layers = 12
  • Hidden size = 768
  • Batch size = 32
  • Loss = cross-entropy over noisy spans
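
These numbers line up with the defaults noted in the code comment above: a 192-token window is 3 blocks of block_size 64, and 192 random tokens is num_random_blocks 3 × 64. As a purely illustrative sketch, the same hyper-parameters can be spelled out as a BigBirdConfig (the pretrained checkpoint already ships with this configuration, so you would not normally build it by hand):

from transformers import BigBirdConfig

# illustrative config mirroring the hyper-parameters listed above
config = BigBirdConfig(
    attention_type="block_sparse",
    block_size=64,                 # 192-token window = 3 blocks of 64
    num_random_blocks=3,           # 3 * 64 = 192 random tokens
    max_position_embeddings=4096,  # max sequence length
    num_attention_heads=12,
    num_hidden_layers=12,
    hidden_size=768,
)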

BibTeX entry and citation info

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}