Model:
rubentito/longformer-base-mpdocvqa
This is a Longformer-base model trained on the SQuAD v1 dataset and fine-tuned on the Multipage DocVQA (MP-DocVQA) dataset.
This model was used as a baseline in Hierarchical multimodal transformers for Multi-Page DocVQA.
How to use this model to run inference on a sample question and context in PyTorch:
```python
import torch
from transformers import LongformerTokenizerFast, LongformerForQuestionAnswering

# Load the fine-tuned tokenizer and model.
tokenizer = LongformerTokenizerFast.from_pretrained("rubentito/longformer-base-mpdocvqa")
model = LongformerForQuestionAnswering.from_pretrained("rubentito/longformer-base-mpdocvqa")

text = "Huggingface has democratized NLP. Huge thanks to Huggingface for this."
question = "What has Huggingface done?"

# Encode the question together with the context and run the QA model.
encoding = tokenizer(question, text, return_tensors="pt")
output = model(encoding["input_ids"], attention_mask=encoding["attention_mask"])

# Take the most likely start/end token positions and decode the answer span.
start_pos = torch.argmax(output.start_logits, dim=-1).item()
end_pos = torch.argmax(output.end_logits, dim=-1).item()
context_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
answer_tokens = context_tokens[start_pos: end_pos + 1]
pred_answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
print(pred_answer)
```
Average Normalized Levenshtein Similarity (ANLS)
This is the standard metric for text-based VQA tasks (ST-VQA and DocVQA). It evaluates a method's reasoning ability while smoothly penalizing OCR recognition errors. For details, see Scene Text Visual Question Answering.
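For reference, below is a minimal sketch of how ANLS can be computed. It follows the standard definition (similarity is 1 minus the normalized Levenshtein distance, set to 0 when that distance reaches the 0.5 threshold, keeping the best match over all ground-truth answers); the function names are illustrative and this is not the official evaluation script.

```python
# Minimal ANLS sketch (illustrative, not the official evaluation code).

def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def anls_score(prediction: str, gt_answers: list[str], threshold: float = 0.5) -> float:
    """ANLS for one question: best similarity over all ground-truth answers."""
    best = 0.0
    for gt in gt_answers:
        pred, ref = prediction.strip().lower(), gt.strip().lower()
        nl = levenshtein(pred, ref) / max(len(pred), len(ref), 1)
        score = 1.0 - nl if nl < threshold else 0.0
        best = max(best, score)
    return best

# Dataset-level ANLS is the mean over all questions:
# anls = sum(anls_score(p, gts) for p, gts in zip(predictions, answers)) / len(predictions)
```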
Answer Page Prediction Accuracy (APPA)
In the MP-DocVQA task, models can also output the index of the page that contains the information needed to answer the question. For this subtask, accuracy is used to evaluate the predictions, i.e., whether the predicted page is the correct one. For details, see Hierarchical multimodal transformers for Multi-Page DocVQA.
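A minimal sketch of this accuracy computation, assuming lists of predicted and ground-truth page indices (the function and variable names are illustrative):

```python
# Illustrative answer-page accuracy computation (names are hypothetical).
def answer_page_accuracy(predicted_pages: list[int], true_pages: list[int]) -> float:
    """Fraction of questions whose predicted page index matches the ground truth."""
    assert len(predicted_pages) == len(true_pages)
    correct = sum(int(p == t) for p, t in zip(predicted_pages, true_pages))
    return 100.0 * correct / len(true_pages)  # reported as a percentage, as in the table below

# Example: answer_page_accuracy([0, 2, 1], [0, 1, 1]) -> ~66.67
```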
Extended experimental results can be found in Table 2 of Hierarchical multimodal transformers for Multi-Page DocVQA. You can also check the live leaderboard at the RRC Portal.
Model | HF name | Parameters | ANLS | APPA |
---|---|---|---|---|
BERT large | rubentito/bert-large-mpdocvqa | 334M | 0.4183 | 51.6177 |
Longformer base | rubentito/longformer-base-mpdocvqa | 148M | 0.5287 | 71.1696 |
BigBird (ITC) base | rubentito/bigbird-base-itc-mpdocvqa | 131M | 0.4929 | 67.5433 |
LayoutLMv3 base | rubentito/layoutlmv3-base-mpdocvqa | 125M | 0.4538 | 51.9426 |
T5 base | rubentito/t5-base-mpdocvqa | 223M | 0.5050 | 0.0000 |
Hi-VT5 base | rubentito/hivt5-base-mpdocvqa | 316M | 0.6201 | 79.23 |
```bibtex
@article{tito2022hierarchical,
  title={Hierarchical multimodal transformers for Multi-Page DocVQA},
  author={Tito, Rub{\`e}n and Karatzas, Dimosthenis and Valveny, Ernest},
  journal={arXiv preprint arXiv:2212.05935},
  year={2022}
}
```