Model:

google/matcha-chart2text-pew

Task:

Visual Question Answering

Libraries:

PyTorch Transformers

Language:

Other:

pix2struct text2text-generation AutoTrain Compatible

Preprint:

arxiv:2212.09662

License:

apache-2.0

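Model card for MatCha - fine-tuned on Chart2text-pew

This model is the MatCha model fine-tuned on the Chart2text-pew dataset. The fine-tuned checkpoint may be better suited to chart summarization tasks.

TL;DR

The abstract of the paper states:

Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning, which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures, and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.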

Using the model

Converting from T5x to huggingface

You can use the conversion script as follows:

python convert_pix2struct_checkpoint_to_pytorch.py --t5x_checkpoint_path PATH_TO_T5X_CHECKPOINTS --pytorch_dump_path PATH_TO_SAVE --is_vqa

If you are converting a large model, run:

python convert_pix2struct_checkpoint_to_pytorch.py --t5x_checkpoint_path PATH_TO_T5X_CHECKPOINTS --pytorch_dump_path PATH_TO_SAVE --use-large --is_vqa
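When converting several checkpoints, it can help to assemble the command in Python rather than retyping it. The following is a minimal sketch, not part of the original card: the helper names `build_convert_command` and `run_conversion` are hypothetical, and the flag spellings are copied verbatim from the commands above, so check them against the argument list of the script version you actually have.

```python
import subprocess
import sys

# Script name exactly as it appears in the commands above.
SCRIPT = "convert_pix2struct_checkpoint_to_pytorch.py"


def build_convert_command(t5x_checkpoint_path, pytorch_dump_path, large=False, is_vqa=True):
    """Assemble the conversion command using only the flags shown in this card."""
    cmd = [
        sys.executable,
        SCRIPT,
        "--t5x_checkpoint_path", str(t5x_checkpoint_path),
        "--pytorch_dump_path", str(pytorch_dump_path),
    ]
    if large:
        cmd.append("--use-large")  # spelling taken from the card, not verified here
    if is_vqa:
        cmd.append("--is_vqa")
    return cmd


def run_conversion(*args, **kwargs):
    """Run the conversion script in the current working directory."""
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_convert_command(*args, **kwargs), check=True)
```

`build_convert_command` only builds the argument list, so it can be printed and inspected before anything is executed; `run_conversion` assumes the script is present in the current directory.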

Once saved, you can push the converted model with the following snippet:

from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model = Pix2StructForConditionalGeneration.from_pretrained(PATH_TO_SAVE)
processor = Pix2StructProcessor.from_pretrained(PATH_TO_SAVE)

model.push_to_hub("USERNAME/MODEL_NAME")
processor.push_to_hub("USERNAME/MODEL_NAME")

Running predictions

To run predictions, refer to the instructions presented in the matcha-chartqa model card.
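For orientation, a minimal inference sketch with the same two classes used in the push snippet above might look as follows. It is an illustration under assumptions rather than the procedure from the matcha-chartqa card: the function name `summarize_chart`, the optional `prompt` argument, and the `max_new_tokens` value are all choices made here. Checkpoints converted with `--is_vqa` may require a text header, in which case a non-empty `prompt` has to be supplied.

```python
MODEL_ID = "google/matcha-chart2text-pew"  # model name from this card


def summarize_chart(image, prompt=None, model_id=MODEL_ID, max_new_tokens=512):
    """Generate text for a chart given as a PIL image.

    `prompt` is an assumption of this sketch: if the processor was saved with
    VQA mode enabled it expects a header text, otherwise it can stay None.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)

    # The processor turns the image (and optional header text) into model inputs.
    inputs = processor(images=image, text=prompt, return_tensors="pt")

    # Autoregressively generate token ids, then decode them back to a string.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

A call would then look like `summarize_chart(Image.open("chart.png").convert("RGB"))`, where `Image` comes from Pillow and `chart.png` is a placeholder path. The first call downloads the checkpoint, so loading the model once and reusing it is preferable when summarizing many charts.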

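Contribution

This model was originally contributed by Fangyu Liu, Francesco Piccinno et al. and added to the Hugging Face ecosystem by Younes Belkada.

Citation

If you want to cite this work, please consider citing the original paper: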

@misc{liu2022matcha,
      title={MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering}, 
      author={Fangyu Liu and Francesco Piccinno and Syrine Krichene and Chenxi Pang and Kenton Lee and Mandar Joshi and Yasemin Altun and Nigel Collier and Julian Martin Eisenschlos},
      year={2022},
      eprint={2212.09662},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Author:

Google AI

Dataset size:

1.06 GB
