英文

TAPEX (基础模型)

TAPEX是由Qian Liu,Bei Chen,Jiaqi Guo,Morteza Ziyadi,Zeqi Lin,Weizhu Chen,Jian-Guang Lou于 TAPEX: Table Pre-training via Learning a Neural SQL Executor 提出的。原始repo可以在 here 中找到。

模型描述

TAPEX(Table Pre-training via Execution)是一种概念简单而实证强大的预训练方法,可以为现有模型赋予表格推理能力。TAPEX通过学习一个神经SQL执行器来实现表格预训练,该执行器是通过自动生成可执行的SQL查询来获得的一个合成语料库。

TAPEX基于BART架构,即具有双向(类似BERT的)编码器和自回归(类似GPT的)解码器的变压器编码器-编码器(seq2seq)模型。

这个模型是在 WikiTableQuestions 数据集上微调的tapex-base模型。

预期用途

您可以使用该模型来回答关于复杂问题的表格问题。下面显示了一些可解答的问题(对应的表格未显示):

Question Answer
according to the table, what is the last title that spicy horse produced? Akaneiro: Demon Hunters
what is the difference in runners-up from coleraine academical institution and royal school dungannon? 20
what were the first and last movies greenstreet acted in? The Maltese Falcon, Malaya
in which olympic games did arasay thondike not finish in the top 20? 2012
which broadcaster hosted 3 titles but they had only 1 episode? Channel 4

如何使用

以下是在transformers中使用此模型的方法:

from transformers import TapexTokenizer, BartForConditionalGeneration
import pandas as pd

tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base-finetuned-wtq")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base-finetuned-wtq")

data = {
    "year": [1896, 1900, 1904, 2004, 2008, 2012],
    "city": ["athens", "paris", "st. louis", "athens", "beijing", "london"]
}
table = pd.DataFrame.from_dict(data)

# tapex accepts uncased input since it is pre-trained on the uncased corpus
query = "In which year did beijing host the Olympic Games?"
encoding = tokenizer(table=table, query=query, return_tensors="pt")

outputs = model.generate(**encoding)

print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
# [' 2008.0']

如何评估

请查找评估脚本 here

BibTeX条目和引用信息

@inproceedings{
    liu2022tapex,
    title={{TAPEX}: Table Pre-training via Learning a Neural {SQL} Executor},
    author={Qian Liu and Bei Chen and Jiaqi Guo and Morteza Ziyadi and Zeqi Lin and Weizhu Chen and Jian-Guang Lou},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=O50443AsCP}
}