数据集:
khalidalt/tydiqa-primary
任务:
问答子任务:
extractive-qa计算机处理:
multilingual语言创建人:
crowdsourced批注创建人:
crowdsourced源数据集:
extended|wikipedia许可:
apache-2.0TyDi QA 是一个包含 11 种语言的问题回答数据集,包含 20.4 万个问题-回答对。TyDi QA 中的语言在其语言类型上是多样化的 - 每种语言表达的语言特征集 - 因此我们期望在此数据集上表现良好的模型能够推广到世界上很多语言。它包含了在仅英语语料库中找不到的语言现象。为了提供一个真实的信息搜索任务并避免启动效应,问题是由想要知道答案但尚不知道答案的人编写的(与 SQuAD 及其子数据集不同),并且数据是直接在每种语言中收集的,而不使用翻译(与 MLQA 和 XQuAD 不同)。
'验证' 的一个示例如下所示。
This example was too long and was cropped: { "annotations": { "minimal_answers_end_byte": [-1, -1, -1], "minimal_answers_start_byte": [-1, -1, -1], "passage_answer_candidate_index": [-1, -1, -1], "yes_no_answer": ["NONE", "NONE", "NONE"] }, "document_plaintext": "\"\\nรองศาสตราจารย์[1] หม่อมราชวงศ์สุขุมพันธุ์ บริพัตร (22 กันยายน 2495 -) ผู้ว่าราชการกรุงเทพมหานครคนที่ 15 อดีตรองหัวหน้าพรรคปร...", "document_title": "หม่อมราชวงศ์สุขุมพันธุ์ บริพัตร", "document_url": "\"https://th.wikipedia.org/wiki/%E0%B8%AB%E0%B8%A1%E0%B9%88%E0%B8%AD%E0%B8%A1%E0%B8%A3%E0%B8%B2%E0%B8%8A%E0%B8%A7%E0%B8%87%E0%B8%...", "language": "thai", "passage_answer_candidates": "{\"plaintext_end_byte\": [494, 1779, 2931, 3904, 4506, 5588, 6383, 7122, 8224, 9375, 10473, 12563, 15134, 17765, 19863, 21902, 229...", "question_text": "\"หม่อมราชวงศ์สุขุมพันธุ์ บริพัตร เรียนจบจากที่ไหน ?\"..." }
所有拆分的数据字段相同。
主要任务name | train | validation |
---|---|---|
primary_task | 166916 | 18670 |
@article{tydiqa, title = {TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages}, author = {Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki} year = {2020}, journal = {Transactions of the Association for Computational Linguistics} }
@inproceedings{ruder-etal-2021-xtreme, title = "{XTREME}-{R}: Towards More Challenging and Nuanced Multilingual Evaluation", author = "Ruder, Sebastian and Constant, Noah and Botha, Jan and Siddhant, Aditya and Firat, Orhan and Fu, Jinlan and Liu, Pengfei and Hu, Junjie and Garrette, Dan and Neubig, Graham and Johnson, Melvin", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-main.802", doi = "10.18653/v1/2021.emnlp-main.802", pages = "10215--10245", } }