Dataset:
kilt_tasks
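KILT has been built from 11 datasets representing 5 types of tasks: fact-checking (FEVER), entity linking (AIDA CoNLL-YAGO, WNED-WIKI, WNED-CWEB), slot filling (T-REx, Zero Shot RE), open-domain QA (Natural Questions, HotpotQA, TriviaQA, ELI5), and dialog generation (Wizard of Wikipedia).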
All these datasets have been grounded in a single pre-processed Wikipedia dump, allowing for fairer and more consistent evaluation as well as enabling new task setups such as multitask and transfer learning with minimal effort. KILT also provides tools to analyze and understand the predictions made by models, as well as the evidence they provide for their predictions.
Loading the KILT knowledge source and task data

The original KILT release only provides question IDs for the TriviaQA task. Using the full dataset requires mapping those IDs back to the TriviaQA questions, which can be done as follows:
    from datasets import load_dataset

    # Get the pre-processed Wikipedia knowledge source for KILT
    kilt_wiki = load_dataset("kilt_wikipedia")

    # Get the KILT task datasets
    kilt_triviaqa = load_dataset("kilt_tasks", name="triviaqa_support_only")

    # Most tasks in KILT already have all required data, but KILT-TriviaQA
    # only provides the question IDs, not the questions themselves.
    # Thankfully, we can get the original TriviaQA data with:
    trivia_qa = load_dataset('trivia_qa', 'unfiltered.nocontext')

    # The KILT IDs can then be mapped to the TriviaQA questions with:
    triviaqa_map = {}

    def add_missing_data(x, trivia_qa_subset, triviaqa_map):
        i = triviaqa_map[x['id']]
        x['input'] = trivia_qa_subset[i]['question']
        x['output']['original_answer'] = trivia_qa_subset[i]['answer']['value']
        return x

    for k in ['train', 'validation', 'test']:
        # Index the TriviaQA rows of this split by question ID
        triviaqa_map = dict([(q_id, i) for i, q_id in enumerate(trivia_qa[k]['question_id'])])
        # Keep only the KILT rows whose ID exists in TriviaQA, then fill them in
        kilt_triviaqa[k] = kilt_triviaqa[k].filter(lambda x: x['id'] in triviaqa_map)
        kilt_triviaqa[k] = kilt_triviaqa[k].map(add_missing_data, fn_kwargs=dict(trivia_qa_subset=trivia_qa[k], triviaqa_map=triviaqa_map))
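The ID-mapping step can be tried offline on toy data before downloading anything. The sketch below is only an illustration of the same idea (build an ID-to-position index, drop rows with unknown IDs, copy the question and answer across). The rows, the helper names `build_id_index` and `fill_missing`, and the flat `original_answer` field are assumptions made for this example and are not part of the KILT or TriviaQA releases.

```python
# Toy stand-ins for the two datasets. All IDs, questions and answers are
# made up for illustration; real rows come from load_dataset().
kilt_rows = [
    {"id": "q_2", "input": ""},
    {"id": "q_9", "input": ""},  # has no counterpart in the source rows
    {"id": "q_1", "input": ""},
]
source_rows = [
    {"question_id": "q_1", "question": "First toy question?", "answer": {"value": "alpha"}},
    {"question_id": "q_2", "question": "Second toy question?", "answer": {"value": "beta"}},
]


def build_id_index(rows):
    """Map each question_id to the position of its row."""
    return {row["question_id"]: i for i, row in enumerate(rows)}


def fill_missing(kilt, source):
    """Return copies of the KILT rows with question text and answer filled in.

    Rows whose ID is absent from the source are dropped, which mirrors the
    filter step applied before the map step.
    """
    index = build_id_index(source)
    filled = []
    for row in kilt:
        pos = index.get(row["id"])
        if pos is None:
            continue
        new_row = dict(row)  # copy so the input rows stay untouched
        new_row["input"] = source[pos]["question"]
        # Hypothetical flat field, used here to keep the toy rows simple.
        new_row["original_answer"] = source[pos]["answer"]["value"]
        filled.append(new_row)
    return filled


filled_rows = fill_missing(kilt_rows, source_rows)
for row in filled_rows:
    print(row["id"], "|", row["input"], "|", row["original_answer"])
```

Building the index once per split keeps each lookup constant-time, which matters when a split holds tens of thousands of questions.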
The dataset supports a leaderboard that evaluates models against task-specific metrics such as F1 or EM, as well as their ability to retrieve supporting information from Wikipedia.
The current best-performing models are listed on the KILT leaderboard.
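For readers unfamiliar with EM and F1 as answer metrics, the following is a minimal sketch of how such scores are commonly computed for short answers. It is not the official KILT evaluation code: the normalization rules (lowercasing, dropping ASCII punctuation and English articles) and the function names are assumptions chosen for illustration, and leaderboard numbers may be computed differently.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip ASCII punctuation and English articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction, golds):
    """Score a prediction against several gold answers and keep the best."""
    return max((metric(prediction, gold) for gold in golds), default=0.0)


golds = ["Coldplay", "Coldplay with special guest performers Beyoncé and Bruno Mars"]
print(best_over_golds(exact_match, "coldplay.", golds))
print(best_over_golds(token_f1, "Coldplay and Beyoncé", golds))
```

Taking the maximum over gold answers reflects the fact that a KILT record can list several acceptable answers for one input.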
All tasks are in English (en).
An example of open-domain QA from the Natural Questions nq configuration looks as follows:
{'id': '-5004457603684974952', 'input': 'who is playing the halftime show at super bowl 2016', 'meta': {'left_context': '', 'mention': '', 'obj_surface': [], 'partial_evidence': [], 'right_context': '', 'sub_surface': [], 'subj_aliases': [], 'template_questions': []}, 'output': [{'answer': 'Coldplay', 'meta': {'score': 0}, 'provenance': [{'bleu_score': 1.0, 'end_character': 186, 'end_paragraph_id': 1, 'meta': {'annotation_id': '-1', 'evidence_span': [], 'fever_page_id': '', 'fever_sentence_id': -1, 'yes_no_answer': ''}, 'section': 'Section::::Abstract.', 'start_character': 178, 'start_paragraph_id': 1, 'title': 'Super Bowl 50 halftime show', 'wikipedia_id': '45267196'}]}, {'answer': 'Beyoncé', 'meta': {'score': 0}, 'provenance': [{'bleu_score': 1.0, 'end_character': 224, 'end_paragraph_id': 1, 'meta': {'annotation_id': '-1', 'evidence_span': [], 'fever_page_id': '', 'fever_sentence_id': -1, 'yes_no_answer': ''}, 'section': 'Section::::Abstract.', 'start_character': 217, 'start_paragraph_id': 1, 'title': 'Super Bowl 50 halftime show', 'wikipedia_id': '45267196'}]}, {'answer': 'Bruno Mars', 'meta': {'score': 0}, 'provenance': [{'bleu_score': 1.0, 'end_character': 239, 'end_paragraph_id': 1, 'meta': {'annotation_id': '-1', 'evidence_span': [], 'fever_page_id': '', 'fever_sentence_id': -1, 'yes_no_answer': ''}, 'section': 'Section::::Abstract.', 'start_character': 229, 'start_paragraph_id': 1, 'title': 'Super Bowl 50 halftime show', 'wikipedia_id': '45267196'}]}, {'answer': 'Coldplay with special guest performers Beyoncé and Bruno Mars', 'meta': {'score': 0}, 'provenance': []}, {'answer': 'British rock group Coldplay with special guest performers Beyoncé and Bruno Mars', 'meta': {'score': 0}, 'provenance': []}, {'answer': '', 'meta': {'score': 0}, 'provenance': [{'bleu_score': 0.9657992720603943, 'end_character': 341, 'end_paragraph_id': 1, 'meta': {'annotation_id': '2430977867500315580', 'evidence_span': [], 'fever_page_id': '', 'fever_sentence_id': -1, 
'yes_no_answer': 'NONE'}, 'section': 'Section::::Abstract.', 'start_character': 0, 'start_paragraph_id': 1, 'title': 'Super Bowl 50 halftime show', 'wikipedia_id': '45267196'}]}, {'answer': '', 'meta': {'score': 0}, 'provenance': [{'bleu_score': -1.0, 'end_character': -1, 'end_paragraph_id': 1, 'meta': {'annotation_id': '-1', 'evidence_span': ['It was headlined by the British rock group Coldplay with special guest performers Beyoncé and Bruno Mars', 'It was headlined by the British rock group Coldplay with special guest performers Beyoncé and Bruno Mars, who previously had headlined the Super Bowl XLVII and Super Bowl XLVIII halftime shows, respectively.', "The Super Bowl 50 Halftime Show took place on February 7, 2016, at Levi's Stadium in Santa Clara, California as part of Super Bowl 50. It was headlined by the British rock group Coldplay with special guest performers Beyoncé and Bruno Mars", "The Super Bowl 50 Halftime Show took place on February 7, 2016, at Levi's Stadium in Santa Clara, California as part of Super Bowl 50. It was headlined by the British rock group Coldplay with special guest performers Beyoncé and Bruno Mars,"], 'fever_page_id': '', 'fever_sentence_id': -1, 'yes_no_answer': ''}, 'section': 'Section::::Abstract.', 'start_character': -1, 'start_paragraph_id': 1, 'title': 'Super Bowl 50 halftime show', 'wikipedia_id': '45267196'}]}]}
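A record like the one above can be walked with plain Python. The sketch below uses a trimmed copy of that record (only a few of its fields and entries are kept) and two hypothetical helpers, `gold_answers` and `provenance_pages`, to show one way of collecting the non-empty answers and the Wikipedia pages cited as evidence. It assumes `output` is a list of dictionaries, as in the printed example.

```python
# Trimmed copy of the example record: most fields and entries are omitted.
record = {
    "id": "-5004457603684974952",
    "input": "who is playing the halftime show at super bowl 2016",
    "output": [
        {
            "answer": "Coldplay",
            "provenance": [
                {"title": "Super Bowl 50 halftime show", "wikipedia_id": "45267196",
                 "start_paragraph_id": 1, "end_paragraph_id": 1},
            ],
        },
        {
            "answer": "Coldplay with special guest performers Beyoncé and Bruno Mars",
            "provenance": [],
        },
        {
            # Some entries carry evidence only, with an empty answer string.
            "answer": "",
            "provenance": [
                {"title": "Super Bowl 50 halftime show", "wikipedia_id": "45267196",
                 "start_paragraph_id": 1, "end_paragraph_id": 1},
            ],
        },
    ],
}


def gold_answers(rec):
    """Collect the non-empty answer strings of a record."""
    return [entry["answer"] for entry in rec["output"] if entry["answer"]]


def provenance_pages(rec):
    """Collect (wikipedia_id, title) pairs cited as evidence, without repeats."""
    pages = []
    for entry in rec["output"]:
        for passage in entry["provenance"]:
            page = (passage["wikipedia_id"], passage["title"])
            if page not in pages:
                pages.append(page)
    return pages


print(gold_answers(record))
print(provenance_pages(record))
```

Separating answers from provenance is useful because, as the example shows, an entry may hold an answer without evidence or evidence without an answer.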
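Examples from all configurations share the same features: id, input (the query string), meta, and output. The output feature is a list of entries that each carry an answer string, a meta dictionary, and a provenance list of supporting Wikipedia passages, identified by title, section, wikipedia_id, and start and end paragraph and character offsets.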
The configurations have the following splits:
Configuration | Train | Validation | Test |
---|---|---|---|
triviaqa | 61844 | 5359 | 6586 |
fever | 104966 | 10444 | 10100 |
aidayago2 | 18395 | 4784 | 4463 |
wned | - | 3396 | 3376 |
cweb | - | 5599 | 5543 |
trex | 2284168 | 5000 | 5000 |
structured_zeroshot | 147909 | 3724 | 4966 |
nq | 87372 | 2837 | 1444 |
hotpotqa | 88869 | 5600 | 5569 |
eli5 | 272634 | 1507 | 600 |
wow | 94577 | 3058 | 2944 |
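Not every configuration has all three splits, so code that loops over splits is safer when it checks what exists first. The sketch below copies the sizes from the table into a dictionary and assumes that the two numbers listed for wned and cweb are their validation and test sizes, meaning those configurations have no training split. The names `SPLIT_SIZES`, `configs_without` and `total_examples` are hypothetical helpers for this example.

```python
# Split sizes copied from the table. Assumption: wned and cweb list only
# validation and test sizes and have no training split.
SPLIT_SIZES = {
    "triviaqa": {"train": 61844, "validation": 5359, "test": 6586},
    "fever": {"train": 104966, "validation": 10444, "test": 10100},
    "aidayago2": {"train": 18395, "validation": 4784, "test": 4463},
    "wned": {"validation": 3396, "test": 3376},
    "cweb": {"validation": 5599, "test": 5543},
    "trex": {"train": 2284168, "validation": 5000, "test": 5000},
    "structured_zeroshot": {"train": 147909, "validation": 3724, "test": 4966},
    "nq": {"train": 87372, "validation": 2837, "test": 1444},
    "hotpotqa": {"train": 88869, "validation": 5600, "test": 5569},
    "eli5": {"train": 272634, "validation": 1507, "test": 600},
    "wow": {"train": 94577, "validation": 3058, "test": 2944},
}


def configs_without(split):
    """List the configurations that do not provide the given split."""
    return [name for name, sizes in SPLIT_SIZES.items() if split not in sizes]


def total_examples(split):
    """Sum the sizes of one split over all configurations that have it."""
    return sum(sizes.get(split, 0) for sizes in SPLIT_SIZES.values())


print("No train split:", configs_without("train"))
print("Validation examples overall:", total_examples("validation"))
```

The same check applies when working with loaded data: iterating over the splits a configuration actually returns avoids a lookup error on a missing training split.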
Cite as:
    @inproceedings{kilt_tasks,
      author    = {Fabio Petroni and Aleksandra Piktus and Angela Fan and
                   Patrick S. H. Lewis and Majid Yazdani and Nicola De Cao and
                   James Thorne and Yacine Jernite and Vladimir Karpukhin and
                   Jean Maillard and Vassilis Plachouras and
                   Tim Rockt{\"{a}}schel and Sebastian Riedel},
      editor    = {Kristina Toutanova and Anna Rumshisky and Luke Zettlemoyer and
                   Dilek Hakkani{-}T{\"{u}}r and Iz Beltagy and Steven Bethard and
                   Ryan Cotterell and Tanmoy Chakraborty and Yichao Zhou},
      title     = {{KILT:} a Benchmark for Knowledge Intensive Language Tasks},
      booktitle = {Proceedings of the 2021 Conference of the North American
                   Chapter of the Association for Computational Linguistics:
                   Human Language Technologies, {NAACL-HLT} 2021, Online,
                   June 6-11, 2021},
      pages     = {2523--2544},
      publisher = {Association for Computational Linguistics},
      year      = {2021},
      url       = {https://www.aclweb.org/anthology/2021.naacl-main.200/}
    }