数据集:

castorini/msmarco_v1_passage_doc2query-t5_expansions

英文

数据集概述

该仓库提供了针对MS MARCO V1段落语料库使用docTTTTTquery(有时写为docT5query或doc2query-T5)生成的查询。docTTTTTquery是doc2query文档扩展模型系列的最新版本。其基本思想是训练一个模型,该模型在给定输入文档时生成可能回答文档的问题(或更广泛地说,文档可能相关的查询)。然后,将这些预测的问题(或查询)附加到原始文档中,然后像以前一样对其进行索引。docTTTTTquery模型因使用T5作为扩展模型而得名。

数据集结构

所有三个折(训练集、开发集和测试集)共享相同的语料库。这些查询是从该语料库生成的。

数据条目示例如下:

{  
    "id": "0", 
    "predicted_queries": ["what was important to the success of the manhattan project", "why was the manhattan project important?", "what was important about the manhattan project", "why was the success of the manhattan project so important?", "who was the manhattan project a scientific project for", "what was the manhattan project important for", "why was the manhattan project a success", "how was the success of the manhattan project", "why was the manhattan project important to the success of the project?", "what is the importance of communication amongst scientific minds", "what was the importance of scientific communication for the success of the manhattan project", "what was the purpose of the manhattan project", "why was the manhattan project significant?", "why was the manhattan project important", "why did scientists believe in atomic power", "why did scientists and engineers have to communicate?", "why was the manhattan project a success", "what was the purpose of the manhattan project", "why did scientists and engineers want to be involved in the manhattan project", "why are the scientists so valuable", "which of the following was an important outcome of the manhattan project?", "why was the manhattan project successful", "why was the manhattan project an important scientific achievement", "what was the success of manhattan", "what was the result of the manhattan project", "why was communications important to the success of the manhattan project?", "why the manhattan project was important", "why is it important to know who is the manhattan project", "what was the most important accomplishment to the success of the manhattan project?", "why was the manhattan project an important achievement?", "why was the manhattan project important to the success of the atomic bomb", "how did the manhattan project impact scientists?", "what were the effects of the manhattan project", "what were the results of the manhattan project and how did they affect the public", "what was the manhattan project", "why did scientists contribute to the success of the manhattan project", "why was communication important in the manhattan project", "what was the effect of the manhattan project on the world", "what was the importance of communication in the success of the manhattan project?", "why was communications important to the success of the manhattan project?", "why was the manhattan project important", "what was the manhattan project", "why was the success of the manhattan project important", "why was manhattan project a success", "what was important about the manhattan project", "what benefited from the success of the new york nuclear bomb", "what was the significance to the success of the manhattan project?", "why is communication important", "why was the manhattan project an important achievement", "why did the manhattan project work", "what was the manhattan project's success", "what was the significance of the manhattan experiment", "how important was communication to the success of the manhattan project", "why is communication important to the success of the manhattan project?", "what was the importance of the manhattan project", "why did scientists believe the manhattan project had the greatest impact on science?", "what was a critical effect of the manhattan project?", "why did the manhattan project succeed", "what was the importance of the manhattan project", "why was the manhattan project important", "why was the manhattan project a success?", "what was the importance of communication and communication during the manhattan project", "why was the manhattan project significant?", "what was the importance of communication in the manhattan project?", "why was communication important to the success of the manhattan project?", "why was the manhattan project an important achievement", "what was important about the manhattan project", "why was the manhattan project a success", "why were the scientists at the manhattan project so successful?", "why did the manhattan project really work", "what was the success of the manhattan project", "what is the importance of communication during the manhattan project", "why was the manhattan project important", "why was communication important?", "what was the importance of communication in the success of the manhattan project?", "why was the manhattan project successful?", "which statement reflects the success of the manhattan project?", "why did the manhattan project succeed", "why was the manhattan project a great success", "why was the manhattan project important"]
}

加载数据集

加载数据集的示例:

dataset = load_dataset('castorini/msmarco_v1_passage_doc2query-t5_expansions', data_files='d2q.jsonl.gz')

引文信息

@article{docTTTTTquery,
  title={From doc2query to {docTTTTTquery}},
  author={Nogueira, Rodrigo and Lin, Jimmy},
  year={2019}
}

@article{emdt5,
   author={Ronak Pradeep and Rodrigo Nogueira and Jimmy Lin},
   title={The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models},
   journal={arXiv:2101.05667},
   year={2021},
}