Dataset:

allenai/multinews_dense_oracle

Language:

en

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

expert-generated

Annotation creators:

expert-generated

Source datasets:

original

License:

other

This is a copy of the Multi-News dataset, except that the input source documents of the train, validation, and test splits have been replaced by documents retrieved with a dense retriever. The retrieval pipeline used:

  • query: the summary field of each example
  • corpus: the union of all documents in the train, validation, and test splits
  • retriever: facebook/contriever-msmarco via PyTerrier with default settings
  • top-k strategy: "oracle", i.e. the number of documents retrieved, k, is set to the original number of input documents for each example
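The "oracle" top-k strategy can be illustrated with a minimal sketch. It does not use PyTerrier or the actual retriever: the two-dimensional embeddings, the document ids, and the helper names `dot` and `oracle_top_k` are all hypothetical, chosen only to show how k is taken from the number of original input documents.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def oracle_top_k(
    query_vec: Sequence[float],
    corpus_vecs: Dict[str, Sequence[float]],
    original_doc_ids: List[str],
) -> List[str]:
    """Return the k highest-scoring doc ids, where k = number of original inputs."""
    k = len(original_doc_ids)  # the "oracle" choice of k
    ranked = sorted(
        corpus_vecs,
        key=lambda doc_id: dot(query_vec, corpus_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy, made-up embeddings; a real pipeline would encode text with the retriever.
corpus = {
    "d1": [0.9, 0.1],
    "d2": [0.8, 0.3],
    "d3": [0.1, 0.9],
    "d4": [0.2, 0.7],
}
summary_vec = [1.0, 0.0]  # stands in for the embedded summary (the query)
original = ["d1", "d3"]   # this example originally had two source documents

retrieved = oracle_top_k(summary_vec, corpus, original)
print(retrieved)  # ['d1', 'd2']
```

In this toy case only one of the two retrieved documents ("d1") is an original source document, which is the kind of mismatch the retrieval metrics on this card quantify.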

Retrieval results on the train set:

Recall@100  Rprec   Precision@k  Recall@k
0.8661      0.6867  0.6867       0.6867

Retrieval results on the validation set:

Recall@100  Rprec   Precision@k  Recall@k
0.8626      0.6859  0.6859       0.6859

Retrieval results on the test set:

Recall@100  Rprec   Precision@k  Recall@k
0.8625      0.6927  0.6927       0.6927
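Rprec, Precision@k, and Recall@k are identical within each split. Assuming the relevant set for each example is its original source documents, this follows from the oracle strategy: k equals the number of relevant documents, so all three metrics divide the same hit count by the same number. A small sketch with made-up document ids (the function names are illustrative, not taken from any evaluation library):

```python
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def r_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    return precision_at_k(ranked, relevant, len(relevant))


ranked = ["d1", "d2", "d4", "d3"]  # hypothetical retriever output
relevant = {"d1", "d3", "d4"}      # hypothetical original source documents
k = len(relevant)                  # oracle setting: k = number of relevant docs

p = precision_at_k(ranked, relevant, k)
r = recall_at_k(ranked, relevant, k)
rp = r_precision(ranked, relevant)
print(p == r == rp)  # True
```

Here two of the top three documents are relevant, so each metric evaluates to 2/3. Recall@100 differs because it uses a fixed cutoff of 100 rather than the oracle k.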