Dataset:

allenai/multinews_sparse_max

Languages:

en

Multilinguality:

monolingual

Size categories:

10K<n<100K

Language creators:

expert-generated

Annotation creators:

expert-generated

Source datasets:

original

License:

other

This is a copy of the Multi-News dataset, except that the input source documents of its test split have been replaced by documents retrieved with a sparse retriever. The retrieval pipeline used:

  • query: the summary field of each example
  • corpus: the union of all documents in the train, validation and test splits
  • retriever: BM25 via PyTerrier with default settings
  • top-k strategy: "max", i.e. the number of documents retrieved, k, is set to the maximum number of documents seen across examples in this dataset (here, k = 10)
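The pipeline above can be sketched in plain Python. This is a minimal sketch, not the PyTerrier code that produced the dataset: it swaps in a dependency-free textbook Okapi BM25 (k1 = 1.2 and b = 0.75 are commonly used defaults, assumed here; Terrier's exact formula may differ), a simplified whitespace tokenizer, and invented toy examples. The names `BM25`, `tokenize` and `search` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase + whitespace split.
    # PyTerrier's default indexing also removes stopwords and applies stemming.
    return text.lower().split()


class BM25:
    """Textbook Okapi BM25 over an in-memory corpus (hypothetical stand-in for PyTerrier)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(self.lens)
        # Document frequency: each document contributes each distinct term once
        df = Counter(t for tf in self.tfs for t in tf)
        n = len(self.tfs)
        # IDF with +1 inside the log so every weight stays positive
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lens[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def search(self, query, k):
        # Rank all documents by score (ties broken by corpus position), keep the top k
        ranked = sorted(range(len(self.tfs)), key=lambda i: (-self.score(query, i), i))
        return ranked[:k]


# Toy examples standing in for Multi-News rows (invented for illustration).
examples = [
    {"summary": "storm hits coast",
     "documents": ["a storm hit the coast", "coast residents flee storm"]},
    {"summary": "election results announced",
     "documents": ["election results were announced today"]},
]

# Corpus: union of all source documents (across all splits in the real pipeline).
corpus = sorted({d for ex in examples for d in ex["documents"]})

# "max" top-k strategy: k = largest number of documents in any example.
k = max(len(ex["documents"]) for ex in examples)

index = BM25(corpus)
# Query with each example's summary; the top-k hits become its new source documents.
retrieved = [[corpus[i] for i in index.search(ex["summary"], k)] for ex in examples]
```

Fixing k to the maximum means every example receives the same number of retrieved documents, so examples that originally had fewer source documents are padded with lower-ranked, likely irrelevant hits.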

Retrieval results (k = 10):

Split        Recall@100  Rprec   Precision@k  Recall@k
train        0.8793      0.7460  0.2213       0.8264
validation   0.8748      0.7453  0.2173       0.8232
test         0.8775      0.7480  0.2187       0.8250
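The four reported metrics can be sketched as follows. This assumes that the relevant documents for each query are the example's original source documents and that scores are macro-averaged over queries in the style of trec_eval; the card states neither detail, and the function names are hypothetical. Because the denominator of Precision@k is k, Precision@k equals Recall@k multiplied by |relevant| / k, so it is necessarily lower than Recall@k for any example with fewer than k relevant documents.

```python
def precision_at_k(retrieved, relevant, k):
    # Fraction of the top-k retrieved ids that are relevant (denominator is always k)
    return len(set(retrieved[:k]) & relevant) / k


def recall_at_k(retrieved, relevant, k):
    # Fraction of the relevant ids found in the top k
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def r_precision(retrieved, relevant):
    # Precision at rank R, where R = number of relevant documents for this query
    return precision_at_k(retrieved, relevant, len(relevant))


def mean(xs):
    return sum(xs) / len(xs)


def evaluate(runs, qrels, k):
    # runs: query id -> ranked list of doc ids; qrels: query id -> set of relevant doc ids
    return {
        "Recall@100": mean([recall_at_k(runs[q], qrels[q], 100) for q in qrels]),
        "Rprec": mean([r_precision(runs[q], qrels[q]) for q in qrels]),
        "Precision@k": mean([precision_at_k(runs[q], qrels[q], k) for q in qrels]),
        "Recall@k": mean([recall_at_k(runs[q], qrels[q], k) for q in qrels]),
    }


# Toy run and relevance judgments (invented for illustration).
runs = {"q1": ["d1", "d5", "d2"], "q2": ["d9", "d3"]}
qrels = {"q1": {"d1", "d2"}, "q2": {"d3"}}
scores = evaluate(runs, qrels, k=2)
print(scores)
```

For this toy input the macro averages are Recall@100 = 1.0, Rprec = 0.25, Precision@k = 0.5 and Recall@k = 0.75.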