Dataset:

Salesforce/rose

Language:

en (English)

RoSE

This repo contains the RoSE benchmark from our paper "Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation".

Please visit here for a demo page of this project.

ACU Annotations

The RoSE benchmark contains system outputs annotated with our ACU (Atomic Content Unit) protocol. It has four parts:

  • CNNDM, test set annotations
  • CNNDM, validation set annotations
  • XSum, test set annotations
  • SamSum, test set annotations

We summarize the statistics below.

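| Dataset | Split      | #Doc. | #Sys. | #Total Summ. | HF Name          |
|---------|------------|-------|-------|--------------|------------------|
| CNNDM   | Test       | 500   | 12    | 6000         | cnndm_test       |
| CNNDM   | Validation | 1000  | 8     | 8000         | cnndm_validation |
| XSum    | Test       | 500   | 8     | 4000         | xsum             |
| SamSum  | Test       | 500   | 8     | 4000         | samsum           |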
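
As a quick sanity check on the table above, the total number of summaries in each subset should equal the number of documents times the number of systems. The sketch below encodes the table and verifies that, and also shows one way the subsets might be loaded with the Hugging Face `datasets` library. The helper names (`ROSE_ACU_SUBSETS`, `totals_are_consistent`, `load_acu_subset`) are illustrative, and passing the HF Name as the configuration name of `Salesforce/rose` is an assumption based on the table, not something this page states.

```python
# Statistics copied from the table above: (documents, systems, total summaries).
ROSE_ACU_SUBSETS = {
    "cnndm_test": (500, 12, 6000),
    "cnndm_validation": (1000, 8, 8000),
    "xsum": (500, 8, 4000),
    "samsum": (500, 8, 4000),
}


def totals_are_consistent(subsets=ROSE_ACU_SUBSETS):
    """Return True if #Doc. * #Sys. equals #Total Summ. for every subset."""
    return all(docs * systems == total for docs, systems, total in subsets.values())


def load_acu_subset(hf_name):
    """Load one ACU-annotated subset from the Hugging Face Hub.

    Assumption: the HF Name from the table is used as the configuration name.
    Requires network access and `pip install datasets`.
    """
    if hf_name not in ROSE_ACU_SUBSETS:
        raise ValueError(f"unknown RoSE ACU subset: {hf_name!r}")
    # Imported lazily so the consistency check works without the library.
    from datasets import load_dataset
    return load_dataset("Salesforce/rose", hf_name)
```

Calling `load_acu_subset("xsum")` would then return the XSum test annotations, provided the assumption about configuration names holds. Printing the returned object is a simple way to discover its actual splits and column names.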

Human Annotations with Different Evaluation Protocols

In total, we have system outputs annotated with four different human evaluation protocols. We summarize them below.

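| Protocol  | w/ Input Document | w/ Reference Summary | Fine-grained |
|-----------|-------------------|----------------------|--------------|
| Prior     |                   |                      |              |
| Ref-free  |                   |                      |              |
| Ref-based |                   |                      |              |
| ACU       |                   |                      |              |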

We annotated two sets of system summaries.

  • Summaries from 12 fine-tuned systems. The Hugging Face data split name is cnndm_protocol.
  • Zero-shot summaries from large language models (GPT-3, T0), together with summaries from BRIO and BART. The Hugging Face data split name is cnndm_protocol_gpt3.
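
The two annotated sets above can be selected by name. The following is a minimal sketch of that lookup. The keys `"fine_tuned"` and `"llm_zero_shot"` are hypothetical labels introduced here for illustration, and, as an assumption, the split names are passed to `datasets.load_dataset` as configuration names.

```python
# Hypothetical labels (keys) mapped to the split names given in the list above.
PROTOCOL_SPLITS = {
    "fine_tuned": "cnndm_protocol",
    "llm_zero_shot": "cnndm_protocol_gpt3",
}


def protocol_split_name(summary_set):
    """Return the Hugging Face split name for one of the two annotated sets."""
    try:
        return PROTOCOL_SPLITS[summary_set]
    except KeyError:
        raise ValueError(f"unknown summary set: {summary_set!r}") from None


def load_protocol_annotations(summary_set):
    """Load the protocol-comparison annotations for one summary set.

    Assumption: the split name doubles as the configuration name.
    Requires network access and `pip install datasets`.
    """
    from datasets import load_dataset  # lazy import; only needed when loading
    return load_dataset("Salesforce/rose", protocol_split_name(summary_set))
```

Since the column names of these annotations are not listed here, inspecting the loaded object first is safer than hard-coding field names.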