数据集:

multi_news

语言:

en

计算机处理:

monolingual

大小:

10K<n<100K

语言创建人:

expert-generated

批注创建人:

expert-generated

源数据集:

original

预印本库:

arxiv:1906.01749

许可:

other
英文

Multi-News 数据集卡

数据集概述

Multi-News 数据集包含来自 newser.com 网站的新闻文章和人工撰写的摘要。每个摘要都由编辑专业撰写,并包含引用原始文章的链接。

有两个属性:

  • document: 由特殊标记"|||||"分隔的新闻文章文本。
  • summary: 新闻摘要。

支持的任务和排行榜

More Information Needed

语言

More Information Needed

数据集结构

数据实例

默认
  • 下载的数据集文件大小: 256.96 MB
  • 生成的数据集大小: 700.18 MB
  • 总共使用的磁盘空间: 957.14 MB

'验证'的一个示例如下所示。

{
    "document": "some line val \n another line",
    "summary": "target val line"
}

数据字段

数据字段在所有拆分中都是相同的。

默认
  • document: 字符串属性。
  • summary: 字符串属性。

数据拆分

name train validation test
default 44972 5622 5622

数据集创建

策划原理

More Information Needed

数据源

初始数据收集和规范化

More Information Needed

资源语言生成者是谁?

More Information Needed

注释

注释过程

More Information Needed

注释者是谁?

More Information Needed

个人和敏感信息

More Information Needed

使用数据的注意事项

数据的社会影响

More Information Needed

偏见讨论

More Information Needed

其他已知限制

More Information Needed

其他信息

数据集策划人员

More Information Needed

许可信息

This Dataset Usage Agreement ("Agreement") is a legal agreement with LILY LAB for the Dataset made available to the individual or entity ("Researcher") exercising rights under this Agreement. "Dataset" includes all text, data, information, source code, and any related materials, documentation, files, media, updates or revisions.

The Dataset is intended for non-commercial research and educational purposes only, and is made available free of charge without extending any license or other intellectual property rights. By downloading or using the Dataset, the Researcher acknowledges that they agree to the terms in this Agreement, and represent and warrant that they have authority to do so on behalf of any entity exercising rights under this Agreement. The Researcher accepts and agrees to be bound by the terms and conditions of this Agreement. If the Researcher does not agree to this Agreement, they may not download or use the Dataset.

By sharing content with m, such as by submitting content to this site or by corresponding with LILY LAB contributors, the Researcher grants LILY LAB the right to use, reproduce, display, perform, adapt, modify, distribute, have distributed, and promote the content in any form, anywhere and for any purpose, such as for evaluating and comparing summarization systems. Nothing in this Agreement shall obligate LILY LAB to provide any support for the Dataset. Any feedback, suggestions, ideas, comments, improvements given by the Researcher related to the Dataset is voluntarily given, and may be used by LILY LAB without obligation or restriction of any kind.

The Researcher accepts full responsibility for their use of the Dataset and shall defend indemnify, and hold harmless m, including their employees, trustees, officers, and agents, against any and all claims arising from the Researcher's use of the Dataset. The Researcher agrees to comply with all laws and regulations as they relate to access to and use of the Dataset and Service including U.S. export jurisdiction and other U.S. and international regulations.

THE DATASET IS PROVIDED "AS IS." LILY LAB DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. WITHOUT LIMITATION OF THE ABOVE, LILY LAB DISCLAIMS ANY WARRANTY THAT DATASET IS BUG OR ERROR-FREE, AND GRANTS NO WARRANTY REGARDING ITS USE OR THE RESULTS THEREFROM INCLUDING, WITHOUT LIMITATION, ITS CORRECTNESS, ACCURACY, OR RELIABILITY. THE DATASET IS NOT WARRANTIED TO FULFILL ANY PARTICULAR PURPOSES OR NEEDS.

TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT SHALL LILY LAB BE LIABLE FOR ANY LOSS, DAMAGE OR INJURY, DIRECT AND INDIRECT, INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER FOR BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE) OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, INCLUDING BUT NOT LIMITED TO LOSS OF PROFITS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THESE LIMITATIONS SHALL APPLY NOTWITHSTANDING ANY FAILURE OF ESSENTIAL PURPOSE OF ANY LIMITED REMEDY.

This Agreement is effective until terminated. LILY LAB reserves the right to terminate the Researcher's access to the Dataset at any time. If the Researcher breaches this Agreement, the Researcher's rights to use the Dataset shall terminate automatically. The Researcher will immediately cease all use and distribution of the Dataset and destroy any copies or portions of the Dataset in their possession.

This Agreement is governed by the laws of the SOME_PLACE, without regard to conflict of law principles. All terms and provisions of this Agreement shall, if possible, be construed in a manner which makes them valid, but in the event any term or provision of this Agreement is found by a court of competent jurisdiction to be illegal or unenforceable, the validity or enforceability of the remainder of this Agreement shall not be affected.

This Agreement is the complete and exclusive agreement between the parties with respect to its subject matter and supersedes all prior or contemporaneous oral or written agreements or understandings relating to the subject matter.

引用信息

@misc{alex2019multinews,
    title={Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model},
    author={Alexander R. Fabbri and Irene Li and Tianwei She and Suyi Li and Dragomir R. Radev},
    year={2019},
    eprint={1906.01749},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

贡献

感谢 @patrickvonplaten @lewtun @thomwolf 添加此数据集。