Dataset:

cakiki/args_me

English

Dataset Card for the args.me Corpus

Dataset Summary

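The args.me corpus (version 1.0, cleaned) comprises 382,545 arguments crawled from four debate portals in mid-2019: Debatewise, IDebate.org, Debatepedia, and Debate.org. The arguments were extracted using heuristics designed specifically for each debate portal.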

Dataset Usage

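import datasets

# Stream the 'corpus' configuration instead of downloading it in full.
args = datasets.load_dataset('cakiki/args_me', 'corpus', streaming=True)
# Without a `split` argument, load_dataset returns a dict of splits,
# so iterate over the first split rather than over the dict itself.
first_split = next(iter(args.values()))
# Print the fields of the first argument, then stop.
for arg in first_split:
    print(arg['conclusion'])
    print(arg['id'])
    print(arg['argument'])
    print(arg['stance'])
    break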
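
To look at more than one record while exploring a stream, a small helper can cut the iteration off early. The following is a minimal sketch that assumes only that the stream is an ordinary Python iterable of dicts. `preview` is a hypothetical helper, not part of the `datasets` library, and the demo records are made up so the snippet runs without network access.

```python
from itertools import islice

def preview(stream, n=3, fields=('id', 'stance')):
    """Return the selected fields of the first n records of an iterable."""
    return [{f: rec.get(f) for f in fields} for rec in islice(stream, n)]

# Stand-in records (made up for illustration, not taken from the corpus).
demo = (
    {'id': f'demo-{i}', 'stance': 'PRO' if i % 2 else 'CON', 'argument': '...'}
    for i in range(10)
)
print(preview(demo, n=2))
```

Because `islice` stops after `n` items, the same helper could be pointed at a streamed split without reading the whole corpus.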

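Supported Tasks and Leaderboards

Document retrieval; argument retrieval for controversial questions

Languages

The args.me corpus is monolingual; it includes only English (mostly en-US) documents.

Dataset Structure

Data Instances

Corpus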
{'conclusion': 'Science is the best!',
 'id': 'd6517702-2019-04-18T12:36:24Z-00000-000',
 'argument': 'Science is aright I guess, but Physical Education (P.E) is better. Think about it, you could sit in a classroom for and hour learning about molecular reconfiguration, or you could play football with your mates. Why would you want to learn about molecular reconfiguration anyway? I think the argument here would be based on, healthy mind or healthy body. With science being the healthy mind and P.E being the healthy body. To work this one out all you got to do is ask Steven Hawkins. Only 500 words',
 'stance': 'CON'}
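
As an illustration of working with records shaped like the instance above, the sketch below tallies stance labels. It assumes every record carries a `stance` key, as the example does. `count_stances` is a hypothetical helper, and the sample list holds only a shortened copy of the record shown above.

```python
from collections import Counter

def count_stances(records):
    """Tally the 'stance' value of each record in an iterable of dicts."""
    return Counter(rec['stance'] for rec in records)

# Shortened copy of the example instance (the argument text is omitted).
sample = [
    {
        'conclusion': 'Science is the best!',
        'id': 'd6517702-2019-04-18T12:36:24Z-00000-000',
        'stance': 'CON',
    },
]
print(count_stances(sample))
```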

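Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information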

Creative Commons Attribution 4.0 International (CC BY 4.0)

Citation Information

@dataset{yamen_ajjour_2020_4139439,
  author       = {Yamen Ajjour and
                  Henning Wachsmuth and
                  Johannes Kiesel and
                  Martin Potthast and
                  Matthias Hagen and
                  Benno Stein},
  title        = {args.me corpus},
  month        = oct,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {1.0-cleaned},
  doi          = {10.5281/zenodo.4139439},
  url          = {https://doi.org/10.5281/zenodo.4139439}
}