Dataset:

allenai/peer_read

Tasks:

Text Classification

Languages:

Multilinguality:

monolingual

Size Categories:

10K<n<100K

Language Creators:

found

Annotations Creators:

expert-generated

Source Datasets:

original

ArXiv:

arxiv:1804.09635

Other:

acceptability-classification

License:

license:unknown

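Dataset Card for peer_read

Dataset Summary

PeerRead is a dataset of scientific peer reviews that helps researchers study the peer-review process. It contains more than 14,000 paper drafts with the corresponding accept/reject decisions from top-tier venues (including ACL, NIPS, and ICLR), along with more than 10,000 textual peer reviews written by experts.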

Supported Tasks and Leaderboards

[More Information Needed]

Languages

en-English

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

parsed_pdfs

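name: string Filename in the dataset
metadata: dict Paper metadata
- source: string Paper source
- authors: list<string> List of paper authors
- title: string Paper title
- sections: list<dict> List of section headings with the corresponding text
  - heading: string Section heading
  - text: string Section text
- references: string List of references
  - title: string Title of the referenced paper
  - author: list<string> List of authors of the referenced paper
  - venue: string Venue of the referenced paper
  - citeRegEx: string Reference citeRegEx
  - shortCiteRegEx: string Reference shortCiteRegEx
  - year: int Publication year of the referenced paper
- referenceMentions: list<string> List of reference mentions
  - referenceID: int ID of the mentioned reference
  - context: string Context of the reference mention
  - startOffset: int Start offset of the reference mention
  - endOffset: int End offset of the reference mention
- year: int Publication year of the paper
- abstractText: string Paper abstract
- creator: string Paper creator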
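
The parsed_pdfs fields above can be read as a nested record. The sketch below is only an illustration under stated assumptions: the sample record is invented (none of its values come from the dataset), `references` and `referenceMentions` are assumed to be lists of dicts carrying the listed sub-fields, and `referenceID` is assumed to be a 0-based index into `references`. The helper names `body_text` and `cited_titles` are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical record shaped like the parsed_pdfs fields listed above.
# Every value is invented for illustration; nothing here is real dataset content.
sample_pdf: Dict[str, Any] = {
    "name": "example.pdf",
    "metadata": {
        "source": "example-source",
        "authors": ["A. Author", "B. Author"],
        "title": "An Example Paper",
        "sections": [
            {"heading": "1 Introduction", "text": "We study peer review."},
            {"heading": "2 Method", "text": "We follow prior work."},
        ],
        # Assumption: a list of dicts, one per reference.
        "references": [
            {"title": "Prior Work", "author": ["C. Author"], "venue": "Example Venue",
             "citeRegEx": "Author", "shortCiteRegEx": "Author", "year": 2016},
        ],
        # Assumption: a list of dicts, one per in-text mention.
        "referenceMentions": [
            {"referenceID": 0, "context": "We follow prior work.",
             "startOffset": 10, "endOffset": 20},
        ],
        "year": 2017,
        "abstractText": "A short abstract.",
        "creator": "example-creator",
    },
}


def body_text(record: Dict[str, Any]) -> str:
    """Join every section as 'heading, newline, text', separated by blank lines."""
    sections = record["metadata"].get("sections") or []
    return "\n\n".join(f"{s['heading']}\n{s['text']}" for s in sections)


def cited_titles(record: Dict[str, Any]) -> List[str]:
    """Return the title of the reference behind each mention.

    Assumes referenceID is a 0-based index into the references list;
    mentions whose ID falls outside the list are skipped.
    """
    meta = record["metadata"]
    refs = meta.get("references") or []
    titles = []
    for mention in meta.get("referenceMentions") or []:
        idx = mention["referenceID"]
        if 0 <= idx < len(refs):
            titles.append(refs[idx]["title"])
    return titles


print(body_text(sample_pdf))
print(cited_titles(sample_pdf))
```

If the real records store references differently (for example as a serialized string, as the type listed above suggests), the index lookup in `cited_titles` would need to be adapted after inspecting an actual instance.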

reviews

id: int Review ID
conference: string Conference name
comments: string Review comments
subjects: string Review subjects
version: string Review version
date_of_submission: string Submission date
title: string Paper title
authors: list<string> List of paper authors
accepted: bool Paper accepted flag
abstract: string Paper abstract
histories: list<string> Paper details with links
reviews: dict Paper reviews
- date: string Date of the review
- title: string Paper title
- other_keys: string Other reviewer details
- originality: string Originality score
- comments: string Reviewer comments
- is_meta_review: bool Review type flag
- recommendation: string Reviewer recommendation
- replicability: string Replicability score
- presentation_format: string Presentation type
- clarity: string Clarity score
- meaningful_comparison: string Meaningful comparison score
- substance: string Substance score
- reviewer_confidence: string Reviewer confidence score
- soundness_correctness: string Soundness and correctness score
- appropriateness: string Appropriateness score
- impact: string Impact score
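
Because the review scores above are typed as strings, numeric analysis needs a conversion step. The following is a minimal sketch, assuming that a present score is an integer written as text (such as "4") and that a missing score is an empty string or None. The sample review is invented, and `SCORE_FIELDS` and `numeric_scores` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict

# Score-like fields taken from the reviews field list above.
SCORE_FIELDS = (
    "originality", "replicability", "clarity", "meaningful_comparison",
    "substance", "reviewer_confidence", "soundness_correctness",
    "appropriateness", "impact",
)

# Hypothetical single review dict; all values are invented for illustration.
sample_review: Dict[str, Any] = {
    "date": "", "title": "An Example Paper", "other_keys": "",
    "originality": "4", "comments": "Clear and well motivated.",
    "is_meta_review": False, "recommendation": "4",
    "replicability": "3", "presentation_format": "Poster",
    "clarity": "5", "meaningful_comparison": "", "substance": "4",
    "reviewer_confidence": "3", "soundness_correctness": "4",
    "appropriateness": "5", "impact": "3",
}


def numeric_scores(review: Dict[str, Any]) -> Dict[str, int]:
    """Convert string scores to ints, skipping fields that are empty or non-numeric."""
    scores: Dict[str, int] = {}
    for field in SCORE_FIELDS:
        try:
            scores[field] = int(review.get(field))
        except (TypeError, ValueError):
            # Missing (None), empty, or free-text values are left out.
            continue
    return scores


scores = numeric_scores(sample_review)
mean_score = sum(scores.values()) / len(scores) if scores else None
print(scores)
print(mean_score)
```

Skipping unparseable values rather than defaulting them to zero keeps absent scores from dragging down any average computed afterwards.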

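Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Dongyeop Kang, Waleed Ammar, Bhavana Dalvi Mishra, Madeleine van Zuylen, Sebastian Kohlmeier, Eduard Hovy, Roy Schwartz

Licensing Information

[More Information Needed]

Citation Information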

@inproceedings{kang18naacl,
  title = {A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications},
  author = {Dongyeop Kang and Waleed Ammar and Bhavana Dalvi and Madeleine van Zuylen and Sebastian Kohlmeier and Eduard Hovy and Roy Schwartz},
  booktitle = {Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  address = {New Orleans, USA},
  month = {June},
  url = {https://arxiv.org/abs/1804.09635},
  year = {2018}
}

Contributions

Thanks to @vinaykudari for adding this dataset.

Author:

allenai

Dataset size:

31.27 KB