Datasets:

health_fact

Languages:

en

Multilinguality:

monolingual

Size Categories:

10K<n<100K

Language Creators:

found

Annotations Creators:

expert-generated

Source Datasets:

original

ArXiv:

arxiv:2010.09926

License:

mit

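Dataset Card for PUBHEALTH

Dataset Summary

PUBHEALTH is a comprehensive dataset for explainable automated fact-checking of public health claims. Each instance in the dataset has an associated veracity label (true, false, unproven, or mixture). Each instance also has an explanation text field, which justifies why the claim was assigned that veracity label.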

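Supported Tasks and Leaderboards

[More Information Needed]

Languages

The text in the dataset is in English.

Dataset Structure

Data Instances

The following is an example instance from the PUBHEALTH dataset: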

Field Example
claim Expired boxes of cake and pancake mix are dangerously toxic.
explanation What's True: Pancake and cake mixes that contain mold can cause life-threatening allergic reactions. What's False: Pancake and cake mixes that have passed their expiration dates are not inherently dangerous to ordinarily healthy people, and the yeast in packaged baking products does not "over time develops spores."
label mixture
author(s) David Mikkelson
date published April 19, 2006
tags food, allergies, baking, cake
main_text In April 2006, the experience of a 14-year-old who had eaten pancakes made from a mix that had gone moldy was described in the popular newspaper column Dear Abby. The account has since been circulated widely on the Internet as scores of concerned homemakers ponder the safety of the pancake and other baking mixes lurking in their larders [...]
evidence sources [1] Bennett, Allan and Kim Collins. “An Unusual Case of Anaphylaxis: Mold in Pancake Mix.” American Journal of Forensic Medicine & Pathology. September 2001 (pp. 292-295). [2] Phillips, Jeanne. “Dear Abby.” 14 April 2006 [syndicated column].
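
As a rough illustration, the example instance above could be held in a small Python record type. This is a minimal sketch, not part of the dataset's tooling: the class name `PubHealthInstance` and the attribute names (`authors`, `date_published`, `evidence_sources`) are assumptions chosen to mirror the field names in the table, and the label check simply enforces the four veracity labels listed in the dataset summary.

```python
from dataclasses import dataclass, field
from typing import List

# The four veracity labels named in the dataset summary.
VERACITY_LABELS = ("true", "false", "unproven", "mixture")


@dataclass
class PubHealthInstance:
    """One PUBHEALTH record; attribute names are illustrative, not official."""

    claim: str
    explanation: str
    label: str
    authors: str = ""
    date_published: str = ""
    tags: List[str] = field(default_factory=list)
    main_text: str = ""
    evidence_sources: str = ""

    def __post_init__(self) -> None:
        # Reject anything outside the documented label set.
        if self.label not in VERACITY_LABELS:
            raise ValueError(f"unknown veracity label: {self.label!r}")


# Values copied from the example instance; long fields are left at their defaults.
example = PubHealthInstance(
    claim="Expired boxes of cake and pancake mix are dangerously toxic.",
    explanation=(
        "What's True: Pancake and cake mixes that contain mold can cause "
        "life-threatening allergic reactions."
    ),
    label="mixture",
    authors="David Mikkelson",
    date_published="April 19, 2006",
    tags=["food", "allergies", "baking", "cake"],
)
```

Validating the label at construction time is one possible design choice; it surfaces malformed rows early instead of at training time.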

Data Fields

The data fields are those shown in the example instance above.

Data Splits

# Instances
train.tsv 9832
dev.tsv 1221
test.tsv 1235
total 12288
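
The split sizes in the table above work out to roughly an 80/10/10 division. The sketch below checks that arithmetic and shows one way to count labels in a split file. It assumes each `.tsv` file has a header row containing a `label` column and that fields are not quote-escaped; neither detail is stated in this card. The helper names `split_fractions` and `count_labels` are hypothetical, and the in-memory sample rows are made-up placeholders, not real data.

```python
import csv
import io
from collections import Counter

# Split sizes as listed in the table above.
SPLIT_SIZES = {"train.tsv": 9832, "dev.tsv": 1221, "test.tsv": 1235}


def split_fractions(sizes):
    """Return each split's share of the total number of instances."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def count_labels(tsv_file, label_column="label"):
    """Count rows per label in a tab-separated file with a header row.

    Assumption: the column is named "label" and quoting is disabled,
    so quote characters inside claims are read literally.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for a split file (placeholder rows, not real data).
sample = io.StringIO(
    "claim\tlabel\n"
    "claim A\ttrue\n"
    "claim B\tfalse\n"
    "claim C\ttrue\n"
)
label_counts = count_labels(sample)
fractions = split_fractions(SPLIT_SIZES)
```

To run this against a real split, pass an open file handle, for example `open("train.tsv", newline="", encoding="utf-8")`, in place of the `io.StringIO` sample.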

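Dataset Creation

Curation Rationale

This dataset was created to explore fact-checking of claims that are difficult to verify, that is, claims that require expertise from outside the journalistic domain, in this case biomedical and public health expertise.

It was also created in response to the lack of fact-checking datasets that provide gold-standard natural language explanations for verdicts/labels.

Source Data

Initial Data Collection and Normalization

The dataset was retrieved from the following fact-checking, news review, and news websites: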

URL Type
1231321 fact-checking
1232321 fact-checking
1233321 fact-checking
1234321 fact-checking
1235321 fact-checking
1236321 news
1237321 news
1238321 health news review
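
Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

To our knowledge, the dataset contains no personal or sensitive information; if it is brought to our attention that we are mistaken, we will make the appropriate corrections to the dataset.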

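Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]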

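Additional Information

Dataset Curators

The dataset was created by Neema Kotonya and Francesca Toni for their research paper "Explainable Automated Fact-Checking for Public Health Claims", presented at EMNLP 2020.

Licensing Information

MIT License

Citation Information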

@inproceedings{kotonya-toni-2020-explainable,
    title = "Explainable Automated Fact-Checking for Public Health Claims",
    author = "Kotonya, Neema  and
      Toni, Francesca",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.623",
    pages = "7740--7754",
}

Contributions

Thanks to @bhavitvyamalik for adding this dataset.