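Dataset:
health_fact
Tasks:
text-classification
Languages:
en
Multilinguality:
monolingual
Size:
10K<n<100K
Language creators:
found
Annotation creators:
expert-generated
Source datasets:
original
Preprint:
arxiv:2010.09926
License:
mit
PUBHEALTH is a comprehensive dataset for explainable automated fact-checking of public health claims. Each instance in the PUBHEALTH dataset has an associated veracity label (true, false, unproven, mixture). Each instance also has an explanation text field, which justifies the veracity label assigned to the claim.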
[More Information Needed]
The text in the dataset is in English.
The following is an example instance from the PUBHEALTH dataset:
Field | Example |
---|---|
claim | Expired boxes of cake and pancake mix are dangerously toxic. |
explanation | What's True: Pancake and cake mixes that contain mold can cause life-threatening allergic reactions. What's False: Pancake and cake mixes that have passed their expiration dates are not inherently dangerous to ordinarily healthy people, and the yeast in packaged baking products does not "over time develops spores." |
label | mixture |
author(s) | David Mikkelson |
date published | April 19, 2006 |
tags | food, allergies, baking, cake |
main_text | In April 2006, the experience of a 14-year-old who had eaten pancakes made from a mix that had gone moldy was described in the popular newspaper column Dear Abby. The account has since been circulated widely on the Internet as scores of concerned homemakers ponder the safety of the pancake and other baking mixes lurking in their larders [...] |
evidence sources | [1] Bennett, Allan and Kim Collins. “An Unusual Case of Anaphylaxis: Mold in Pancake Mix.” American Journal of Forensic Medicine & Pathology. September 2001 (pp. 292-295). [2] Phillips, Jeanne. “Dear Abby.” 14 April 2006 [syndicated column]. |
The data fields are those shown in the example instance above.
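As a rough illustration of the instance structure above, the following sketch models a record in Python. The class name `PubHealthInstance`, the helper `tag_list`, and the choice of which fields are optional are assumptions made for this example; only the field names `claim`, `explanation`, `label`, `tags`, and `main_text` and the four label values come from the description and table above.

```python
from dataclasses import dataclass

# The four veracity labels named in the dataset description.
VERACITY_LABELS = ("true", "false", "unproven", "mixture")


@dataclass
class PubHealthInstance:
    """Hypothetical container for one record (not an official schema)."""

    claim: str
    explanation: str
    label: str
    tags: str = ""       # comma-separated, as in the example table
    main_text: str = ""

    def __post_init__(self):
        # Reject labels outside the four documented values.
        if self.label not in VERACITY_LABELS:
            raise ValueError(f"unknown veracity label: {self.label!r}")


def tag_list(instance: PubHealthInstance) -> list:
    """Split the comma-separated tags string into a clean list."""
    return [t.strip() for t in instance.tags.split(",") if t.strip()]


example = PubHealthInstance(
    claim="Expired boxes of cake and pancake mix are dangerously toxic.",
    explanation=(
        "What's True: Pancake and cake mixes that contain mold can cause "
        "life-threatening allergic reactions."
    ),
    label="mixture",
    tags="food, allergies, baking, cake",
)
print(example.label, tag_list(example))
```

Validating the label at construction time is one possible design choice; it surfaces malformed rows early when reading the raw files.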
File | # Instances |
---|---|
train.tsv | 9832 |
dev.tsv | 1221 |
test.tsv | 1235 |
total | 12288 |
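The split sizes in the table can be checked with a few lines of arithmetic. This is a minimal sketch; the dictionary name `SPLIT_SIZES` is chosen for illustration, and the counts are copied from the table above.

```python
# Instance counts per file, copied from the table above.
SPLIT_SIZES = {"train.tsv": 9832, "dev.tsv": 1221, "test.tsv": 1235}

# The per-file counts should add up to the reported total.
total = sum(SPLIT_SIZES.values())

# Share of the full dataset that each file holds.
fractions = {name: count / total for name, count in SPLIT_SIZES.items()}

for name, share in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} instances ({share:.1%})")
print(f"total: {total}")
```

The counts sum to the reported total of 12288, and the shares work out to roughly 80% train, 10% dev, and 10% test.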
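The dataset was created to explore fact-checking of claims that are difficult to verify, i.e. claims that require expertise from outside the journalistic domain, in this case biomedical and public health expertise.
It was also created in response to the lack of fact-checking datasets that provide gold-standard natural language explanations for verdicts/labels.
The dataset was collected from the following fact-checking, news review, and news websites: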
URL | Type |
---|---|
1231321 | fact-checking |
1232321 | fact-checking |
1233321 | fact-checking |
1234321 | fact-checking |
1235321 | fact-checking |
1236321 | news |
1237321 | news |
1238321 | health news review |
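[More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
To the best of our knowledge, the dataset contains no personal or sensitive information. If anyone brings an error to our attention, we will correct the dataset accordingly.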
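[More Information Needed]
[More Information Needed]
[More Information Needed]
The dataset was created by Neema Kotonya and Francesca Toni and is described in their research paper "Explainable Automated Fact-Checking for Public Health Claims", presented at EMNLP 2020.
MIT License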
@inproceedings{kotonya-toni-2020-explainable,
    title = "Explainable Automated Fact-Checking for Public Health Claims",
    author = "Kotonya, Neema and Toni, Francesca",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.623",
    pages = "7740--7754",
}
Thanks to @bhavitvyamalik for adding this dataset.