Dataset:

Paul/hatecheck

Languages:

en

Multilinguality:

monolingual

Size Categories:

1K<n<10K

Language Creators:

expert-generated

Annotations Creators:

crowdsourced

Source Datasets:

original

ArXiv:

arxiv:2012.15606

License:

cc-by-4.0

Dataset Card for HateCheck

Dataset Description

HateCheck is a suite of functional tests for hate speech detection models. The dataset contains 3,728 validated test cases across 29 functional tests: 19 correspond to distinct types of hate, and the other 11 cover challenging types of non-hate. This enables targeted diagnostic insights into model performance.

In our ACL paper, we found critical weaknesses in all of the commercial and academic hate speech detection models that we tested with HateCheck. Please refer to the paper (linked below) for results and further discussion, as well as for further information on the dataset and a full data statement.
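To make the diagnostic use concrete: rather than reporting one aggregate score, a model is scored separately on each of the 29 functionalities, so systematic failure modes become visible. The following is only a minimal sketch under stated assumptions: rows stands for HateCheck test cases as dictionaries carrying the functionality, test_case and label_gold attributes described under Dataset Structure below, and predict is a hypothetical classifier that is not part of HateCheck.

```python
# Minimal sketch of per-functionality evaluation (illustrative only).
# Assumptions: `rows` is an iterable of dicts with the "functionality",
# "test_case" and "label_gold" attributes described under Dataset
# Structure; `predict` is a hypothetical classifier mapping a test
# case string to "hateful" or "non-hateful".
from collections import defaultdict

def per_functionality_accuracy(rows, predict):
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        functionality = row["functionality"]
        total[functionality] += 1
        if predict(row["test_case"]) == row["label_gold"]:
            correct[functionality] += 1
    # One accuracy per functionality instead of a single aggregate score.
    return {f: correct[f] / total[f] for f in total}
```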

Dataset Structure

"test.csv"包含3,728个经过验证的测试用例。每个测试用例(行)具有以下属性:

functionality: shorthand for the functionality tested by the test case.

case_id: the unique ID of the test case (assigned to each of the 3,901 cases we initially generated).

test_case: the text of the test case.

label_gold: the gold-standard label of the test case (hateful/non-hateful). All test cases within a given functionality have the same gold label.

target_ident: where applicable, the protected group targeted or referenced by the test case. The test suite covers seven protected groups: women, trans people, gay people, black people, disabled people, Muslims, and immigrants.

direction: for hateful cases, a binary secondary label indicating whether they are directed at an individual as part of a protected group or at the group in general.

focus_words: where applicable, the key word or phrase in a given test case (e.g. "cut their throats").

focus_lemma: where applicable, the corresponding lemma (e.g. "cut sb. throat").

ref_case_id: for hateful cases, where applicable, the ID of the simpler hateful case that was perturbed to generate them. For non-hateful cases, where applicable, the ID of the hateful case they are contrasted with.

ref_templ_id: the equivalent of ref_case_id, but for template IDs.

templ_id: the unique ID of the template from which the test case was generated (assigned to each of the 866 cases and templates from which we generated the 3,901 initial cases).
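Since the attributes above map directly onto the columns of "test.csv", the data can be inspected with standard tooling. A minimal loading sketch, assuming the Hugging Face datasets library resolves the Paul/hatecheck identifier and exposes the validated cases as a test split; reading a local copy of "test.csv" with pandas works equally well.

```python
# Minimal sketch: load HateCheck and count test cases per functionality.
# Assumption: the `datasets` library resolves "Paul/hatecheck" and the
# validated cases are exposed as a "test" split.
from collections import Counter

from datasets import load_dataset

dataset = load_dataset("Paul/hatecheck", split="test")

# Each row carries the attributes described above.
print(dataset.column_names)

# 29 functionalities over 3,728 validated test cases.
counts = Counter(dataset["functionality"])
for functionality, n in sorted(counts.items()):
    print(f"{functionality}: {n} cases")
```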

Citation Information

When using HateCheck, please cite our ACL paper:

@inproceedings{rottger-etal-2021-hatecheck,
    title = "{H}ate{C}heck: Functional Tests for Hate Speech Detection Models",
    author = {R{\"o}ttger, Paul and Vidgen, Bertie and Nguyen, Dong and Waseem, Zeerak and Margetts, Helen and Pierrehumbert, Janet},
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.4",
    doi = "10.18653/v1/2021.acl-long.4",
    pages = "41--58",
    abstract = "Detecting online hate is a difficult task that even state-of-the-art models struggle with. Typically, hate speech detection models are evaluated by measuring their performance on held-out test data using metrics such as accuracy and F1 score. However, this approach makes it difficult to identify specific model weak points. It also risks overestimating generalisable model performance due to increasingly well-evidenced systematic gaps and biases in hate speech datasets. To enable more targeted diagnostic insights, we introduce HateCheck, a suite of functional tests for hate speech detection models. We specify 29 model functionalities motivated by a review of previous research and a series of interviews with civil society stakeholders. We craft test cases for each functionality and validate their quality through a structured annotation process. To illustrate HateCheck{'}s utility, we test near-state-of-the-art transformer models as well as two popular commercial models, revealing critical model weaknesses.",
}