Dataset: id_clickbait
Task: text-classification
Sub-task: fact-checking
Language: id
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: expert-generated
Annotation creators: expert-generated
Source datasets: original
License: cc-by-4.0

The CLICK-ID dataset is a collection of Indonesian news headlines gathered from 12 local online news publishers (detikNews, Fimela, Kapanlagi, Kompas, Liputan6, Okezone, Posmetro-Medan, Republika, Sindonews, Tempo, Tribunnews, and Wowkeren). The dataset consists of two parts: (i) 46,119 raw article records and (ii) 15,000 headline examples annotated for clickbait. Each headline was examined by 3 annotators, with judgment based on the headline alone, and the majority vote was taken as the ground truth. Among the annotated samples, the annotations yield 6,290 clickbait and 8,710 non-clickbait headlines.
[More Information Needed]
Indonesian
An example of an annotated article:
{ 'id': '100', 'label': 1, 'title': "SAH! Ini Daftar Nama Menteri Kabinet Jokowi - Ma'ruf Amin" }
The dataset contains a train split.
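As a quick illustration of how records like the one above can be accessed, here is a minimal loading sketch using the Hugging Face datasets library; the configuration name "annotated" and the field names ('id', 'label', 'title') follow the example above but are assumptions and may differ from the actual configuration on the Hub.

```python
from collections import Counter

from datasets import load_dataset

# Load the annotated CLICK-ID headlines; the "annotated" config name is an
# assumption and may differ from the dataset's actual configuration names.
dataset = load_dataset("id_clickbait", "annotated")

# Inspect one record; fields are assumed to be 'id', 'label', and 'title',
# as in the example shown above.
example = dataset["train"][0]
print(example["title"], example["label"])

# Count labels to compare against the reported 6,290 clickbait and
# 8,710 non-clickbait headlines.
print(Counter(dataset["train"]["label"]))
```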
[More Information Needed]
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Creative Commons Attribution 4.0 International License
@article{WILLIAM2020106231,
  title = "CLICK-ID: A novel dataset for Indonesian clickbait headlines",
  journal = "Data in Brief",
  volume = "32",
  pages = "106231",
  year = "2020",
  issn = "2352-3409",
  doi = "https://doi.org/10.1016/j.dib.2020.106231",
  url = "http://www.sciencedirect.com/science/article/pii/S2352340920311252",
  author = "Andika William and Yunita Sari",
  keywords = "Indonesian, Natural Language Processing, News articles, Clickbait, Text-classification",
  abstract = "News analysis is a popular task in Natural Language Processing (NLP). In particular, the problem of clickbait in news analysis has gained attention in recent years [1, 2]. However, the majority of the tasks has been focused on English news, in which there is already a rich representative resource. For other languages, such as Indonesian, there is still a lack of resource for clickbait tasks. Therefore, we introduce the CLICK-ID dataset of Indonesian news headlines extracted from 12 Indonesian online news publishers. It is comprised of 15,000 annotated headlines with clickbait and non-clickbait labels. Using the CLICK-ID dataset, we then developed an Indonesian clickbait classification model achieving favourable performance. We believe that this corpus will be useful for replicable experiments in clickbait detection or other experiments in NLP areas."
}
Thanks to @cahya-wirawan for adding this dataset.