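Dataset:

ade_corpus_v2

Tasks:

Text Classification

Token Classification

Sub-tasks:

coreference-resolution fact-checking

Languages:

English

Multilinguality:

monolingual

Sizes:

10K<n<100K 1K<n<10K n<1K

Language creators:

found

Annotations creators:

expert-generated

Source datasets:

original

License:

license:unknown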

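Dataset Card for Adverse Drug Reaction Data v2

Dataset Summary

ADE-Corpus-V2 is an adverse drug reaction dataset. It supports two uses: classifying whether a sentence is ADE-related (True) or not (False), and extracting relations between adverse drug events and drugs. DRUG-AE.rel provides relations between drugs and adverse effects. DRUG-DOSE.rel provides relations between drugs and dosages. ADE-NEG.txt provides all sentences in the ADE corpus that do not contain any drug-related adverse effects.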

Supported Tasks and Leaderboards

Sentiment classification, relation extraction

Languages

English

Dataset Structure

Data Instances

Config - Ade_corpus_v2_classification

{
    'label': 1,
    'text': 'Intravenous azithromycin-induced ototoxicity.'
}

Config - Ade_corpus_v2_drug_ade_relation

{
    'drug': 'azithromycin',
    'effect': 'ototoxicity',
    'indexes': {
        'drug': {
            'end_char': [24],
            'start_char': [12]
        },
        'effect': {
            'end_char': [44],
            'start_char': [33]
        }
    },
    'text': 'Intravenous azithromycin-induced ototoxicity.'
}

Config - Ade_corpus_v2_drug_dosage_relation

{
    'dosage': '4 times per day',
    'drug': 'insulin',
    'indexes': {
        'dosage': {
            'end_char': [56],
            'start_char': [41]
        },
        'drug': {
            'end_char': [40],
            'start_char': [33]
        }
    },
    'text': 'She continued to receive regular insulin 4 times per day over the following 3 years with only occasional hives.'
}
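
The three configurations above could be loaded roughly as follows. This is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the dataset is still reachable under the identifier `ade_corpus_v2` listed at the top of this card. The helper name `load_ade_config` is hypothetical and introduced only for illustration.

```python
from typing import Any

# Identifier and config names are taken from this card; whether the hub
# still serves them under these exact names is an assumption.
DATASET_ID = "ade_corpus_v2"
CONFIG_NAMES = (
    "Ade_corpus_v2_classification",
    "Ade_corpus_v2_drug_ade_relation",
    "Ade_corpus_v2_drug_dosage_relation",
)


def load_ade_config(config_name: str, split: str = "train") -> Any:
    """Load one configuration of the corpus (hypothetical helper)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {CONFIG_NAMES}"
        )
    # Imported lazily so the module can be inspected without the
    # third-party dependency or network access.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, config_name, split=split)
```

Calling `load_ade_config("Ade_corpus_v2_classification")` would be expected to return records with the `text` and `label` keys shown in the first instance above, provided the assumptions hold.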

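Data Fields

Config - Ade_corpus_v2_classification

text - Input text.
label - Whether the sentence is related to an adverse drug effect (1) or not (0).

Config - Ade_corpus_v2_drug_ade_relation

text - Input text.
drug - Name of the drug.
effect - Effect caused by the drug.
indexes.drug.start_char - Start character offsets of the drug string in the text (a list).
indexes.drug.end_char - End character offsets of the drug string in the text (a list; exclusive in the instance shown above).
indexes.effect.start_char - Start character offsets of the effect string in the text (a list).
indexes.effect.end_char - End character offsets of the effect string in the text (a list; exclusive in the instance shown above).

Config - Ade_corpus_v2_drug_dosage_relation

text - Input text.
drug - Name of the drug.
dosage - Dosage of the drug.
indexes.drug.start_char - Start character offsets of the drug string in the text (a list).
indexes.drug.end_char - End character offsets of the drug string in the text (a list; exclusive in the instance shown above).
indexes.dosage.start_char - Start character offsets of the dosage string in the text (a list).
indexes.dosage.end_char - End character offsets of the dosage string in the text (a list; exclusive in the instance shown above).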
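
As a quick check of how the offset fields relate to the text, the sketch below slices the drug_ade_relation instance from this card at its `start_char` and `end_char` values. It assumes `end_char` is exclusive, which matches the instances shown in this card but is not stated by the source. The helper name `extract_spans` is hypothetical.

```python
from typing import Dict, List


def extract_spans(text: str, span: Dict[str, List[int]]) -> List[str]:
    """Slice `text` at each (start_char, end_char) pair, end exclusive."""
    return [text[s:e] for s, e in zip(span["start_char"], span["end_char"])]


# Instance copied from the Ade_corpus_v2_drug_ade_relation example in this card.
example = {
    "drug": "azithromycin",
    "effect": "ototoxicity",
    "indexes": {
        "drug": {"start_char": [12], "end_char": [24]},
        "effect": {"start_char": [33], "end_char": [44]},
    },
    "text": "Intravenous azithromycin-induced ototoxicity.",
}

drug_spans = extract_spans(example["text"], example["indexes"]["drug"])
effect_spans = extract_spans(example["text"], example["indexes"]["effect"])

# The recovered substrings should match the `drug` and `effect` fields.
print(drug_spans, effect_spans)
```

Because the offsets are stored as lists, the same helper returns one substring per offset pair if a record carries more than one.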

Data Splits

Train: 23516
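
Only a train split is listed, and the cited paper reports results from 10-fold cross-validation. The sketch below shows one way to derive such folds from example indices. It is an illustrative assumption, not the fold assignment used by the corpus authors; the function name `k_fold_indices` and the seed are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, held_out_indices) for each of k shuffled folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    # Striding assigns every index to exactly one fold; sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


N_TRAIN = 23516  # size of the train split listed in this card
splits = list(k_fold_indices(N_TRAIN, k=10))
```

Each pair in `splits` can then be used to select rows for training and evaluation in turn.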

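Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information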

@article{GURULINGAPPA2012885,
title = "Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports",
journal = "Journal of Biomedical Informatics",
volume = "45",
number = "5",
pages = "885 - 892",
year = "2012",
note = "Text Mining and Natural Language Processing in Pharmacogenomics",
issn = "1532-0464",
doi = "https://doi.org/10.1016/j.jbi.2012.04.008",
url = "http://www.sciencedirect.com/science/article/pii/S1532046412000615",
author = "Harsha Gurulingappa and Abdul Mateen Rajput and Angus Roberts and Juliane Fluck and Martin Hofmann-Apitius and Luca Toldo",
keywords = "Adverse drug effect, Benchmark corpus, Annotation, Harmonization, Sentence classification",
abstract = "A significant amount of information about drug-related safety issues such as adverse effects are published in medical case reports that can only be explored by human readers due to their unstructured nature. The work presented here aims at generating a systematically annotated corpus that can support the development and validation of methods for the automatic extraction of drug-related adverse effects from medical case reports. The documents are systematically double annotated in various rounds to ensure consistent annotations. The annotated documents are finally harmonized to generate representative consensus annotations. In order to demonstrate an example use case scenario, the corpus was employed to train and validate models for the classification of informative against the non-informative sentences. A Maximum Entropy classifier trained with simple features and evaluated by 10-fold cross-validation resulted in the F1 score of 0.70 indicating a potential useful application of the corpus."
}

Contributions

Thanks to @Nilanshrajput and @lhoestq for adding this dataset.

Author:

Anonymous

Dataset size:

33.81 KB