Dataset:
ade_corpus_v2
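ADE-Corpus-V2 dataset: adverse drug reaction data. The dataset supports two tasks: classifying whether a sentence is ADE-related (True) or not (False), and extracting relations between adverse drug events and drugs. DRUG-AE.rel provides relations between drugs and adverse effects. DRUG-DOSE.rel provides relations between drugs and dosages. ADE-NEG.txt provides all sentences in the ADE corpus that do not contain any drug-related adverse effects.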
Text classification, relation extraction
English
{ 'label': 1, 'text': 'Intravenous azithromycin-induced ototoxicity.' }
Config - Ade_corpus_v2_drug_ade_relation
{ 'drug': 'azithromycin', 'effect': 'ototoxicity', 'indexes': { 'drug': { 'end_char': [24], 'start_char': [12] }, 'effect': { 'end_char': [44], 'start_char': [33] } }, 'text': 'Intravenous azithromycin-induced ototoxicity.' }
Config - Ade_corpus_v2_drug_dosage_relation
{ 'dosage': '4 times per day', 'drug': 'insulin', 'indexes': { 'dosage': { 'end_char': [56], 'start_char': [41] }, 'drug': { 'end_char': [40], 'start_char': [33] } }, 'text': 'She continued to receive regular insulin 4 times per day over the following 3 years with only occasional hives.' }
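The `indexes` fields in the two relation instances above appear to be character offsets into `text`, with `start_char` inclusive and `end_char` exclusive. The following is a minimal Python sketch under that assumption. The helper name `extract_spans` is hypothetical and is not part of the dataset or of any library. It only illustrates how the offsets line up with the `drug`, `effect`, and `dosage` strings.

```python
def extract_spans(text, index_entry):
    """Return the substrings of `text` delimited by parallel offset lists.

    Assumes `start_char` is inclusive and `end_char` is exclusive, which is
    consistent with the two example instances shown in this card.
    """
    starts = index_entry["start_char"]
    ends = index_entry["end_char"]
    return [text[s:e] for s, e in zip(starts, ends)]


# Instance copied from the Ade_corpus_v2_drug_ade_relation example.
ade_example = {
    "drug": "azithromycin",
    "effect": "ototoxicity",
    "indexes": {
        "drug": {"end_char": [24], "start_char": [12]},
        "effect": {"end_char": [44], "start_char": [33]},
    },
    "text": "Intravenous azithromycin-induced ototoxicity.",
}

# Instance copied from the Ade_corpus_v2_drug_dosage_relation example.
dose_example = {
    "dosage": "4 times per day",
    "drug": "insulin",
    "indexes": {
        "dosage": {"end_char": [56], "start_char": [41]},
        "drug": {"end_char": [40], "start_char": [33]},
    },
    "text": "She continued to receive regular insulin 4 times per day "
            "over the following 3 years with only occasional hives.",
}

# Slice each annotated span out of its sentence.
drug_spans = extract_spans(ade_example["text"], ade_example["indexes"]["drug"])
effect_spans = extract_spans(ade_example["text"], ade_example["indexes"]["effect"])
dosage_spans = extract_spans(dose_example["text"], dose_example["indexes"]["dosage"])
dose_drug_spans = extract_spans(dose_example["text"], dose_example["indexes"]["drug"])

print(drug_spans, effect_spans, dosage_spans, dose_drug_spans)
```

The offsets are stored as lists, so the sketch returns a list of spans per field rather than a single string.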
Train |
---|
23516 |
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
@article{GURULINGAPPA2012885,
  title = "Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports",
  journal = "Journal of Biomedical Informatics",
  volume = "45",
  number = "5",
  pages = "885 - 892",
  year = "2012",
  note = "Text Mining and Natural Language Processing in Pharmacogenomics",
  issn = "1532-0464",
  doi = "https://doi.org/10.1016/j.jbi.2012.04.008",
  url = "http://www.sciencedirect.com/science/article/pii/S1532046412000615",
  author = "Harsha Gurulingappa and Abdul Mateen Rajput and Angus Roberts and Juliane Fluck and Martin Hofmann-Apitius and Luca Toldo",
  keywords = "Adverse drug effect, Benchmark corpus, Annotation, Harmonization, Sentence classification",
  abstract = "A significant amount of information about drug-related safety issues such as adverse effects are published in medical case reports that can only be explored by human readers due to their unstructured nature. The work presented here aims at generating a systematically annotated corpus that can support the development and validation of methods for the automatic extraction of drug-related adverse effects from medical case reports. The documents are systematically double annotated in various rounds to ensure consistent annotations. The annotated documents are finally harmonized to generate representative consensus annotations. In order to demonstrate an example use case scenario, the corpus was employed to train and validate models for the classification of informative against the non-informative sentences. A Maximum Entropy classifier trained with simple features and evaluated by 10-fold cross-validation resulted in the F1 score of 0.70 indicating a potential useful application of the corpus."
}
Thanks to @Nilanshrajput and @lhoestq for adding this dataset.