Dataset:
DFKI-SLT/brat
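brat is an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. It has been developed for rich structured annotation across a variety of NLP tasks, and aims to support manual curation efforts and increase annotator productivity using NLP techniques. brat is designed in particular for structured annotation, where the annotations are not free-form text but have a fixed form that can be automatically processed and interpreted by a computer.
Datasets annotated in the brat format are processed with this script. Annotations created in brat are stored on disk in a standoff format: annotations are stored separately from the annotated document text, which the tool never modifies. For each text document in the system there is a corresponding annotation file. The two are associated by a file naming convention: their base names (the file name without its suffix) are identical. For example, the file DOC-1000.ann contains the annotations for the file DOC-1000.txt. More information can be found in the brat documentation on the standoff format.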
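The naming convention above can be illustrated with a short sketch. This is not the loading script's own code; the helper name `pair_brat_files` is hypothetical, and the sketch simply assumes that all `.txt` and `.ann` files sit in one flat directory.

```python
from pathlib import Path


def pair_brat_files(directory):
    """Map each base name to its (.txt, .ann) pair.

    Hypothetical helper: documents that lack an annotation file are skipped.
    """
    directory = Path(directory)
    pairs = {}
    for txt_path in sorted(directory.glob("*.txt")):
        # Same base name, different suffix: DOC-1000.txt -> DOC-1000.ann
        ann_path = txt_path.with_suffix(".ann")
        if ann_path.exists():
            pairs[txt_path.stem] = (txt_path, ann_path)
    return pairs
```

Calling `pair_brat_files("corpus/")` on a directory that holds DOC-1000.txt and DOC-1000.ann would return a dictionary keyed by `DOC-1000`.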
[More Information Needed]
-context: html content of data file as string
-file_name: a string name of file
-spans: a sequence containing id, type, location and text of a span
-relations: a sequence containing id, type and arguments of a relation
-equivalence_relations:
-events:
-attributions:
-normalizations:
-notes:
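To make the spans and relations fields more concrete, here is a minimal sketch of how a single line of a `.ann` file could be turned into such a record. It assumes the usual brat standoff layout (tab-separated ID, then type and offsets or arguments, then the covered text for spans). The function name and the dictionary keys are illustrative assumptions and may differ from the exact schema the loading script produces.

```python
def parse_ann_line(line):
    """Parse one text-bound (T) or relation (R) line of a brat .ann file.

    Illustrative only: other annotation kinds return None.
    """
    ann_id, body = line.rstrip("\n").split("\t", 1)
    if ann_id.startswith("T"):
        head, text = body.split("\t", 1)
        span_type, offsets = head.split(" ", 1)
        # Discontinuous spans separate their fragments with ';'
        locations = [
            tuple(int(x) for x in fragment.split())
            for fragment in offsets.split(";")
        ]
        return {"id": ann_id, "type": span_type, "locations": locations, "text": text}
    if ann_id.startswith("R"):
        rel_type, *args = body.split()
        # Each argument has the form ROLE:ID, e.g. Arg1:T3
        arguments = dict(arg.split(":", 1) for arg in args)
        return {"id": ann_id, "type": rel_type, "arguments": arguments}
    return None
```

For instance, `parse_ann_line("T1\tOrganization 0 4\tSony")` yields a span with type `Organization` covering characters 0 to 4.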
The brat script can be used by calling the load_dataset() method and passing kwargs (arguments to the BuilderConfig), which should include at least the url of the dataset prepared using brat. An example for the SciArg dataset, which we provide, follows:
from datasets import load_dataset

kwargs = {
    "description": """This dataset is an extension of the Dr. Inventor corpus (Fisas et al., 2015, 2016) with an annotation layer containing fine-grained argumentative components and relations. It is the first argument-annotated corpus of scientific publications (in English), which allows for joint analyses of argumentation and other rhetorical dimensions of scientific writing.""",
    # Raw string so that the BibTeX escape \v{s} is not read as a Python escape sequence.
    "citation": r"""@inproceedings{lauscher2018b,
      title = {An argument-annotated corpus of scientific publications},
      booktitle = {Proceedings of the 5th Workshop on Mining Argumentation},
      publisher = {Association for Computational Linguistics},
      author = {Lauscher, Anne and Glava\v{s}, Goran and Ponzetto, Simone Paolo},
      address = {Brussels, Belgium},
      year = {2018},
      pages = {40–46}
    }""",
    "homepage": "https://github.com/anlausch/ArguminSci",
    "url": "http://data.dws.informatik.uni-mannheim.de/sci-arg/compiled_corpus.zip",
    "file_name_blacklist": ['A28'],
}

dataset = load_dataset('dfki-nlp/brat', **kwargs)
[More Information Needed]
@inproceedings{stenetorp-etal-2012-brat,
    title = "brat: a Web-based Tool for {NLP}-Assisted Text Annotation",
    author = "Stenetorp, Pontus and Pyysalo, Sampo and Topi{\'c}, Goran and Ohta, Tomoko and Ananiadou, Sophia and Tsujii, Jun{'}ichi",
    booktitle = "Proceedings of the Demonstrations at the 13th Conference of the {E}uropean Chapter of the Association for Computational Linguistics",
    month = apr,
    year = "2012",
    address = "Avignon, France",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/E12-2021",
    pages = "102--107",
}