Datasets:

DFKI-SLT/brat

Sub-tasks:

parsing

Language creators:

found

Annotation creators:

expert-generated
English

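Information Card for Brat

Description

Summary

Brat is an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. It has been developed for rich structured annotation across a variety of NLP tasks, and it aims to support manual curation efforts and increase annotator productivity using NLP techniques. Brat is designed in particular for structured annotation, where the annotations are not free-form text but have a fixed form that can be automatically processed and interpreted by a computer.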

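Dataset Structure

Datasets annotated in the brat format are processed using this script. Annotations created in brat are stored on disk in a standoff format: annotations are stored separately from the annotated document text, which the tool never modifies. For each text document in the system there is a corresponding annotation file. The two are associated by a file naming convention: their base names (the file name without its suffix) are the same. For example, the file DOC-1000.ann contains the annotations for the file DOC-1000.txt. More information is available in the brat documentation on the standoff format.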
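The naming convention above can be sketched in a few lines of Python. This is a minimal illustration rather than the loading script itself: the helper name `pair_brat_files` is hypothetical, and it assumes that each .ann file sits in the same directory as its .txt file.

```python
from pathlib import Path


def pair_brat_files(root):
    """Return (text_path, annotation_path) pairs that share a base name.

    Hypothetical helper for illustration; it is not part of the brat script.
    """
    root = Path(root)
    pairs = []
    for ann_path in sorted(root.rglob("*.ann")):
        # DOC-1000.ann annotates DOC-1000.txt (assumed to be in the same directory).
        txt_path = ann_path.with_suffix(".txt")
        if txt_path.exists():
            pairs.append((txt_path, ann_path))
    return pairs
```

Annotation files without a matching text file are skipped here; a stricter variant might raise an error instead.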

Data Instances

[More Information Needed]

Data Fields

- context: HTML content of the data file, as a string
- file_name: name of the file, as a string
- spans: a sequence containing the id, type, location and text of each span
- relations: a sequence containing the id, type and arguments of each relation
- equivalence_relations:
- events:
- attributions:
- normalizations:
- notes:
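To make the spans and relations fields more concrete, the sketch below parses the two corresponding line types of a brat .ann file: text-bound annotations (ids starting with T) and relations (ids starting with R). It is a simplified illustration and not the script's own parser. The function name `parse_ann` and the dictionary keys (`locations`, `arguments`, `target`, and so on) are assumptions chosen to mirror the field descriptions above; the actual output schema may differ.

```python
def parse_ann(ann_text):
    """Parse text-bound (T) and relation (R) lines of a brat .ann file.

    Simplified sketch: all other annotation types (events, attributes,
    normalizations, notes, equivalences) are skipped.
    """
    spans, relations = [], []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        ann_id, _, rest = line.partition("\t")
        if ann_id.startswith("T"):
            # Format: T1<TAB>Type start end[;start end ...]<TAB>covered text
            header, _, text = rest.partition("\t")
            span_type, _, offsets = header.partition(" ")
            locations = [
                {"start": int(start), "end": int(end)}
                for start, end in (frag.split() for frag in offsets.split(";"))
            ]
            spans.append(
                {"id": ann_id, "type": span_type, "locations": locations, "text": text}
            )
        elif ann_id.startswith("R"):
            # Format: R1<TAB>Type Arg1:T1 Arg2:T2
            rel_type, *args = rest.split()
            arguments = [
                {"type": role, "target": target}
                for role, target in (arg.split(":", 1) for arg in args)
            ]
            relations.append({"id": ann_id, "type": rel_type, "arguments": arguments})
    return spans, relations
```

Spans keep a list of locations because brat allows discontinuous text-bound annotations, whose offset pairs are separated by semicolons.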

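Usage

The brat script can be used by calling the load_dataset() method and passing kwargs (arguments to the BuilderConfig), which should include at least the url of the dataset prepared using brat. An example for the SciArg dataset is given below: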

from datasets import load_dataset

# These keyword arguments are passed on to the BuilderConfig of the brat script.
kwargs = {
    "description":
        """This dataset is an extension of the Dr. Inventor corpus (Fisas et al., 2015, 2016) with an annotation layer containing
        fine-grained argumentative components and relations. It is the first argument-annotated corpus of scientific
        publications (in English), which allows for joint analyses of argumentation and other rhetorical dimensions of
        scientific writing.""",
    # Raw string, so that the LaTeX accent \v{s} is not read as the Python escape \v (vertical tab).
    "citation":
        r"""@inproceedings{lauscher2018b,
          title = {An argument-annotated corpus of scientific publications},
          booktitle = {Proceedings of the 5th Workshop on Mining Argumentation},
          publisher = {Association for Computational Linguistics},
          author = {Lauscher, Anne and Glava\v{s}, Goran and Ponzetto, Simone Paolo},
          address = {Brussels, Belgium},
          year = {2018},
          pages = {40--46}
        }""",
    "homepage": "https://github.com/anlausch/ArguminSci",
    # The url of the brat-annotated data is the one entry that must be present.
    "url": "http://data.dws.informatik.uni-mannheim.de/sci-arg/compiled_corpus.zip",
    "file_name_blacklist": ["A28"],
}

dataset = load_dataset("dfki-nlp/brat", **kwargs)

Additional Information

Licensing Information

[More Information Needed]

Citation Information

@inproceedings{stenetorp-etal-2012-brat,
    title = "brat: a Web-based Tool for {NLP}-Assisted Text Annotation",
    author = "Stenetorp, Pontus  and
      Pyysalo, Sampo  and
      Topi{\'c}, Goran  and
      Ohta, Tomoko  and
      Ananiadou, Sophia  and
      Tsujii, Jun{'}ichi",
    booktitle = "Proceedings of the Demonstrations at the 13th Conference of the {E}uropean Chapter of the Association for Computational Linguistics",
    month = apr,
    year = "2012",
    address = "Avignon, France",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/E12-2021",
    pages = "102--107",
}