
CoReNer

Demo

We released an online demo so you can easily interact with the model. Check it out at http://corener-demo.aiola-lab.com. The demo uses the aiola/roberta-base-corener model.

Model description

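A multi-task model for named-entity recognition (NER), relation extraction, entity mention detection (EMD), and coreference resolution (CR).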

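We model NER as a span classification task and relation extraction as a multi-label classification of (NER) span tuples. Similarly, we model EMD as a span classification task and CR as a binary classification of (EMD) span tuples. To construct the CR clusters, we keep the top antecedent of each mention and then compute the connected components of the mentions' undirected graph.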
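The cluster-construction step can be sketched as follows. This is not the CoReNer implementation: it assumes a hypothetical `build_clusters` helper whose input already maps each mention (represented here as a `(start, end)` token span, also an assumption) to its single top-scoring antecedent, or `None` if it has none. The links are treated as undirected edges and a union-find structure collects the connected components. Dropping singleton components is a further assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mention representation: (start, end) token offsets
Span = Tuple[int, int]


def build_clusters(top_antecedent: Dict[Span, Optional[Span]]) -> List[List[Span]]:
    """Group mentions into clusters via connected components of the
    undirected mention graph (edge = mention -- its top antecedent)."""
    parent: Dict[Span, Span] = {}

    def find(x: Span) -> Span:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: Span, b: Span) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Every mention (and every antecedent) starts as its own component
    for mention, antecedent in top_antecedent.items():
        parent.setdefault(mention, mention)
        if antecedent is not None:
            parent.setdefault(antecedent, antecedent)

    # Each kept mention-antecedent link merges two components
    for mention, antecedent in top_antecedent.items():
        if antecedent is not None:
            union(mention, antecedent)

    groups: Dict[Span, List[Span]] = {}
    for mention in parent:
        groups.setdefault(find(mention), []).append(mention)

    # Assumption: singleton components are not reported as clusters
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Hypothetical spans: "Apple Park" (0, 2), "Apple Inc." (8, 10),
    # "It" (20, 21), and a later mention (30, 31) pointing back at "It"
    links = {(0, 2): None, (8, 10): None, (20, 21): (0, 2), (30, 31): (20, 21)}
    print(build_clusters(links))
```

Because the graph is undirected, a chain of top antecedents (here `(30, 31)` to `(20, 21)` to `(0, 2)`) ends up in one cluster even though no mention links directly to every other mention.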

The model was trained to recognize:

  • Entity types: GPE, ORG, PERSON, DATE, NORP, CARDINAL, MONEY, PERCENT, WORK_OF_ART, ORDINAL, EVENT, LOC, TIME, FAC, QUANTITY, LAW, PRODUCT, LANGUAGE.
  • Relation types: Kill, Live_In, Located_In, OrgBased_In, Work_For.
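Since relation extraction is framed as multi-label classification over NER span tuples, a single span pair can receive zero, one, or several of the relation types listed above. The sketch below illustrates that decoding idea only; it is not CoReNer's code. The helper `decode_relations`, the per-pair logit dictionary, the use of ordered (head, tail) pairs, and the 0.5 sigmoid threshold are all assumptions made for illustration.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

# Relation labels from the model card; the index order here is an assumption
RELATION_TYPES = ["Kill", "Live_In", "Located_In", "OrgBased_In", "Work_For"]

Span = Tuple[int, int]  # hypothetical (start, end) token offsets


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_relations(
    spans: Sequence[Span],
    logits: Dict[Tuple[Span, Span], Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[Span, str, Span]]:
    """Multi-label decoding: each relation type gets an independent sigmoid,
    so a (head, tail) pair may carry any subset of RELATION_TYPES."""
    triples: List[Tuple[Span, str, Span]] = []
    # Assumption: relations are directional, so ordered pairs are scored
    for head, tail in permutations(spans, 2):
        scores = logits.get((head, tail))
        if scores is None:
            continue  # pair was not scored
        for rel, logit in zip(RELATION_TYPES, scores):
            if sigmoid(logit) >= threshold:
                triples.append((head, rel, tail))
    return triples


if __name__ == "__main__":
    # Hypothetical spans: "Apple Inc." (8, 10) and "Cupertino" (13, 14)
    spans = [(8, 10), (13, 14)]
    logits = {((8, 10), (13, 14)): [-5.0, -3.0, -1.0, 4.0, -2.0]}
    print(decode_relations(spans, logits))
```

With these made-up logits only `OrgBased_In` clears the threshold. A softmax over the five labels would instead force exactly one label per pair, which is why an independent sigmoid per label is the usual choice for a multi-label formulation.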

Usage example

For more details and usage examples, see https://github.com/aiola-lab/corener

import json

from transformers import AutoTokenizer
from corener.models import Corener, ModelOutput
from corener.data import MTLDataset
from corener.utils.prediction import convert_model_output


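# Load the tokenizer and the multi-task model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("aiola/roberta-base-corener")
model = Corener.from_pretrained("aiola/roberta-base-corener")
model.eval()  # inference mode (disables dropout)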

examples = [
    "Apple Park is the corporate headquarters of Apple Inc., located in Cupertino, California, United States. It was opened to employees in April 2017, while construction was still underway, and superseded the original headquarters at 1 Infinite Loop, which opened in 1993."
]

# Wrap the raw texts in a dataset that uses the types stored in the model config
dataset = MTLDataset(
    types=model.config.types,
    tokenizer=tokenizer,
    train_mode=False,
)
dataset.read_dataset(examples)
example = dataset.get_example(0)  # get the first example

# Forward pass with inference=True to obtain the model's predictions
output: ModelOutput = model(
    input_ids=example.encodings,
    context_masks=example.context_masks,
    entity_masks=example.entity_masks,
    entity_sizes=example.entity_sizes,
    entity_spans=example.entity_spans,
    entity_sample_masks=example.entity_sample_masks,
    inference=True,
)

# Convert the raw model output into a readable structure and print it as JSON
print(json.dumps(convert_model_output(output=output, batch=example, dataset=dataset), indent=2))