CoReNer

Demo

We released an online demo so you can easily try out the model. Check it out at http://corener-demo.aiola-lab.com (the demo uses the aiola/roberta-base-corener model).

Model description

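A multi-task model for named-entity recognition (NER), relation extraction (RE), entity mention detection (EMD), and coreference resolution (CR).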

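We model NER as a span classification task, and RE as a multi-label classification task over (NER) span tuples. Similarly, we model EMD as a span classification task, and CR as a binary classification task over (EMD) span tuples. To construct the CR clusters, we keep the top-scoring antecedent of each mention, then compute the connected components of the undirected graph over the mentions.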
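The clustering step described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the corener package: the function name `build_clusters` and the input format (a dict mapping `(mention, antecedent)` index pairs to scores, containing only pairs the binary classifier accepted) are assumptions made for the example.

```python
from collections import defaultdict


def build_clusters(num_mentions, pair_scores):
    """Group mentions into coreference clusters.

    pair_scores maps (mention_idx, antecedent_idx) -> score and is assumed
    to hold only the pairs predicted as coreferent (hypothetical format).
    """
    # Step 1: keep only the top-scoring antecedent of each mention.
    best = {}
    for (mention, antecedent), score in pair_scores.items():
        if mention not in best or score > best[mention][1]:
            best[mention] = (antecedent, score)

    # Step 2: connected components of the undirected mention graph,
    # computed with a small union-find structure.
    parent = list(range(num_mentions))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for mention, (antecedent, _) in best.items():
        root_a, root_b = find(mention), find(antecedent)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for idx in range(num_mentions):
        groups[find(idx)].append(idx)

    # Mentions without any coreferent partner are dropped.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Five mentions; mention 2 has two candidate antecedents and keeps mention 0.
clusters = build_clusters(5, {(2, 0): 0.9, (2, 1): 0.6, (4, 1): 0.8})
print(clusters)
```

Because each mention contributes at most one edge, the graph is sparse, and union-find recovers the components in near-linear time. Whether singleton mentions are reported as clusters is a design choice; this sketch drops them.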

The model was trained to recognize the following:

Entity types: GPE, ORG, PERSON, DATE, NORP, CARDINAL, MONEY, PERCENT, WORK_OF_ART, ORDINAL, EVENT, LOC, TIME, FAC, QUANTITY, LAW, PRODUCT, LANGUAGE.
Relation types: Kill, Live_In, Located_In, OrgBased_In, Work_For.

Usage example

For more details and usage examples, see https://github.com/aiola-lab/corener

import json

from transformers import AutoTokenizer

from corener.models import Corener, ModelOutput
from corener.data import MTLDataset
from corener.utils.prediction import convert_model_output

# Load the tokenizer and the pretrained multi-task model.
tokenizer = AutoTokenizer.from_pretrained("aiola/roberta-large-corener")
model = Corener.from_pretrained("aiola/roberta-large-corener")
model.eval()  # switch to evaluation mode (disables dropout)

examples = [
    "Apple Park is the corporate headquarters of Apple Inc., located in Cupertino, California, United States. It was opened to employees in April 2017, while construction was still underway, and superseded the original headquarters at 1 Infinite Loop, which opened in 1993."
]

# Build an inference-mode dataset from the raw text.
dataset = MTLDataset(
    types=model.config.types,
    tokenizer=tokenizer,
    train_mode=False,
)
dataset.read_dataset(examples)
example = dataset.get_example(0)  # get first example

# Run the model on the encoded example.
output: ModelOutput = model(
    input_ids=example.encodings,
    context_masks=example.context_masks,
    entity_masks=example.entity_masks,
    entity_sizes=example.entity_sizes,
    entity_spans=example.entity_spans,
    entity_sample_masks=example.entity_sample_masks,
    inference=True,
)

# Convert the raw model output to a JSON-serializable structure and print it.
print(json.dumps(convert_model_output(output=output, batch=example, dataset=dataset), indent=2))

Author:

Aiola

Dataset size:

1.37 GB