
A Named Entity Recognition Model for Kazakh

How to use

You can run NER with this model through the Transformers pipeline:

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("yeshpanovrustem/xlm-roberta-large-ner-kazakh")
model = AutoModelForTokenClassification.from_pretrained("yeshpanovrustem/xlm-roberta-large-ner-kazakh")

# Build an NER pipeline; by default it returns one prediction per token and omits "O" tokens
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Қазақстан Республикасы — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет."

ner_results = nlp(example)
print(ner_results)
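By default the "ner" pipeline returns one dictionary per token, with keys such as `entity`, `score`, `index`, `start` and `end`. A multi-word name like "Қазақстан Республикасы" therefore comes back in pieces. The pipeline can merge these pieces itself through its `aggregation_strategy` argument, for example `pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")`. The sketch below shows a similar merge done by hand. It is a minimal illustration under stated assumptions:

- It assumes the model's labels follow the IOB2 scheme (`B-GPE`, `I-GPE`, ...).
- The helper name `group_entities` is hypothetical.
- The predictions in the demo are made up and are not real output of this model.
- Unlike the built-in strategies, it does not inspect word boundaries. A subword piece tagged `B-` opens a new span.

```python
from typing import Dict, List


def group_entities(tokens: List[Dict], text: str) -> List[Dict]:
    """Merge per-token NER predictions into entity spans.

    Assumes IOB2 labels ("B-GPE", "I-GPE", ...) and the per-token keys
    "entity", "score", "index", "start" and "end" (character offsets).
    """
    spans: List[Dict] = []
    for tok in tokens:
        if tok["entity"] == "O":
            continue  # token outside any entity
        prefix, _, label = tok["entity"].partition("-")
        last = spans[-1] if spans else None
        # An I- tag extends the previous span only if it has the same type
        # and directly follows the last token of that span.
        if (
            prefix == "I"
            and last is not None
            and last["label"] == label
            and tok["index"] == last["last_index"] + 1
        ):
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
            last["scores"].append(tok["score"])
        else:
            # A B- tag, or an I- tag that cannot continue a span, opens a new one
            spans.append({
                "label": label,
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
                "scores": [tok["score"]],
            })
    return [
        {
            "label": s["label"],
            "text": text[s["start"]:s["end"]],
            "start": s["start"],
            "end": s["end"],
            "score": sum(s["scores"]) / len(s["scores"]),  # mean token score
        }
        for s in spans
    ]


# Hypothetical predictions for the example sentence (NOT real model output)
text = "Қазақстан Республикасы — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет."
preds = [
    {"entity": "B-GPE", "score": 0.99, "index": 1, "start": 0, "end": 9},
    {"entity": "I-GPE", "score": 0.98, "index": 2, "start": 10, "end": 22},
    {"entity": "B-LOCATION", "score": 0.97, "index": 4, "start": 25, "end": 30},
    {"entity": "I-LOCATION", "score": 0.96, "index": 5, "start": 31, "end": 37},
]
for span in group_entities(preds, text):
    print(span["label"], span["text"], round(span["score"], 3))
```

Checking the `index` values keeps two same-type entities separated by an omitted "O" token from being fused into one span.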

Evaluation results on the validation and test sets

Set         Precision  Recall  F1-score
Validation  96.58%     96.66%  96.62%
Test        96.49%     96.86%  96.67%
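The F1-score is the harmonic mean of precision and recall: F1 = 2PR / (P + R). A quick check with the precision and recall values above reproduces the reported F1-scores to two decimal places. The function name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; inputs and output in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f"validation F1: {f1_score(96.58, 96.66):.2f}%")  # 96.62%
print(f"test F1:       {f1_score(96.49, 96.86):.2f}%")  # 96.67%
```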

Performance of the model on each NE class in the validation set

NE Class Precision Recall F1-score Support
ADAGE 90.00% 47.37% 62.07% 19
ART 91.36% 95.48% 93.38% 155
CARDINAL 98.44% 98.37% 98.40% 2,878
CONTACT 100.00% 83.33% 90.91% 18
DATE 97.38% 97.27% 97.33% 2,603
DISEASE 96.72% 97.52% 97.12% 121
EVENT 83.24% 93.51% 88.07% 154
FACILITY 68.95% 84.83% 76.07% 178
GPE 98.46% 96.50% 97.47% 1,656
LANGUAGE 95.45% 89.36% 92.31% 47
LAW 87.50% 87.50% 87.50% 56
LOCATION 92.49% 93.81% 93.14% 210
MISCELLANEOUS 100.00% 76.92% 86.96% 26
MONEY 99.56% 100.00% 99.78% 455
NON_HUMAN 0.00% 0.00% 0.00% 1
NORP 95.71% 95.45% 95.58% 374
ORDINAL 98.14% 95.84% 96.98% 385
ORGANISATION 92.19% 90.97% 91.58% 753
PERCENTAGE 99.08% 99.08% 99.08% 437
PERSON 98.47% 98.72% 98.60% 1,175
POSITION 96.15% 97.79% 96.96% 587
PRODUCT 89.06% 78.08% 83.21% 73
PROJECT 92.13% 95.22% 93.65% 209
QUANTITY 97.58% 98.30% 97.94% 411
TIME 94.81% 96.63% 95.71% 208
micro avg 96.58% 96.66% 96.62% 13,189
macro avg 90.12% 87.51% 88.39% 13,189
weighted avg 96.67% 96.66% 96.63% 13,189
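The three summary rows combine the 25 classes in different ways. This sketch assumes the usual classification-report definitions, as in seqeval or scikit-learn; the source does not say which tool produced the table.

- The macro average is the unweighted mean of the per-class scores.
- The weighted average weights each class by its support.
- The micro average pools entity counts across classes, so it cannot be recovered from the rounded percentages alone.

The sketch recomputes the macro and weighted F1 from the per-class F1 and Support columns above. Rare classes such as NON_HUMAN (support 1, F1 0.00%) pull the macro average well below the micro and weighted averages.

```python
# (NE class, F1-score in %, support) copied from the validation-set table
VALIDATION_ROWS = [
    ("ADAGE", 62.07, 19),
    ("ART", 93.38, 155),
    ("CARDINAL", 98.40, 2878),
    ("CONTACT", 90.91, 18),
    ("DATE", 97.33, 2603),
    ("DISEASE", 97.12, 121),
    ("EVENT", 88.07, 154),
    ("FACILITY", 76.07, 178),
    ("GPE", 97.47, 1656),
    ("LANGUAGE", 92.31, 47),
    ("LAW", 87.50, 56),
    ("LOCATION", 93.14, 210),
    ("MISCELLANEOUS", 86.96, 26),
    ("MONEY", 99.78, 455),
    ("NON_HUMAN", 0.00, 1),
    ("NORP", 95.58, 374),
    ("ORDINAL", 96.98, 385),
    ("ORGANISATION", 91.58, 753),
    ("PERCENTAGE", 99.08, 437),
    ("PERSON", 98.60, 1175),
    ("POSITION", 96.96, 587),
    ("PRODUCT", 83.21, 73),
    ("PROJECT", 93.65, 209),
    ("QUANTITY", 97.94, 411),
    ("TIME", 95.71, 208),
]


def macro_average(rows):
    """Unweighted mean of the per-class scores."""
    return sum(score for _, score, _ in rows) / len(rows)


def weighted_average(rows):
    """Per-class scores weighted by support (number of gold entities)."""
    total = sum(support for _, _, support in rows)
    return sum(score * support for _, score, support in rows) / total


print(f"support: {sum(s for _, _, s in VALIDATION_ROWS)}")        # 13189
print(f"macro F1: {macro_average(VALIDATION_ROWS):.2f}%")        # 88.39%
print(f"weighted F1: {weighted_average(VALIDATION_ROWS):.2f}%")  # 96.63%
```

Both recomputed values match the "macro avg" and "weighted avg" F1-scores reported for the validation set.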

Performance of the model on each NE class in the test set

NE Class Precision Recall F1-score Support
ADAGE 71.43% 29.41% 41.67% 17
ART 95.71% 96.89% 96.30% 161
CARDINAL 98.43% 98.60% 98.51% 2,789
CONTACT 94.44% 85.00% 89.47% 20
DATE 96.59% 97.60% 97.09% 2,584
DISEASE 87.69% 95.80% 91.57% 119
EVENT 86.67% 92.86% 89.66% 154
FACILITY 74.88% 81.73% 78.16% 197
GPE 98.57% 97.81% 98.19% 1,691
LANGUAGE 90.70% 95.12% 92.86% 41
LAW 93.33% 76.36% 84.00% 55
LOCATION 92.08% 89.42% 90.73% 208
MISCELLANEOUS 86.21% 96.15% 90.91% 26
MONEY 100.00% 100.00% 100.00% 427
NON_HUMAN 0.00% 0.00% 0.00% 1
NORP 99.46% 99.18% 99.32% 368
ORDINAL 96.63% 97.64% 97.14% 382
ORGANISATION 90.97% 91.23% 91.10% 718
PERCENTAGE 98.05% 98.05% 98.05% 462
PERSON 98.70% 99.13% 98.92% 1,151
POSITION 96.36% 97.65% 97.00% 597
PRODUCT 89.23% 77.33% 82.86% 75
PROJECT 93.69% 93.69% 93.69% 206
QUANTITY 97.26% 97.02% 97.14% 403
TIME 94.95% 94.09% 94.52% 220
micro avg 96.54% 96.85% 96.69% 13,072
macro avg 88.88% 87.11% 87.55% 13,072
weighted avg 96.55% 96.85% 96.67% 13,072
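For rare classes, the percentages rest on very few entities. Assume the standard entity-level definitions: precision = TP / (TP + FP) and recall = TP / support. Under that assumption, the underlying counts can be backed out from the reported values.

The helper `back_out_counts` is hypothetical. The counts it returns are a reconstruction consistent with the rounded percentages, not figures published with the model. Rows with 0.00% precision, such as NON_HUMAN, are excluded because TP + FP cannot be recovered from them.

```python
def back_out_counts(precision_pct: float, recall_pct: float, support: int):
    """Recover (TP, FP, FN, F1%) consistent with rounded precision/recall.

    Assumes precision = TP / (TP + FP) and recall = TP / support.
    Only works when precision > 0 (otherwise TP + FP is undetermined).
    """
    if precision_pct <= 0:
        raise ValueError("precision must be positive to recover the counts")
    tp = round(recall_pct / 100 * support)         # correctly found entities
    predicted = round(tp / (precision_pct / 100))  # TP + FP
    fp = predicted - tp
    fn = support - tp
    f1_pct = 100 * 2 * tp / (2 * tp + fp + fn)
    return tp, fp, fn, f1_pct


# (class, split, precision %, recall %, support) from the tables above
ROWS = [
    ("ADAGE", "validation", 90.00, 47.37, 19),
    ("ADAGE", "test", 71.43, 29.41, 17),
    ("CONTACT", "test", 94.44, 85.00, 20),
]
for name, split, p, r, n in ROWS:
    tp, fp, fn, f1 = back_out_counts(p, r, n)
    print(f"{name} ({split}): TP={tp} FP={fp} FN={fn} F1={f1:.2f}%")
```

The recomputed F1-scores (62.07%, 41.67% and 89.47%) match the tables. The test-set ADAGE result rests on only 5 correctly found entities out of 17.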