Model:

m3hrdadfi/icelandic-ner-bert

IcelandicNER BERT

This model was fine-tuned on the MIM-GOLD-NER dataset for the Icelandic language. The MIM-GOLD-NER corpus, developed at Reykjavik University in 2018–2020, covers eight entity types:

  • Date
  • Location
  • Miscellaneous
  • Money
  • Organization
  • Percent
  • Person
  • Time
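
The eight entity types above appear in the dataset with B- and I- prefixes. As a minimal sketch, the tag set can be generated as follows. The extra "O" (outside) tag and the ordering of the labels are assumptions made for illustration; the authoritative mapping is the one stored in the model's configuration (`model.config.id2label`).

```python
# Entity types as listed in the model card.
ENTITY_TYPES = [
    "Date", "Location", "Miscellaneous", "Money",
    "Organization", "Percent", "Person", "Time",
]


def build_bio_labels(entity_types):
    """Return a BIO tag list: "O" plus B-/I- variants of every entity type.

    Assumption: the "O" tag and this ordering are illustrative only and
    may differ from the id2label mapping shipped with the model.
    """
    labels = ["O"]
    for prefix in ("B", "I"):
        labels.extend(f"{prefix}-{name}" for name in entity_types)
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
print(LABELS)
```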

Dataset Information

Split Records B-Date B-Location B-Miscellaneous B-Money B-Organization B-Percent B-Person B-Time I-Date I-Location I-Miscellaneous I-Money I-Organization I-Percent I-Person I-Time
Train 39988 3409 5980 4351 729 5754 502 11719 868 2112 516 3036 770 2382 50 5478 790
Valid 7063 570 1034 787 100 1078 103 2106 147 409 76 560 104 458 7 998 136
Test 8299 779 1319 935 153 1315 108 2247 172 483 104 660 167 617 10 1089 158
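
Since every entity starts with exactly one B- tag, summing the B- columns gives the number of entities in each split. The sketch below does this with the counts copied from the table above; the dictionary layout and the helper name `entity_count` are hypothetical, chosen only for this example.

```python
# B- tag counts per split, copied from the dataset table, in the order:
# Date, Location, Miscellaneous, Money, Organization, Percent, Person, Time.
B_COUNTS = {
    "Train": [3409, 5980, 4351, 729, 5754, 502, 11719, 868],
    "Valid": [570, 1034, 787, 100, 1078, 103, 2106, 147],
    "Test": [779, 1319, 935, 153, 1315, 108, 2247, 172],
}


def entity_count(split):
    """Number of entities in a split, i.e. the sum of its B- tag counts."""
    return sum(B_COUNTS[split])


for split in B_COUNTS:
    print(split, entity_count(split))
```

For the test split the sum is 7028, which is the total support reported in the evaluation table.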

Evaluation

The following table summarizes the scores obtained by the model, overall and per class.

entity precision recall f1-score support
Date 0.969466 0.978177 0.973802 779.0
Location 0.955201 0.953753 0.954476 1319.0
Miscellaneous 0.867033 0.843850 0.855285 935.0
Money 0.979730 0.947712 0.963455 153.0
Organization 0.893939 0.897338 0.895636 1315.0
Percent 1.000000 1.000000 1.000000 108.0
Person 0.963028 0.973743 0.968356 2247.0
Time 0.976879 0.982558 0.979710 172.0
micro avg 0.938158 0.938958 0.938558 7028.0
macro avg 0.950659 0.947141 0.948840 7028.0
weighted avg 0.937845 0.938958 0.938363 7028.0
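
The macro and weighted averages can be reproduced from the per-class rows. The sketch below assumes the usual definitions (macro: unweighted mean over classes; weighted: mean weighted by support); the names `SCORES`, `macro_average` and `weighted_average` are hypothetical helpers, not part of any library. The micro average is left out because it needs raw prediction counts that the table does not report.

```python
# Per-class rows copied from the table: (precision, recall, f1, support).
SCORES = {
    "Date": (0.969466, 0.978177, 0.973802, 779),
    "Location": (0.955201, 0.953753, 0.954476, 1319),
    "Miscellaneous": (0.867033, 0.843850, 0.855285, 935),
    "Money": (0.979730, 0.947712, 0.963455, 153),
    "Organization": (0.893939, 0.897338, 0.895636, 1315),
    "Percent": (1.000000, 1.000000, 1.000000, 108),
    "Person": (0.963028, 0.973743, 0.968356, 2247),
    "Time": (0.976879, 0.982558, 0.979710, 172),
}

PRECISION, RECALL, F1, SUPPORT = range(4)


def macro_average(column):
    """Unweighted mean of one metric over all classes."""
    return sum(row[column] for row in SCORES.values()) / len(SCORES)


def weighted_average(column):
    """Mean of one metric over all classes, weighted by class support."""
    total = sum(row[SUPPORT] for row in SCORES.values())
    return sum(row[column] * row[SUPPORT] for row in SCORES.values()) / total


for name, column in (("precision", PRECISION), ("recall", RECALL), ("f1", F1)):
    print(f"{name}: macro={macro_average(column):.6f} "
          f"weighted={weighted_average(column):.6f}")
```

The macro average is higher than the weighted one here because small but easy classes such as Percent count as much as large classes such as Person.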

How To Use

You can use this model with the Transformers pipeline for NER.

Installing requirements

pip install transformers

How to predict using pipeline

from transformers import AutoTokenizer
from transformers import AutoModelForTokenClassification  # PyTorch backend
# from transformers import TFAutoModelForTokenClassification  # TensorFlow backend (uncomment if TensorFlow is installed)
from transformers import pipeline

model_name_or_path = "m3hrdadfi/icelandic-ner-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)  # PyTorch
# model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path)  # TensorFlow

# Build a token-classification (NER) pipeline and run it on one sentence.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Kristin manneskja getur ekki lagt frásagnir af Jesú Kristi á hilluna vegna þess að hún sé búin að lesa þær ."

# The result holds one prediction per tagged token.
ner_results = nlp(example)
print(ner_results)
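
With default settings the pipeline returns predictions per token, so multi-word names arrive in pieces. The following is a minimal sketch of merging B-/I- tags into entity spans. `merge_bio` is a hypothetical helper written for this example, and the tags in the usage lines are hand-written for illustration, not actual model output. Recent Transformers versions also accept an `aggregation_strategy` argument on the NER pipeline, which may be the simpler route in practice.

```python
from typing import List, Tuple


def merge_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO tags into (entity_type, text) spans."""
    entities = []
    words, etype = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "B" or (prefix == "I" and name != etype):
            # A new entity starts; flush the one in progress, if any.
            if words:
                entities.append((etype, " ".join(words)))
            words, etype = [token], name
        elif prefix == "I":
            # Continuation of the current entity.
            words.append(token)
        else:
            # "O" tag: close the current entity, if any.
            if words:
                entities.append((etype, " ".join(words)))
            words, etype = [], None
    if words:
        entities.append((etype, " ".join(words)))
    return entities


# Hand-written tags for illustration only (not model output).
tokens = ["frásagnir", "af", "Jesú", "Kristi", "á", "hilluna"]
tags = ["O", "O", "B-Person", "I-Person", "O", "O"]
print(merge_bio(tokens, tags))
```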

Questions?

Post a GitHub issue on the IcelandicNER Issues repo.
