Model:
m3hrdadfi/icelandic-ner-roberta
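This model was fine-tuned on the Icelandic MIM-GOLD-NER dataset. The MIM-GOLD-NER corpus was developed at Reykjavik University in 2018-2020 and covers eight entity types: Date, Location, Miscellaneous, Money, Organization, Percent, Person, and Time. The number of records and of B-/I- tags per split is: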
Split | Records | B-Date | B-Location | B-Miscellaneous | B-Money | B-Organization | B-Percent | B-Person | B-Time | I-Date | I-Location | I-Miscellaneous | I-Money | I-Organization | I-Percent | I-Person | I-Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Train | 39988 | 3409 | 5980 | 4351 | 729 | 5754 | 502 | 11719 | 868 | 2112 | 516 | 3036 | 770 | 2382 | 50 | 5478 | 790 |
Valid | 7063 | 570 | 1034 | 787 | 100 | 1078 | 103 | 2106 | 147 | 409 | 76 | 560 | 104 | 458 | 7 | 998 | 136 |
Test | 8299 | 779 | 1319 | 935 | 153 | 1315 | 108 | 2247 | 172 | 483 | 104 | 660 | 167 | 617 | 10 | 1089 | 158 |
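
The B-/I- prefixes in the table follow the usual begin/inside tagging convention: a `B-` tag opens an entity mention and `I-` tags continue it, so counting `B-` tags counts mentions. This matches the Test row above, whose `B-` counts equal the per-entity support values reported in the evaluation table. The sketch below illustrates the idea; the `count_mentions` helper and the tag sequence are hypothetical and hand-written, not taken from the corpus or from model output.

```python
from collections import Counter


def count_mentions(tags):
    """Count entity mentions per type: each B-<type> tag opens one mention."""
    counts = Counter()
    for tag in tags:
        if tag.startswith("B-"):
            counts[tag[2:]] += 1  # strip the "B-" prefix to get the type
    return counts


# Hypothetical tag sequence for illustration only (not real corpus data).
tags = ["O", "O", "B-Person", "I-Person", "O", "B-Location", "O", "B-Person"]

mention_counts = count_mentions(tags)
print(dict(mention_counts))
```

The `I-` tags do not affect the count, which is why the `I-` columns in the table can be smaller or larger than the corresponding `B-` columns.
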
The following table summarizes the scores obtained by the model overall and per entity class.
entity | precision | recall | f1-score | support |
---|---|---|---|---|
Date | 0.961881 | 0.971759 | 0.966794 | 779.0 |
Location | 0.963047 | 0.968158 | 0.965595 | 1319.0 |
Miscellaneous | 0.884946 | 0.880214 | 0.882574 | 935.0 |
Money | 0.980132 | 0.967320 | 0.973684 | 153.0 |
Organization | 0.924300 | 0.928517 | 0.926404 | 1315.0 |
Percent | 1.000000 | 1.000000 | 1.000000 | 108.0 |
Person | 0.978591 | 0.976413 | 0.977501 | 2247.0 |
Time | 0.965116 | 0.965116 | 0.965116 | 172.0 |
micro avg | 0.951258 | 0.952476 | 0.951866 | 7028.0 |
macro avg | 0.957252 | 0.957187 | 0.957209 | 7028.0 |
weighted avg | 0.951237 | 0.952476 | 0.951849 | 7028.0 |
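
As a sanity check on how the summary rows relate to the per-entity rows, the sketch below recomputes a few of them from the table above. It assumes the conventional definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over classes, weighted average as the support-weighted mean); the model card does not state which tool produced the table, so treat this as an illustration. Because the inputs are rounded to six decimals, the results should agree with the table only up to rounding.

```python
# (precision, recall, f1, support) per entity, copied from the table above.
scores = {
    "Date": (0.961881, 0.971759, 0.966794, 779),
    "Location": (0.963047, 0.968158, 0.965595, 1319),
    "Miscellaneous": (0.884946, 0.880214, 0.882574, 935),
    "Money": (0.980132, 0.967320, 0.973684, 153),
    "Organization": (0.924300, 0.928517, 0.926404, 1315),
    "Percent": (1.000000, 1.000000, 1.000000, 108),
    "Person": (0.978591, 0.976413, 0.977501, 2247),
    "Time": (0.965116, 0.965116, 0.965116, 172),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


total_support = sum(row[3] for row in scores.values())

# Macro average: every class counts equally, regardless of its size.
macro_precision = sum(row[0] for row in scores.values()) / len(scores)

# Weighted average: each class is weighted by its support.
weighted_recall = sum(row[1] * row[3] for row in scores.values()) / total_support

date_f1 = f1_score(scores["Date"][0], scores["Date"][1])

print(total_support, round(macro_precision, 4), round(weighted_recall, 4), round(date_f1, 4))
```

Note that the support-weighted recall coincides with the micro-averaged recall in the table, since both reduce to correctly recovered mentions divided by total mentions.
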
You can use this model with the Transformers NER pipeline. First install the library:

    pip install transformers

Then load the tokenizer and model and run the pipeline on a sentence:

    from transformers import AutoTokenizer
    from transformers import AutoModelForTokenClassification  # for PyTorch
    from transformers import TFAutoModelForTokenClassification  # for TensorFlow
    from transformers import pipeline

    model_name_or_path = "m3hrdadfi/icelandic-ner-roberta"
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)  # PyTorch
    # model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path)  # TensorFlow

    # Build the token-classification pipeline and tag an example sentence
    nlp = pipeline("ner", model=model, tokenizer=tokenizer)
    example = "Kristin manneskja getur ekki lagt frásagnir af Jesú Kristi á hilluna vegna þess að hún sé búin að lesa þær ."

    ner_results = nlp(example)
    print(ner_results)

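
The pipeline returns one prediction per token, so a multi-word name arrives as several entries. The sketch below shows one way to merge them into whole mentions by character offsets. It assumes each prediction is a dict with an `entity` label plus integer `start` and `end` offsets, which is what the pipeline produces with a fast tokenizer (real results also carry keys such as `score` and `word`). The `group_entities` helper and the sample predictions are hypothetical; the sample is hand-written for illustration and is not actual output of this model.

```python
def group_entities(text, token_preds):
    """Merge adjacent token predictions of the same type into entity spans."""
    spans = []
    for pred in token_preds:
        prefix, _, etype = pred["entity"].partition("-")
        if prefix not in ("B", "I"):
            continue  # skip "O" or unexpected labels
        last = spans[-1] if spans else None
        same_type = last is not None and last["type"] == etype
        # A piece starting exactly where the last one ended belongs to the same word.
        same_word = same_type and pred["start"] == last["end"]
        # An I- token at most one character (a space) later continues the mention.
        next_word = same_type and prefix == "I" and pred["start"] - last["end"] <= 1
        if same_word or next_word:
            last["end"] = pred["end"]
        else:
            spans.append({"type": etype, "start": pred["start"], "end": pred["end"]})
    # Slice the original text so the surface form is free of tokenizer artifacts.
    return [{**s, "text": text[s["start"]:s["end"]]} for s in spans]


text = "Kristin manneskja getur ekki lagt frásagnir af Jesú Kristi á hilluna"
first = text.index("Jesú")
second = text.index("Kristi", first)

# Hypothetical token-level predictions (not real model output).
sample_preds = [
    {"entity": "B-Person", "start": first, "end": first + len("Jesú")},
    {"entity": "I-Person", "start": second, "end": second + len("Kristi")},
]

entities = group_entities(text, sample_preds)
print(entities)
```

Recent Transformers releases also accept an `aggregation_strategy` argument on the `pipeline("ner", ...)` call (for example `"simple"`) that performs a similar grouping inside the library, which is usually preferable when it is available in your installed version.
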
Post a GitHub issue on the IcelandicNER Issues repository.