Model:

m3hrdadfi/icelandic-ner-bert

Task:

Token Classification

Libraries:

PyTorch TensorFlow Transformers

Language:

Icelandic

Other:

bert AutoTrain Compatible

License:

apache-2.0

IcelandicNER BERT

This model was fine-tuned on the MIM-GOLD-NER dataset for Icelandic. The MIM-GOLD-NER corpus was developed at Reykjavik University in 2018–2020 and covers eight entity types:

Date
Location
Miscellaneous
Money
Organization
Percent
Person
Time
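The B-/I- tags in the dataset statistics below suggest the usual BIO (IOB2) encoding of these eight types: B- marks the first token of an entity, I- a later token inside the same entity, and O a token outside any entity. The sketch below builds such a label inventory. ENTITY_TYPES and bio_labels are illustrative names, and the O tag and the id order are assumptions; the mapping the checkpoint actually uses is in model.config.id2label.

```python
# Entity types covered by MIM-GOLD-NER, as listed above.
ENTITY_TYPES = ["Date", "Location", "Miscellaneous", "Money",
                "Organization", "Percent", "Person", "Time"]


def bio_labels(entity_types):
    """Hypothetical helper: "O" plus one B- (begin) and one I- (inside) tag per type."""
    return ["O"] + [f"B-{t}" for t in entity_types] + [f"I-{t}" for t in entity_types]


LABELS = bio_labels(ENTITY_TYPES)
# Illustrative id mapping only; the checkpoint's real one is model.config.id2label
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

# IOB2 tags for a whitespace-tokenized sentence ("Jón Jónsson lives in Reykjavík")
tokens = ["Jón", "Jónsson", "býr", "í", "Reykjavík"]
tags = ["B-Person", "I-Person", "O", "O", "B-Location"]
assert all(tag in label2id for tag in tags)

print(len(LABELS))  # 17
```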

Dataset Information

Split	Records	B-Date	B-Location	B-Miscellaneous	B-Money	B-Organization	B-Percent	B-Person	B-Time	I-Date	I-Location	I-Miscellaneous	I-Money	I-Organization	I-Percent	I-Person	I-Time
Train	39988	3409	5980	4351	729	5754	502	11719	868	2112	516	3036	770	2382	50	5478	790
Valid	7063	570	1034	787	100	1078	103	2106	147	409	76	560	104	458	7	998	136
Test	8299	779	1319	935	153	1315	108	2247	172	483	104	660	167	617	10	1089	158
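Assuming IOB2 tagging, where every entity mention starts with exactly one B- tag, summing a split's B- columns gives its number of entity mentions. The sketch below does this with the counts from the table; for the test split the sum is 7028, the same total support reported in the evaluation table. B_COUNTS and mention_counts are illustrative names.

```python
# B- tag counts per split, copied from the dataset table above.
B_COUNTS = {
    "Train": {"Date": 3409, "Location": 5980, "Miscellaneous": 4351, "Money": 729,
              "Organization": 5754, "Percent": 502, "Person": 11719, "Time": 868},
    "Valid": {"Date": 570, "Location": 1034, "Miscellaneous": 787, "Money": 100,
              "Organization": 1078, "Percent": 103, "Person": 2106, "Time": 147},
    "Test": {"Date": 779, "Location": 1319, "Miscellaneous": 935, "Money": 153,
             "Organization": 1315, "Percent": 108, "Person": 2247, "Time": 172},
}


def mention_counts(b_counts):
    """Under IOB2 each entity mention begins with exactly one B- tag, so the
    sum of a split's B- counts is its number of entity mentions."""
    return {split: sum(counts.values()) for split, counts in b_counts.items()}


totals = mention_counts(B_COUNTS)
print(totals)  # {'Train': 33312, 'Valid': 5925, 'Test': 7028}
```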

Evaluation

The following table summarizes the model's scores on the test set, overall and per class.

Entity	Precision	Recall	F1-score	Support
Date	0.969466	0.978177	0.973802	779
Location	0.955201	0.953753	0.954476	1319
Miscellaneous	0.867033	0.843850	0.855285	935
Money	0.979730	0.947712	0.963455	153
Organization	0.893939	0.897338	0.895636	1315
Percent	1.000000	1.000000	1.000000	108
Person	0.963028	0.973743	0.968356	2247
Time	0.976879	0.982558	0.979710	172
micro avg	0.938158	0.938958	0.938558	7028
macro avg	0.950659	0.947141	0.948840	7028
weighted avg	0.937845	0.938958	0.938363	7028
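The three summary rows follow the usual definitions: the macro average is the unweighted mean over the eight classes, the weighted average weights each class by its support, and the micro average pools the counts of all classes before computing precision and recall. The card reports only rates, so the sketch below reconstructs per-class counts under the assumption that true positives = recall × support and predicted mentions = true positives / precision, both rounded to whole numbers. With those counts it reproduces the table's averages to within rounding. SCORES, reconstruct_counts and summarize are illustrative names.

```python
# Per-class test-set scores from the evaluation table: (precision, recall, support).
SCORES = {
    "Date": (0.969466, 0.978177, 779),
    "Location": (0.955201, 0.953753, 1319),
    "Miscellaneous": (0.867033, 0.843850, 935),
    "Money": (0.979730, 0.947712, 153),
    "Organization": (0.893939, 0.897338, 1315),
    "Percent": (1.000000, 1.000000, 108),
    "Person": (0.963028, 0.973743, 2247),
    "Time": (0.976879, 0.982558, 172),
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def reconstruct_counts(p: float, r: float, support: int) -> tuple:
    """Assumed inversion of the rounded rates: TP = recall * support,
    predicted = TP / precision, each rounded to an integer."""
    tp = round(r * support)
    predicted = round(tp / p) if p else 0
    return tp, predicted


def summarize(scores: dict) -> dict:
    n = len(scores)
    total = sum(s for _, _, s in scores.values())
    f1s = {k: f1(p, r) for k, (p, r, _) in scores.items()}

    # Macro: unweighted mean over classes
    macro = (
        sum(p for p, _, _ in scores.values()) / n,
        sum(r for _, r, _ in scores.values()) / n,
        sum(f1s.values()) / n,
    )
    # Weighted: mean over classes, each class weighted by its support
    weighted = (
        sum(p * s for p, _, s in scores.values()) / total,
        sum(r * s for _, r, s in scores.values()) / total,
        sum(f1s[k] * s for k, (_, _, s) in scores.items()) / total,
    )
    # Micro: pool true positives and predicted mentions across classes first
    tp_sum = pred_sum = 0
    for p, r, s in scores.values():
        tp, pred = reconstruct_counts(p, r, s)
        tp_sum += tp
        pred_sum += pred
    micro_p, micro_r = tp_sum / pred_sum, tp_sum / total
    micro = (micro_p, micro_r, f1(micro_p, micro_r))
    return {"micro": micro, "macro": macro, "weighted": weighted,
            "tp": tp_sum, "predicted": pred_sum, "support": total}


report = summarize(SCORES)
print(report["tp"], report["predicted"], report["support"])  # 6599 7034 7028
for name in ("micro", "macro", "weighted"):
    print(name, [round(v, 6) for v in report[name]])
```

Note that the weighted recall always equals the micro recall, since both reduce to total true positives divided by total support.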

How To Use

You can use this model with the Transformers NER pipeline.

Installing requirements

pip install transformers torch  # PyTorch backend; install tensorflow instead to use the TF model class

How to predict using pipeline

from transformers import AutoTokenizer
from transformers import AutoModelForTokenClassification  # for PyTorch
# from transformers import TFAutoModelForTokenClassification  # for TensorFlow
from transformers import pipeline


model_name_or_path = "m3hrdadfi/icelandic-ner-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)  # PyTorch
# model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path)  # TensorFlow: uncomment both TF lines instead

# Build an NER (token-classification) pipeline from the loaded model and tokenizer
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Kristin manneskja getur ekki lagt frásagnir af Jesú Kristi á hilluna vegna þess að hún sé búin að lesa þær ."

# Returns a list of dicts, one per token not tagged "O"
ner_results = nlp(example)
print(ner_results)
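With its default settings the ner pipeline returns one dict per (sub-word) token that was not tagged O, with the keys entity, score, index, word, start and end, so a name split into several WordPiece tokens appears as several entries. The pipeline can merge them itself if you pass aggregation_strategy="simple" when building it. The sketch below shows the same idea done by hand, assuming the model's labels use the B-/I- names from the dataset table and that start/end character offsets are filled in (the pipeline provides them when a fast tokenizer is loaded). group_entities is a hypothetical helper, not part of Transformers.

```python
from typing import Any, Dict, List


def group_entities(tokens: List[Dict[str, Any]], text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: merge per-token BIO predictions into entity spans.

    Assumes labels look like "B-Person" / "I-Person" and that each token dict has
    the keys "entity", "score", "index", "word", "start" and "end".
    """
    spans: List[Dict[str, Any]] = []
    last_index = None  # pipeline index of the previous token placed in a span
    for tok in tokens:
        label = tok["entity"]
        if "-" not in label:  # skip "O" in case ignore_labels was overridden
            last_index = None
            continue
        prefix, ent_type = label.split("-", 1)
        continues_span = (
            len(spans) > 0
            and last_index is not None
            and tok["index"] == last_index + 1
            and spans[-1]["type"] == ent_type
            # I- continues an entity; a "##" WordPiece continues the current word
            and (prefix == "I" or tok["word"].startswith("##"))
        )
        if continues_span:
            spans[-1]["end"] = tok["end"]
            spans[-1]["scores"].append(tok["score"])
        else:
            spans.append({"type": ent_type, "start": tok["start"],
                          "end": tok["end"], "scores": [tok["score"]]})
        last_index = tok["index"]

    # Report each span's surface text from the original string and its mean token score
    return [
        {
            "type": s["type"],
            "text": text[s["start"]:s["end"]],
            "start": s["start"],
            "end": s["end"],
            "score": sum(s["scores"]) / len(s["scores"]),
        }
        for s in spans
    ]
```

For the example above, call group_entities(ner_results, example).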

Questions?

Post a GitHub issue on the IcelandicNER Issues repo.


Author:

Mehrdad Farahani

Dataset size:

1.32 GB