# Multilingual BERT Bengali Named Entity Recognition
mBERT-Bengali-NER is a Transformer-based Bengali named entity recognition model, built using the `bert-base-multilingual-uncased` model and the Wikiann dataset.
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("sagorsarker/mbert-bengali-ner")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/mbert-bengali-ner")

# Build a NER pipeline that merges sub-word tokens back into whole entities
nlp = pipeline("ner", model=model, tokenizer=tokenizer, grouped_entities=True)

example = "আমি জাহিদ এবং আমি ঢাকায় বাস করি।"
ner_results = nlp(example)
print(ner_results)
```
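Note that recent `transformers` releases deprecate `grouped_entities` in favor of `aggregation_strategy`. Reusing the model and tokenizer loaded above, an equivalent pipeline on newer versions would look like this (a minimal sketch, not part of the original card):

```python
# aggregation_strategy="simple" replaces grouped_entities=True on newer transformers versions
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
```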
## Label and ID Mapping
| Label ID | Label |
|----------|-------|
| 0        | O     |
| 1        | B-PER |
| 2        | I-PER |
| 3        | B-ORG |
| 4        | I-ORG |
| 5        | B-LOC |
| 6        | I-LOC |
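If you prefer to bypass the pipeline, the mapping above can be used to decode raw model predictions directly. The sketch below assumes the checkpoint's output indices follow the table (the dictionary is built from it by hand):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Label ID to tag mapping, taken from the table above
id2label = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}

tokenizer = AutoTokenizer.from_pretrained("sagorsarker/mbert-bengali-ner")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/mbert-bengali-ner")

text = "আমি জাহিদ এবং আমি ঢাকায় বাস করি।"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring label ID per sub-word token and map it to its tag
pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, pred_id in zip(tokens, pred_ids):
    print(token, id2label[pred_id])
```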
## Training Details
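The card does not include the training script. Since the model was built from `bert-base-multilingual-uncased` and the Wikiann dataset, a fine-tuning run could be set up roughly as sketched below, assuming the `wikiann` "bn" config from the `datasets` library and the Hugging Face `Trainer`; the hyperparameters are illustrative, not the author's actual settings:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)

dataset = load_dataset("wikiann", "bn")
label_names = dataset["train"].features["ner_tags"].feature.names  # O, B-PER, I-PER, ...

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-uncased", num_labels=len(label_names)
)

def tokenize_and_align(batch):
    # Tokenize pre-split words and align word-level NER tags to sub-word tokens;
    # special tokens get the ignore index -100
    tokenized = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        all_labels.append([-100 if w is None else tags[w] for w in word_ids])
    tokenized["labels"] = all_labels
    return tokenized

tokenized_ds = dataset.map(tokenize_and_align, batched=True)

args = TrainingArguments(
    output_dir="mbert-bengali-ner",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```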
## Evaluation Results
| Model             | F1      | Precision | Recall  | Accuracy | Loss    |
|-------------------|---------|-----------|---------|----------|---------|
| mBert-Bengali-NER | 0.97105 | 0.96769   | 0.97443 | 0.97682  | 0.12511 |
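The card does not state which evaluation script produced these numbers. Entity-level precision, recall, and F1 for NER are commonly computed with `seqeval`; a minimal sketch with placeholder tag sequences (illustrative only) looks like this:

```python
from seqeval.metrics import precision_score, recall_score, f1_score, accuracy_score

# Gold and predicted tag sequences, one list per sentence (placeholder data for illustration)
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC"]]

print("F1:       ", f1_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
```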