Turkish Named Entity Recognition (NER) Model

This model is a fine-tuned version of "dbmdz/bert-base-turkish-cased", trained on a reviewed version of the well-known Turkish NER dataset (https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt).

Fine-tuning parameters:

task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 8 
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512 
learning_rate = 2e-5 
num_train_epochs = 3 
weight_decay = 0.01
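The card does not say how word-level labels were mapped onto BERT's subword tokens. The Hugging Face token-classification examples use a common convention. Each word's label goes to its first subword. The remaining subwords and the special tokens are masked with -100, which PyTorch's cross-entropy loss ignores by default. The sketch below applies that convention to the label_list above. It is an assumption, not the author's training code: align_labels, IGNORE_INDEX and the example word_ids are hypothetical.

```python
# Hypothetical sketch: label maps and first-subword label alignment,
# a common convention for token classification (assumed, not confirmed
# to be what was used for this model).
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# id <-> label maps, of the kind stored as id2label / label2id in a model config
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in enumerate(label_list)}

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips targets equal to -100


def align_labels(word_ids, word_labels):
    """Give each word's label to its first subword and mask everything else.

    word_ids: one entry per token, the index of the source word,
              or None for special tokens such as [CLS] and [SEP].
    word_labels: one label string per word.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)                 # special token
        elif wid != previous:
            aligned.append(label2id[word_labels[wid]])   # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)                 # continuation subword
        previous = wid
    return aligned


# Hypothetical example: 4 words, where word 2 is split into 3 subwords.
word_labels = ["B-PER", "I-PER", "B-LOC", "O"]
word_ids = [None, 0, 1, 2, 2, 2, 3, None]
print(align_labels(word_ids, word_labels))
```

Only the first subword of each word contributes to the loss. At inference time, this pairs naturally with the "first" aggregation strategy used below.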

How to use:

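from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
# aggregation_strategy="first": each word takes its first subword's label, then words are grouped into entities
ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("your text here")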

See https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html for how entities are grouped according to the aggregation_strategy parameter.
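With aggregation_strategy="first", each word takes the label predicted for its first subword. Consecutive words are then merged into entities using their B-/I- prefixes. The following is a simplified, stdlib-only sketch of that idea, not the pipeline's actual code. aggregate_first, its (subword, word_index, label) input format and the example subword split are all hypothetical. The real pipeline also carries confidence scores and character offsets, and reconstructs text with the tokenizer.

```python
def aggregate_first(tokens):
    """Simplified "first" aggregation over (subword, word_index, label) triples."""
    # Step 1: merge subwords into words; a word keeps its FIRST subword's label.
    words = []  # each item: [word_text, label]
    last_wid = None
    for text, wid, label in tokens:
        piece = text[2:] if text.startswith("##") else text  # strip WordPiece "##"
        if wid == last_wid:
            words[-1][0] += piece  # later subwords add text but not their label
        else:
            words.append([piece, label])
        last_wid = wid

    # Step 2: group consecutive words into entities via B-/I- prefixes.
    entities = []
    current = None
    for word, label in words:
        if label == "O":
            current = None
            continue
        prefix, etype = label.split("-", 1)
        if prefix == "I" and current is not None and current[0] == etype:
            current[1].append(word)  # continue the open entity
        else:
            current = (etype, [word])  # B-X, or an I-X with no matching open entity
            entities.append(current)
    return [(etype, " ".join(ws)) for etype, ws in entities]


# Hypothetical subword tokens and model predictions
tokens = [
    ("Taner", 0, "B-PER"),
    ("Akde", 1, "I-PER"),
    ("##niz", 1, "O"),       # ignored: only the first subword's label counts
    ("Zong", 2, "B-LOC"),
    ("##uldak", 2, "I-LOC"),
    ("gitti", 3, "O"),
]
print(aggregate_first(tokens))
```

In this toy input, "Akdeniz" is labelled I-PER because its first subword "Akde" is, even though "##niz" was predicted O. The sketch therefore groups "Taner Akdeniz" as PER and "Zonguldak" as LOC.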

Reference test results:

Accuracy: 0.9933935699477056
F1: 0.9592969472710453
Precision: 0.9543530277931161
Recall: 0.9642923563325274
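F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes F1 from the reported precision and recall and compares it with the reported F1. The helper name f1_score is ours, not from any library.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.9543530277931161
recall = 0.9642923563325274
reported_f1 = 0.9592969472710453

computed = f1_score(precision, recall)
print(f"computed F1 = {computed:.10f}")
print("matches reported F1:", abs(computed - reported_f1) < 1e-8)
```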

Evaluation results on the test sets introduced in the following paper: Küçük, D., Küçük, D., Arıcı, N. 2016. Türkçe Varlık İsmi Tanıma için bir Veri Kümesi ("A Named Entity Recognition Dataset for Turkish"). IEEE Sinyal İşleme, İletişim ve Uygulamaları Kurultayı. Zonguldak, Türkiye.

Test set  Accuracy  Precision  Recall  F1
20010000  0.9946    0.9871     0.9463  0.9662
20020000  0.9928    0.9134     0.9206  0.9170
20030000  0.9942    0.9814     0.9186  0.9489
20040000  0.9943    0.9660     0.9522  0.9590
20050000  0.9971    0.9539     0.9932  0.9732
20060000  0.9993    0.9942     0.9942  0.9942
20070000  0.9970    0.9806     0.9439  0.9619
20080000  0.9988    0.9821     0.9649  0.9735
20090000  0.9977    0.9891     0.9479  0.9681
20100000  0.9961    0.9684     0.9293  0.9485
Overall   0.9961    0.9720     0.9516  0.9617
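The card does not define how these scores are computed. A common reading, assumed here, is that accuracy is token-level. Precision, recall and F1 would then be entity-level with exact span matching, the convention of the conlleval script and seqeval's default mode. Under that assumption, token accuracy is inflated by the many O tokens, which would explain why it sits above F1 on every test set. A minimal sketch follows; the function names and toy tag sequences are hypothetical.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from a BIO tag list; end is exclusive."""
    spans = set()
    etype, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            spans.add((etype, start, i))  # close the open entity
            etype, start = None, None
        if prefix in ("B", "I") and not continues:
            etype, start = label, i  # B-X, or a stray I-X, opens a new entity
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Entity-level precision, recall and F1 with exact span matching."""
    gold, pred = set(), set()
    for n, (g, p) in enumerate(zip(gold_seqs, pred_seqs)):
        gold |= {(n,) + span for span in extract_entities(g)}
        pred |= {(n,) + span for span in extract_entities(p)}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def token_accuracy(gold_seqs, pred_seqs):
    """Fraction of tokens whose predicted tag equals the gold tag (O included)."""
    pairs = [(g, p) for gs, ps in zip(gold_seqs, pred_seqs) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(entity_prf(gold, pred))
print(token_accuracy(gold, pred))
```

In the toy example, one of the two gold entities is missed, so entity-level precision is 1.0 and recall is 0.5. Even so, 3 of the 4 tokens are still tagged correctly.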

Author:

Taner Akdeniz

Dataset size:

1.23 GB