Model:

akdeniz27/bert-base-turkish-cased-ner


Turkish Named Entity Recognition (NER) Model

This model is a fine-tuned version of "dbmdz/bert-base-turkish-cased", trained on a reviewed version of a well-known Turkish NER dataset ( https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt ).

Fine-tuning parameters:

task = "ner"
model_checkpoint = "dbmdz/bert-base-turkish-cased"
batch_size = 8 
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']
max_length = 512 
learning_rate = 2e-5 
num_train_epochs = 3 
weight_decay = 0.01 
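
The label list above follows the IOB tagging scheme (B- opens an entity, I- continues it, O marks non-entity tokens). As a minimal sketch, the snippet below derives the id/label mappings that a token classification head typically needs. It assumes ids follow the order of label_list; the helper name entity_type is hypothetical and not part of the original training script.

```python
# Label list copied from the fine-tuning parameters above.
label_list = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

# Assumption: class ids follow list order (0 -> 'O', 1 -> 'B-PER', ...).
id2label = dict(enumerate(label_list))
label2id = {label: idx for idx, label in id2label.items()}


def entity_type(label):
    """Strip the IOB prefix: 'B-PER' -> 'PER'; 'O' -> None (hypothetical helper)."""
    return None if label == 'O' else label.split('-', 1)[1]


# Distinct entity types covered by the model: person, organization, location.
entity_types = sorted({entity_type(l) for l in label_list if l != 'O'})
print(entity_types)
```

Mappings like these are commonly passed to the model config so that predictions are reported as readable tags rather than integer ids.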

How to use:

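from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
tokenizer = AutoTokenizer.from_pretrained("akdeniz27/bert-base-turkish-cased-ner")
# aggregation_strategy="first": each word takes the tag of its first sub-word token
ner = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="first")
ner("your text here")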

Please refer to "https://huggingface.co/transformers/_modules/transformers/pipelines/token_classification.html" for details on entity grouping with the aggregation_strategy parameter.
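
To make the idea behind the "first" strategy concrete, here is a simplified, library-free sketch: sub-word pieces are merged back into words, and each word keeps the label predicted for its first piece. This is an illustration only, not the transformers implementation; the function name aggregate_first, the "##" continuation convention, and the sample tokens and labels are assumptions made for the example. The later step of grouping adjacent words into multi-word entities is left out.

```python
def aggregate_first(tokens, labels):
    """Merge WordPiece-style pieces into words; a word takes its first piece's label."""
    words = []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            # Continuation piece: extend the current word and ignore this piece's label.
            text, first_label = words[-1]
            words[-1] = (text + token[2:], first_label)
        else:
            # A new word starts here; its label is the one predicted for this piece.
            words.append((token, label))
    return words


# Hypothetical tokenization and per-token predictions, for illustration only.
tokens = ["Ankara", "##ya", "gitti"]
labels = ["B-LOC", "O", "O"]
print(aggregate_first(tokens, labels))
```

Without this step, the suffix piece would be reported separately with its own, possibly conflicting, tag.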

Reference test results:

  • accuracy: 0.9933935699477056
  • F1: 0.9592969472710453
  • precision: 0.9543530277931161
  • recall: 0.9642923563325274
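
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two figures. The sketch below assumes the standard definition F1 = 2PR / (P + R); the function name f1_score is chosen for this example.

```python
import math

# Values copied from the reference test results above.
precision = 0.9543530277931161
recall = 0.9642923563325274
reported_f1 = 0.9592969472710453


def f1_score(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


computed_f1 = f1_score(precision, recall)
# The recomputed value should agree with the reported one up to rounding.
print(math.isclose(computed_f1, reported_f1, abs_tol=1e-6))
```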

Evaluation results on the test sets proposed in the paper "Küçük, D., Küçük, D., Arıcı, N. 2016. Türkçe Varlık İsmi Tanıma için bir Veri Kümesi ("A Named Entity Recognition Dataset for Turkish"). IEEE Sinyal İşleme, İletişim ve Uygulamaları Kurultayı. Zonguldak, Türkiye.":

Test Set   Accuracy  Precision  Recall  F1
20010000   0.9946    0.9871     0.9463  0.9662
20020000   0.9928    0.9134     0.9206  0.9170
20030000   0.9942    0.9814     0.9186  0.9489
20040000   0.9943    0.9660     0.9522  0.9590
20050000   0.9971    0.9539     0.9932  0.9732
20060000   0.9993    0.9942     0.9942  0.9942
20070000   0.9970    0.9806     0.9439  0.9619
20080000   0.9988    0.9821     0.9649  0.9735
20090000   0.9977    0.9891     0.9479  0.9681
20100000   0.9961    0.9684     0.9293  0.9485
Overall    0.9961    0.9720     0.9516  0.9617