Model:
savasy/bert-base-turkish-ner-cased
An easy-to-use Python NER (Named Entity Recognition) model for Turkish (BERT + transfer learning)...
Thanks to @stefan-it, I applied the following training procedure:
```bash
cd tr-data

for file in train.txt dev.txt test.txt labels.txt
do
  wget https://schweter.eu/storage/turkish-bert-wikiann/$file
done

cd ..
```

This will download the preprocessed data containing the training, validation, and test splits and place them in the tr-data folder.
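For environments without wget, the same files can be fetched from Python; a minimal sketch using only the standard library (URLs and file names taken from the loop above):

```python
# Alternative to the wget loop above: download the preprocessed WikiANN files
# into tr-data/ using Python's standard library.
import os
import urllib.request

base_url = "https://schweter.eu/storage/turkish-bert-wikiann"
os.makedirs("tr-data", exist_ok=True)

for name in ["train.txt", "dev.txt", "test.txt", "labels.txt"]:
    urllib.request.urlretrieve(f"{base_url}/{name}", os.path.join("tr-data", name))
```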
Run fine-tuning: after downloading the datasets, fine-tuning can begin. Just set the following environment variables:
```bash
export MAX_LENGTH=128
export BERT_MODEL=dbmdz/bert-base-turkish-cased
export OUTPUT_DIR=tr-new-model
export BATCH_SIZE=32
export NUM_EPOCHS=3
export SAVE_STEPS=625
export SEED=1
```
Then run the fine-tuning:
```bash
python3 run_ner_old.py --data_dir ./tr-data \
  --model_type bert \
  --labels ./tr-data/labels.txt \
  --model_name_or_path $BERT_MODEL \
  --output_dir $OUTPUT_DIR-$SEED \
  --max_seq_length $MAX_LENGTH \
  --num_train_epochs $NUM_EPOCHS \
  --per_gpu_train_batch_size $BATCH_SIZE \
  --save_steps $SAVE_STEPS \
  --seed $SEED \
  --do_train \
  --do_eval \
  --do_predict \
  --fp16
```
Usage with the transformers pipeline:

```python
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

# Load the published model and tokenizer from the Hugging Face Hub.
model = AutoModelForTokenClassification.from_pretrained("savasy/bert-base-turkish-ner-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-ner-cased")

# Run named entity recognition on a Turkish sentence.
ner = pipeline("ner", model=model, tokenizer=tokenizer)
ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a ayak bastı.")
```
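By default the pipeline returns one prediction per sub-word token. With a recent transformers release, the predictions can instead be grouped into word-level entity spans; a minimal sketch, assuming your installed version supports the `aggregation_strategy` argument (older releases used `grouped_entities=True`):

```python
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

model = AutoModelForTokenClassification.from_pretrained("savasy/bert-base-turkish-ner-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-ner-cased")

# "simple" merges consecutive sub-word tokens into whole entity spans,
# e.g. "Mustafa Kemal Atatürk" (PER) and "Samsun" (LOC).
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

for entity in ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a ayak bastı."):
    print(entity["word"], entity["entity_group"], round(float(entity["score"]), 3))
```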
Data 1: evaluation results for the data above:
Test results:
Data 2: performance on the data provided by @kemalaraz is shown below.
```
savas@savas-lenova:~/Desktop/trans/tr-new-model-1$ cat eval_results.txt
savas@savas-lenova:~/Desktop/trans/tr-new-model-1$ cat test_results.txt
```
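The metrics written to these files are entity-level precision, recall, and F1, which the run_ner evaluation computes with seqeval; a minimal sketch of how such scores are obtained (the label sequences below are toy examples, not the actual WikiANN data):

```python
# Toy illustration of the entity-level precision/recall/F1 metrics reported in
# eval_results.txt / test_results.txt, computed with seqeval.
from seqeval.metrics import classification_report, f1_score, precision_score, recall_score

# Made-up gold and predicted tag sequences in the same BIO scheme as WikiANN.
y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "I-ORG", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "O", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```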