T5-base model for named entity recognition (NER, CoNLL-2003)

In this repository, we open source a T5-base model that was fine-tuned on the official CoNLL-2003 NER dataset.

We used the TANL library from Amazon to fine-tune the model.

The fine-tuning approach is described in the paper "TANL: Structured Prediction as Translation between Augmented Natural Languages" by Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang and Stefano Soatto.
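As the paper title suggests, TANL casts NER as translation into an "augmented" sentence in which each entity is wrapped in brackets together with its type. The sketch below shows one way such output could be parsed back into (entity, type) pairs. The example sentence, the type names, and the helper `extract_entities` are illustrative assumptions, not part of the TANL library's API.

```python
import re
from typing import List, Tuple

# Matches spans of the form "[ entity text | entity type ]".
ENTITY_PATTERN = re.compile(r"\[\s*([^\[\]|]+?)\s*\|\s*([^\[\]|]+?)\s*\]")


def extract_entities(augmented: str) -> List[Tuple[str, str]]:
    """Return (entity, type) pairs found in an augmented sentence.

    Hypothetical helper for illustration; not a TANL function.
    """
    return [(m.group(1), m.group(2)) for m in ENTITY_PATTERN.finditer(augmented)]


# Made-up model output in the augmented natural language style.
example = "[ Boston | location ] is the home of [ Harvard University | organization ] ."
entities = extract_entities(example)
print(entities)
```

In practice the TANL library performs this decoding step itself, so a parser like this is mainly useful for inspecting raw generations.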

Fine-Tuning

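We used the same hyperparameter settings as the official implementation, with one small change: we trained the model on a single V100 GPU and used gradient accumulation. The slightly modified configuration file (config.ini) is shown below: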

[conll03]
datasets = conll03
model_name_or_path = t5-base
num_train_epochs = 10
max_seq_length = 256
max_seq_length_eval = 512
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
do_train = True
do_eval = True
do_predict = True
gradient_accumulation_steps = 8

Fine-tuning the model on the 14,041 training sentences of the CoNLL-2003 dataset took about 2 hours.
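To make the effect of gradient accumulation concrete, the following sketch reads the relevant values from the configuration above and derives the effective batch size on one GPU. The per-epoch update count is only a rough estimate that assumes one training example per sentence; the exact number depends on how the training loop handles the last partial batch.

```python
import configparser
import math

# Relevant subset of the config.ini shown above.
CONFIG_TEXT = """
[conll03]
num_train_epochs = 10
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
section = parser["conll03"]

num_gpus = 1  # a single V100, as stated in the text
per_device_batch = section.getint("per_device_train_batch_size")
accumulation_steps = section.getint("gradient_accumulation_steps")

# Gradients from several small batches are summed before each optimizer step.
effective_batch_size = per_device_batch * accumulation_steps * num_gpus

# Rough estimate, assuming one example per training sentence.
num_train_sentences = 14041
updates_per_epoch = math.ceil(num_train_sentences / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")
print(f"approx. optimizer updates per epoch: {updates_per_epoch}")
```

With these settings, each optimizer update sees 32 sentences even though only 4 fit through the GPU at a time.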

Evaluation

On the development set, the following evaluation results were obtained:

{
"entity_precision": 0.9536446086664427,
"entity_recall": 0.9555705149781218,
"entity_f1": 0.9546065904505716,
"entity_precision_no_type": 0.9773261672824992,
"entity_recall_no_type": 0.9792998990238977,
"entity_f1_no_type": 0.9783120376597176
}

Evaluation results on the test set:

{
"entity_precision": 0.912182296231376,
"entity_recall": 0.9213881019830028,
"entity_f1": 0.9167620893155995,
"entity_precision_no_type": 0.953900087642419,
"entity_recall_no_type": 0.9635269121813032,
"entity_f1_no_type": 0.9586893332158901
}

To summarize: this model achieves an F1 score of 95.46% on the development set and 91.68% on the test set. The paper reports an F1 score of 91.7%.
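The F1 values above are the harmonic mean of entity precision and recall, so they can be recomputed from the reported numbers as a sanity check. The sketch below does this for the typed-entity scores; the function name `f1_score` is just an illustrative helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the evaluation results above.
dev = {"entity_precision": 0.9536446086664427, "entity_recall": 0.9555705149781218}
test = {"entity_precision": 0.912182296231376, "entity_recall": 0.9213881019830028}

dev_f1 = f1_score(dev["entity_precision"], dev["entity_recall"])
test_f1 = f1_score(test["entity_precision"], test["entity_recall"])

print(f"dev F1:  {100 * dev_f1:.2f}%")
print(f"test F1: {100 * test_f1:.2f}%")
```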

License

This model is licensed under the MIT license.

Acknowledgments

Thanks to the generous support of the Hugging Face team, both cased and uncased models can be downloaded from their S3 storage.

Author:

Bayerische Staatsbibliothek

Dataset size:

851.75 MB