xlm-roberta-ner-japanese

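(Japanese named entity recognition model)

This model is a fine-tuned version of xlm-roberta-base (a pre-trained cross-lingual RoBERTa model), trained for named entity recognition (NER) token classification.

The model was fine-tuned on the NER dataset provided by Stockmark Inc., whose data was collected from Japanese Wikipedia articles. For the license of this dataset, see here.

Each token is labeled with one of the following: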

Label id	Tag	Tag in Widget	Description
0	O	(None)	others or nothing
1	PER	PER	person
2	ORG	ORG	general corporation organization
3	ORG-P	P	political organization
4	ORG-O	O	other organization
5	LOC	LOC	location
6	INS	INS	institution, facility
7	PRD	PRD	product
8	EVT	EVT	event
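
As a minimal sketch, the label table above can be written as a pair of Python lookup dictionaries. The names `ID2LABEL`, `LABEL2ID`, and `decode` are illustrative and are not taken from the model's configuration; only the ids and tags come from the table.

```python
# Label ids and tags copied from the table above.
ID2LABEL = {
    0: "O",      # others or nothing
    1: "PER",    # person
    2: "ORG",    # general corporation organization
    3: "ORG-P",  # political organization
    4: "ORG-O",  # other organization
    5: "LOC",    # location
    6: "INS",    # institution, facility
    7: "PRD",    # product
    8: "EVT",    # event
}

# Reverse mapping, from tag to label id.
LABEL2ID = {tag: idx for idx, tag in ID2LABEL.items()}


def decode(label_ids):
    """Map a sequence of predicted label ids to their tags."""
    return [ID2LABEL[i] for i in label_ids]


print(decode([1, 0, 5, 5]))  # ['PER', 'O', 'LOC', 'LOC']
```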

Intended uses

from transformers import pipeline

model_name = "tsmatz/xlm-roberta-ner-japanese"
classifier = pipeline("token-classification", model=model_name)
result = classifier("鈴木は4月の陽気の良い日に、鈴をつけて熊本県の阿蘇山に登った")
print(result)
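
The pipeline call above returns one prediction per token. A possible way to join neighbouring tokens into entity spans is sketched below. It assumes each prediction is a dict with "entity", "start", and "end" keys (character offsets into the input text). The helper name `merge_entities` is hypothetical, and the sample predictions are hand-written for illustration; they are not actual model output.

```python
def merge_entities(text, tokens):
    """Merge adjacent token predictions that share a label into spans.

    `tokens` is assumed to be in reading order. Two tokens are merged
    only when they carry the same label and the second starts exactly
    where the first ends.
    """
    spans = []
    for tok in tokens:
        last = spans[-1] if spans else None
        if last and last["entity"] == tok["entity"] and last["end"] == tok["start"]:
            last["end"] = tok["end"]  # extend the current span
        else:
            spans.append(
                {"entity": tok["entity"], "start": tok["start"], "end": tok["end"]}
            )
    # Recover the surface text of each span from the character offsets.
    for span in spans:
        span["word"] = text[span["start"]:span["end"]]
    return spans


# Hand-written sample in the assumed format (NOT real model output).
text = "鈴木は熊本県の阿蘇山に登った"
sample = [
    {"entity": "PER", "start": 0, "end": 2},
    {"entity": "LOC", "start": 3, "end": 5},
    {"entity": "LOC", "start": 5, "end": 6},
    {"entity": "LOC", "start": 7, "end": 10},
]

spans = merge_entities(text, sample)
for span in spans:
    print(span["entity"], span["word"])
```

Slicing the original text by offsets, rather than concatenating token strings, avoids carrying tokenizer-specific markers into the merged entity text.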

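Training procedure

You can download the source code for fine-tuning from here.

Training hyperparameters

The following hyperparameters were used during training:

Learning rate: 5e-05
Training batch size: 12
Evaluation batch size: 12
Seed: 42
Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
Number of epochs: 5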
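
To illustrate what a linear scheduler implies for these settings, the sketch below computes the learning rate at a given optimizer step. It takes the total of 2230 steps from the final row of the training results table and assumes zero warmup steps, since the card does not state a warmup value. The function name `linear_lr` and the constants are illustrative and are not taken from the author's training code.

```python
BASE_LR = 5e-05      # learning rate listed above
TOTAL_STEPS = 2230   # final step in the training results table
WARMUP_STEPS = 0     # assumption: warmup is not stated in this card


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, after epoch 1, at the midpoint, and at the end.
for step in (0, 446, 1115, 2230):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at the listed value, is halved at the midpoint of training, and reaches zero at the last step.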

Training results

Training Loss	Epoch	Step	Validation Loss	F1
No log	1.0	446	0.1510	0.8457
No log	2.0	892	0.0626	0.9261
No log	3.0	1338	0.0366	0.9580
No log	4.0	1784	0.0196	0.9792
No log	5.0	2230	0.0173	0.9864
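
The step counts in this table, together with the training batch size of 12, give a rough bound on the number of training examples. The sketch below works through that arithmetic. It assumes one optimizer step per batch (no gradient accumulation), a single device, and that a final partial batch is kept; none of these is stated in the card, so the result is an estimate only.

```python
import math

STEPS_PER_EPOCH = 446   # step count at epoch 1.0 in the table above
TRAIN_BATCH_SIZE = 12   # from the hyperparameter list

# With a kept partial last batch, steps = ceil(n_examples / batch_size),
# so n_examples lies in the range below.
upper = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
lower = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1

print(f"estimated training examples: {lower} to {upper}")

# Sanity check: both ends of the range reproduce the observed step count.
assert math.ceil(lower / TRAIN_BATCH_SIZE) == STEPS_PER_EPOCH
assert math.ceil(upper / TRAIN_BATCH_SIZE) == STEPS_PER_EPOCH
```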

Framework versions

Transformers 4.23.1
Pytorch 1.12.1+cu102
Datasets 2.6.1
Tokenizers 0.13.1

Author:

Tsuyoshi Matsuzaki

Dataset size:

1.05 GB