Fine-tuned Indonesian Sentiment Classifier

This model is a fine-tuned version of indobenchmark/indobert-base-p1, trained on the SmSA dataset from IndoNLU. It achieves the following results on the evaluation set:

Loss: 0.3233
Accuracy: 0.9317
F1: 0.9034

Results on the test set:

Accuracy: 0.928
F1 (macro average): 0.9113470780757361
F1 (micro average): 0.928
F1 (weighted average): 0.9261959965604815
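The micro-averaged F1 equals the accuracy (both 0.928) because, in single-label multiclass classification, every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class). The pure-Python sketch below computes the three averages using the same definitions as scikit-learn's `f1_score` averaging modes. It runs on hypothetical toy labels, not the SmSA test set, and is not the code that produced the numbers above.

```python
from collections import Counter


def f1_summary(y_true, y_pred):
    """Return (accuracy, macro F1, micro F1, weighted F1) for single-label multiclass data."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    per_class_f1 = {}
    total_tp = total_fp = total_fn = 0
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn
        denom = 2 * tp + fp + fn
        per_class_f1[lab] = 2 * tp / denom if denom else 0.0

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro: unweighted mean of per-class F1, so small classes count as much as large ones.
    macro = sum(per_class_f1.values()) / len(labels)
    # Micro: F1 from pooled TP/FP/FN counts; equals accuracy for single-label data.
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    # Weighted: per-class F1 weighted by each class's share of the true labels.
    weighted = sum(per_class_f1[lab] * support[lab] for lab in labels) / len(y_true)
    return accuracy, macro, micro, weighted


# Hypothetical toy labels, not the SmSA test set.
y_true = ["positive", "positive", "negative", "neutral", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "neutral", "negative", "neutral"]
print(tuple(round(x, 4) for x in f1_summary(y_true, y_pred)))
# prints (0.6667, 0.6556, 0.6667, 0.6278)
```

A macro average below the weighted average, as in the test results above, suggests that the less frequent classes have a lower F1 than the frequent ones.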

Model description

This model classifies the sentiment of a text into one of three labels: positive, negative, or neutral.

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hugging Face Hub ID of the fine-tuned checkpoint
pretrained_name = "hanifnoerr/Fine-tuned-Indonesian-Sentiment-Classifier"
tokenizer = AutoTokenizer.from_pretrained(pretrained_name)
model = AutoModelForSequenceClassification.from_pretrained(pretrained_name)

Classify text

from transformers import pipeline

pretrained_name = "hanifnoerr/Fine-tuned-Indonesian-Sentiment-Classifier"
sentimen = pipeline("text-classification", model=pretrained_name, tokenizer=pretrained_name)

kalimat = "buku ini jelek sekali"  # "this book is very bad"
sentimen(kalimat)

Output: [{'label': 'negative', 'score': 0.9996247291564941}]
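The `score` in this output is a probability. For a single-label classifier, the transformers text-classification pipeline applies a softmax over the model's logits and reports the top class, naming it through the model config's `id2label` mapping. The plain-Python sketch below reproduces only that post-processing step. The logits and the `ID2LABEL` mapping are hypothetical placeholders; the real index-to-label mapping is whatever `model.config.id2label` holds for this checkpoint.

```python
import math

# Hypothetical index-to-label mapping; the real one is in model.config.id2label.
ID2LABEL = {0: "positive", 1: "neutral", 2: "negative"}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label=ID2LABEL):
    """Return a pipeline-style [{'label': ..., 'score': ...}] for the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": id2label[best], "score": probs[best]}]


# Hypothetical logits for a clearly negative sentence.
print(top_label([-2.1, -1.3, 6.0]))
```

With these made-up logits the sketch prints a single `negative` entry whose score is close to 1, the same shape as the pipeline output above.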

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
0.08	1.0	688	0.3532	0.9310	0.9053
0.0523	2.0	1376	0.3233	0.9317	0.9034
0.045	3.0	2064	0.3949	0.9286	0.8995
0.0252	4.0	2752	0.4662	0.9310	0.9049
0.0149	5.0	3440	0.6251	0.9246	0.8899
0.0091	6.0	4128	0.6148	0.9254	0.8928
0.0111	7.0	4816	0.6259	0.9222	0.8902
0.0106	8.0	5504	0.6123	0.9238	0.8882
0.0092	9.0	6192	0.6353	0.9230	0.8928
0.0085	10.0	6880	0.6733	0.9254	0.8989
0.0062	11.0	7568	0.6666	0.9302	0.9027
0.0036	12.0	8256	0.7578	0.9230	0.8962
0.0055	13.0	8944	0.7378	0.9270	0.8947
0.0023	14.0	9632	0.7758	0.9230	0.8978
0.0009	15.0	10320	0.7051	0.9278	0.9006
0.0033	16.0	11008	0.7442	0.9214	0.8902
0.0	17.0	11696	0.7513	0.9254	0.8974
0.0	18.0	12384	0.7554	0.9270	0.8999

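Although the model was trained for 18 epochs, the released model uses the best weights, from epoch 2. That checkpoint has the lowest validation loss in the table (0.3233) and corresponds to the evaluation results reported at the top of this card.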
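Picking epoch 2 is consistent with selecting the checkpoint with the lowest validation loss. The plain-Python sketch below applies that selection to the rows of the training-results table. In the Hugging Face Trainer this behaviour is usually configured with `load_best_model_at_end=True` and `metric_for_best_model`, but whether the author used those settings is an assumption; the card does not say.

```python
# (epoch, validation_loss, accuracy, f1) rows copied from the training-results table.
HISTORY = [
    (1, 0.3532, 0.9310, 0.9053),
    (2, 0.3233, 0.9317, 0.9034),
    (3, 0.3949, 0.9286, 0.8995),
    (4, 0.4662, 0.9310, 0.9049),
    (5, 0.6251, 0.9246, 0.8899),
    (6, 0.6148, 0.9254, 0.8928),
    (7, 0.6259, 0.9222, 0.8902),
    (8, 0.6123, 0.9238, 0.8882),
    (9, 0.6353, 0.9230, 0.8928),
    (10, 0.6733, 0.9254, 0.8989),
    (11, 0.6666, 0.9302, 0.9027),
    (12, 0.7578, 0.9230, 0.8962),
    (13, 0.7378, 0.9270, 0.8947),
    (14, 0.7758, 0.9230, 0.8978),
    (15, 0.7051, 0.9278, 0.9006),
    (16, 0.7442, 0.9214, 0.8902),
    (17, 0.7513, 0.9254, 0.8974),
    (18, 0.7554, 0.9270, 0.8999),
]

# Position of each metric inside a HISTORY row.
COLUMNS = {"val_loss": 1, "accuracy": 2, "f1": 3}


def best_epoch(history, metric, greater_is_better):
    """Return the epoch with the best value of `metric`; ties go to the earliest epoch."""
    idx = COLUMNS[metric]
    pick = max if greater_is_better else min
    return pick(history, key=lambda row: row[idx])[0]


print(best_epoch(HISTORY, "val_loss", greater_is_better=False))  # prints 2
print(best_epoch(HISTORY, "f1", greater_is_better=True))         # prints 1
```

The criterion matters: selecting by validation loss or accuracy gives epoch 2, while selecting by F1 would give epoch 1 (0.9053).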

Framework versions

Transformers 4.27.4
Pytorch 2.0.0+cu118
Datasets 2.11.0
Tokenizers 0.13.3

Author:

Hanif Noer Rofiq

Dataset size:

475.78 MB