Model:
Davlan/naija-twitter-sentiment-afriberta-large
Languages:
naija-twitter-sentiment-afriberta-large is the first multilingual Twitter sentiment-classification model for four Nigerian languages (Hausa, Igbo, Nigerian Pidgin, and Yorùbá), obtained by fine-tuning castorini/afriberta_large. The model was trained on the NaijaSenti corpus and achieves state-of-the-art results on the Twitter sentiment-classification task. It has been trained to classify tweets into negative, neutral, and positive sentiment classes. Specifically, the model is an xlm-roberta-large model fine-tuned on the datasets of the 4 Nigerian languages aggregated from the NaijaSenti corpus.
You can use this model for sentiment classification.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import numpy as np
from scipy.special import softmax

MODEL = "Davlan/naija-twitter-sentiment-afriberta-large"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PyTorch
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
text = "I like you"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

id2label = {0: "positive", 1: "neutral", 2: "negative"}
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = id2label[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
```
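The softmax-and-argsort post-processing used above can be sanity-checked in isolation with dummy logits, with no model download required. The logit values below are made up for illustration; they stand in for `output[0][0]` from the model:

```python
import numpy as np
from scipy.special import softmax

# Dummy logits standing in for the model output (made-up values)
logits = np.array([2.0, 1.0, 0.1])
scores = softmax(logits)                 # probabilities summing to 1
id2label = {0: "positive", 1: "neutral", 2: "negative"}
ranking = np.argsort(scores)[::-1]       # class indices, highest score first
top_label = id2label[ranking[0]]
print(top_label)                         # → "positive" for these dummy logits
```

Because `softmax` is monotonic, ranking the probabilities gives the same order as ranking the raw logits; the softmax step only matters for reporting calibrated scores.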
Limitations and bias

The model is limited by its training dataset and domain (i.e. Twitter). It may not generalize well to all use cases in other domains.
The model was trained on a single Nvidia RTX 2080 GPU using the recommended hyperparameters from the original NaijaSenti paper.
| language | F1-score |
|---|---|
| hau | 81.2 |
| ibo | 80.8 |
| pcm | 74.5 |
| yor | 80.4 |
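An F1 score combines precision and recall into a single number. A minimal sketch of the computation, using made-up per-class counts (these numbers are purely illustrative and are not from the NaijaSenti evaluation):

```python
# Hypothetical counts for one sentiment class (illustrative only)
tp, fp, fn = 80, 15, 20            # true positives, false positives, false negatives

precision = tp / (tp + fp)         # fraction of predicted positives that are correct
recall = tp / (tp + fn)            # fraction of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(100 * f1, 1))          # F1 as a percentage, the unit used in the table
```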
```bibtex
@inproceedings{Muhammad2022NaijaSentiAN,
  title={NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis},
  author={Shamsuddeen Hassan Muhammad and David Ifeoluwa Adelani and Sebastian Ruder and Ibrahim Said Ahmad and Idris Abdulmumin and Bello Shehu Bello and Monojit Choudhury and Chris C. Emezue and Saheed Salahudeen Abdullahi and Anuoluwapo Aremu and Alipio Jeorge and Pavel B. Brazdil},
  year={2022}
}
```