XLM-T-Sent-Politics

This is an "extension" of the multilingual twitter-xlm-roberta-base-sentiment model (model, original paper), focused on sentiment in politicians' tweets. The original sentiment fine-tuning was performed on 8 languages (Arabic, English, French, German, Hindi, Italian, Spanish, Portuguese); the model was then further trained on tweets from members of the UK Parliament (English), the Spanish Parliament (Spanish), and the Greek Parliament (Greek).
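
For a quick single-prediction check, a minimal sketch using the transformers pipeline API is shown below (the label string it returns comes from the checkpoint's config and may be a generic name such as LABEL_2 rather than a readable sentiment name):

from transformers import pipeline

MODEL = "cardiffnlp/xlm-twitter-politics-sentiment"

# Minimal sketch: wrap the fine-tuned checkpoint in a text-classification pipeline.
# The label in the result is taken from the model config's id2label mapping.
sentiment_task = pipeline("text-classification", model=MODEL, tokenizer=MODEL)
print(sentiment_task("Good night 😊"))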

Full classification example

from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax

MODEL = f"cardiffnlp/xlm-twitter-politics-sentiment"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Good night ?"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# # TF
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)

# text = "Good night ?"
# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# scores = softmax(scores)

# Print the score for each class, from least to most likely
ranking = np.argsort(scores)
for i in range(scores.shape[0]):
    s = scores[ranking[i]]
    print(i, s)

Output:

0 0.0048229103
1 0.03117284
2 0.9640044
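
To attach label names to those class indices, one option (a sketch, assuming the checkpoint's config exposes an id2label mapping; for the base twitter-xlm-roberta-base-sentiment model the labels are Negative/Neutral/Positive) is to read them from AutoConfig and reuse the scores computed above:

from transformers import AutoConfig

config = AutoConfig.from_pretrained(MODEL)

# Print the classes from most to least likely with their config-defined labels,
# reusing the `scores` array from the example above.
for idx in np.argsort(scores)[::-1]:
    print(config.id2label[int(idx)], scores[idx])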