Model:
cardiffnlp/xlm-twitter-politics-sentiment
This is an "extension" of the multilingual twitter-xlm-roberta-base-sentiment model (model, original paper), focusing on the sentiment of politicians' tweets. The original sentiment fine-tuning was carried out on tweets in 8 languages (Arabic, English, French, German, Hindi, Italian, Spanish, Portuguese); the model was then further trained on tweets from members of the UK Parliament (English), the Spanish Parliament (Spanish), and the Greek Parliament (Greek).
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax

# Preprocess text (replace usernames and links with placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

MODEL = "cardiffnlp/xlm-twitter-politics-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Good night ?"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# # TF
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)
# text = "Good night ?"
# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# scores = softmax(scores)

# Print the scores sorted from lowest to highest (rank, probability)
ranking = np.argsort(scores)
for i in range(scores.shape[0]):
    s = scores[ranking[i]]
    print(i, s)
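For quick experiments, the same checkpoint can also be loaded through the high-level transformers pipeline API. This is a minimal sketch based on the standard pipeline interface, not part of the original example; the example text is reused from above:

from transformers import pipeline

# Minimal sketch: wrap the same checkpoint in a sentiment-analysis pipeline.
sentiment_task = pipeline("sentiment-analysis",
                          model="cardiffnlp/xlm-twitter-politics-sentiment",
                          tokenizer="cardiffnlp/xlm-twitter-politics-sentiment")
print(sentiment_task("Good night ?"))  # list of dicts with 'label' and 'score'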
Output:
0 0.0048229103
1 0.03117284
2 0.9640044
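The three numbers are the softmax probabilities of the model's three sentiment classes, printed from lowest to highest. To attach human-readable labels to them, one option is to read the class names from the model config. A minimal sketch, assuming the model and scores variables from the snippet above (id2label is a standard transformers config attribute; the exact label strings depend on the checkpoint):

# Map each class index to its name via the model config and print best first.
ranking = np.argsort(scores)[::-1]
for rank, idx in enumerate(ranking):
    print(f"{rank + 1}) {model.config.id2label[idx]}: {scores[idx]:.4f}")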