Model:
hakonmh/sentiment-xdistil-uncased
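Sentiment-xDistil is a model fine-tuned from xtremedistil-l12-h384-uncased to classify the sentiment of news headlines, using a dataset annotated by ChatGPT 3.5. It was built together with Topic-xDistil as a tool for filtering financial news headlines and classifying their sentiment. The code used to train both models and build the dataset can be found here.
Note: The output label is one of Negative, Neutral, or Positive. The model is intended for English text.
Performance metrics for both models on their test sets: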
Model | Test Set Size | Accuracy | F1 Score |
---|---|---|---|
topic-xdistil-uncased | 32 799 | 94.44 % | 92.59 % |
sentiment-xdistil-uncased | 17 527 | 94.59 % | 93.44 % |
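For readers who want to reproduce metrics of this kind on their own labeled headlines, the following is a minimal sketch of accuracy and F1 in plain Python. The card does not say how the F1 score was averaged across classes, so macro averaging is an assumption here, and the function names `accuracy` and `macro_f1` are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Assumption: macro averaging. The model card does not state which
    averaging scheme produced its reported F1 score.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Both functions take parallel sequences of label strings, for example the gold labels of a test set and the labels returned by the classifier.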
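The training data consists of 300k+ news headlines and tweets annotated by ChatGPT 3.5, which has been shown to outperform crowd-workers for text annotation tasks.
The ChatGPT prompt defined the sentence labels as follows: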
    """
    [...]
    Does the headline convey a Positive, Neutral, or Negative sentiment with \
    regard to the current state or potential future impact on the economy or \
    the asset described?
    - Positive sentiment headlines suggest growth, improvement, or \
    stability in economic conditions.
    - Neutral sentiment headlines do not clearly indicate a positive or \
    negative impact on the economy.
    - Negative sentiment headlines imply economic decline, uncertainty, \
    or unfavorable conditions.
    [...]
    """
Here is a simple example:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Load the fine-tuned classifier and its matching tokenizer
    model = AutoModelForSequenceClassification.from_pretrained("hakonmh/sentiment-xdistil-uncased")
    tokenizer = AutoTokenizer.from_pretrained("hakonmh/sentiment-xdistil-uncased")

    SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"

    # Tokenize the headline and run it through the model
    inputs = tokenizer(SENTENCE, return_tensors="pt")
    output = model(**inputs).logits

    # Map the highest-scoring class index to its label name
    predicted_label = model.config.id2label[output.argmax(-1).item()]
    print(predicted_label)
Positive
Alternatively, as a pipeline together with Topic-xDistil:
    from transformers import pipeline

    # Both models are sequence classifiers, so both use the same pipeline task
    topic_classifier = pipeline("sentiment-analysis",
                                model="hakonmh/topic-xdistil-uncased",
                                tokenizer="hakonmh/topic-xdistil-uncased")
    sentiment_classifier = pipeline("sentiment-analysis",
                                    model="hakonmh/sentiment-xdistil-uncased",
                                    tokenizer="hakonmh/sentiment-xdistil-uncased")

    SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
    print(topic_classifier(SENTENCE))
    print(sentiment_classifier(SENTENCE))
    [{'label': 'Economics', 'score': 0.9970171451568604}]
    [{'label': 'Positive', 'score': 0.9997037053108215}]
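The card describes the two models as a tool for first filtering financial headlines and then classifying their sentiment. One way this could be wired together is sketched below. The helper `classify_financial_headlines`, its `keep_topics` default, and the `min_topic_score` threshold are hypothetical and not part of the original project. "Economics" is the only topic label shown in this card, so the full topic label set should be checked before relying on the default.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# Shape of one classifier result, as in the outputs shown above,
# e.g. {"label": "Economics", "score": 0.99}
Prediction = Dict[str, object]
Classifier = Callable[[List[str]], List[Prediction]]


def classify_financial_headlines(
    headlines: Iterable[str],
    topic_classifier: Classifier,
    sentiment_classifier: Classifier,
    keep_topics: Sequence[str] = ("Economics",),  # assumption: only label seen in the card
    min_topic_score: float = 0.5,  # hypothetical confidence threshold
) -> List[Dict[str, object]]:
    """Keep headlines whose topic is in keep_topics, then label their sentiment."""
    headlines = list(headlines)
    if not headlines:
        return []

    # Step 1: topic filter, one prediction per headline
    topics = topic_classifier(headlines)
    kept = [
        text
        for text, topic in zip(headlines, topics)
        if topic["label"] in keep_topics and topic["score"] >= min_topic_score
    ]
    if not kept:
        return []

    # Step 2: sentiment classification on the surviving headlines only
    sentiments = sentiment_classifier(kept)
    return [
        {"headline": text, "sentiment": s["label"], "score": s["score"]}
        for text, s in zip(kept, sentiments)
    ]
```

The helper only assumes that each classifier accepts a list of strings and returns one `{'label', 'score'}` dict per input, which is how the `topic_classifier` and `sentiment_classifier` pipeline objects behave when called with a list. Passing them directly, as in `classify_financial_headlines([SENTENCE], topic_classifier, sentiment_classifier)`, should therefore work without an adapter.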