Model:

hakonmh/topic-xdistil-uncased

English

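Topic-xDistil is a model fine-tuned from xtremedistil-l12-h384-uncased to classify the topic of news headlines, using a dataset annotated by Chat GPT 3.5. It was built alongside Sentiment-xDistil as a tool for filtering financial news headlines and classifying their sentiment. The code used to train both models and build the dataset is available here.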

Note:

The output labels are either Economics or Other. This model is intended for English text only.

Performance Results

Performance metrics for both models on their test sets:

Model                      Test Set Size  Accuracy  F1 Score
topic-xdistil-uncased      32 799         94.44 %   92.59 %
sentiment-xdistil-uncased  17 527         94.59 %   93.44 %
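As a reminder of how these two metrics are derived from raw predictions, here is a minimal sketch for a two-label task. The label lists are toy data made up for illustration, the function names are hypothetical, and the card does not say which F1 averaging was used, so binary F1 for a single positive class is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive="Economics"):
    """Binary F1 for one positive class (an assumed averaging choice)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only; not taken from the real test set.
y_true = ["Economics", "Other", "Economics", "Other", "Economics"]
y_pred = ["Economics", "Other", "Other", "Other", "Economics"]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"f1       = {f1_score(y_true, y_pred):.2f}")
```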

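Data

The training data consists of roughly 600k news headlines and tweets annotated by Chat GPT 3.5, which has been shown to outperform crowd-workers for text annotation tasks.

The Chat GPT prompt defined the sentence labels as follows: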

"""
[...]
    - Economic headlines generally cover topics such as financial markets, \
 business, financial assets, trade, employment, GDP, inflation, or fiscal \
and monetary policy.
    - Non-economic headlines might include sports, entertainment, politics, \
science, weather, health, or other unrelated news events.
[...]
"""

Usage Example

A simple example:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("hakonmh/topic-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/topic-xdistil-uncased")

# Tokenize a single headline and run it through the classifier
SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
inputs = tokenizer(SENTENCE, return_tensors="pt")
output = model(**inputs).logits
# Map the index of the highest logit to its label name
predicted_label = model.config.id2label[output.argmax(-1).item()]

print(predicted_label)
# Output: Economics
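The example above picks the label with the largest logit. If a confidence value is also wanted, a softmax over the logits gives one. The sketch below does this in plain Python; the logit values and the index-to-label mapping are hypothetical stand-ins for model(**inputs).logits and model.config.id2label.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a two-class model; real values come from model(**inputs).logits
example_logits = [3.1, -2.4]
# Hypothetical label mapping; the real one lives in model.config.id2label
example_id2label = {0: "Economics", 1: "Other"}

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
print(example_id2label[best], round(probs[best], 4))
```

With PyTorch tensors, torch.softmax(output, dim=-1) plays the same role.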

Alternatively, use it together with Sentiment-xDistil:

from transformers import pipeline

# "sentiment-analysis" is an alias for the generic "text-classification" pipeline task
topic_classifier = pipeline("sentiment-analysis",
                            model="hakonmh/topic-xdistil-uncased",
                            tokenizer="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("sentiment-analysis",
                                model="hakonmh/sentiment-xdistil-uncased",
                                tokenizer="hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
print(topic_classifier(SENTENCE))
# Output: [{'label': 'Economics', 'score': 0.9970171451568604}]
print(sentiment_classifier(SENTENCE))
# Output: [{'label': 'Positive', 'score': 0.9997037053108215}]
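Since the two models are described as a tool for filtering financial headlines and then classifying sentiment, the two steps can be chained. The following is a minimal sketch of that idea, not the author's code: filter_and_score, fake_topic, and fake_sentiment are hypothetical names, and the stub classifiers only mimic the list-of-dicts output shape shown above so the example runs offline.

```python
def filter_and_score(headlines, topic_fn, sentiment_fn, keep_label="Economics"):
    """Keep headlines whose topic is `keep_label`, then attach a sentiment label."""
    results = []
    for headline in headlines:
        topic = topic_fn(headline)[0]  # one dict per input, as in the output above
        if topic["label"] != keep_label:
            continue  # drop non-economic headlines before scoring sentiment
        sentiment = sentiment_fn(headline)[0]
        results.append({
            "headline": headline,
            "sentiment": sentiment["label"],
            "score": sentiment["score"],
        })
    return results


# Stand-in classifiers (hypothetical); they are keyword rules, not real models.
def fake_topic(text):
    label = "Economics" if "inflation" in text.lower() else "Other"
    return [{"label": label, "score": 1.0}]


def fake_sentiment(text):
    return [{"label": "Positive", "score": 1.0}]


headlines = ["Markets rally as inflation cools", "Local team wins championship"]
for row in filter_and_score(headlines, fake_topic, fake_sentiment):
    print(row)
```

To use the real models, pass topic_classifier and sentiment_classifier from the example above in place of the two stubs.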