English

Model fine-tuned from roberta-large for topic classification of financial news (with an emphasis on Canadian news).

Introduction

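The model was trained on the topic column of the financial_news_sentiment_mixte_with_phrasebank_75 dataset. The topic column was generated with a zero-shot classification model and covers 11 topics. The generated topics were not manually reviewed, so the dataset should be expected to contain misclassifications, and the trained model may reproduce the same errors.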

Training data

The training data is classified as follows:

class Description
0 acquisition
1 other
2 quaterly financial release
3 appointment to new position
4 dividend
5 corporate update
6 drillings results
7 conference
8 share repurchase program
9 grant of stocks
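
The class indices above can be mirrored in a small lookup table, for example to decode raw per-class scores without going through a pipeline. This is a minimal sketch, not part of the model's API: `ID2LABEL` is copied from the table (spelling left as-is so the strings stay comparable with the labels the model returns), and `decode` is a hypothetical helper. It assumes the table order matches the model's own `model.config.id2label`, which is worth checking. Because the introduction mentions 11 topics while the table lists ten classes, an index outside the table falls back to "unknown".

```python
# Class ids and descriptions copied verbatim from the table above.
ID2LABEL = {
    0: "acquisition",
    1: "other",
    2: "quaterly financial release",
    3: "appointment to new position",
    4: "dividend",
    5: "corporate update",
    6: "drillings results",
    7: "conference",
    8: "share repurchase program",
    9: "grant of stocks",
}


def decode(scores):
    """Return the description of the highest-scoring class.

    `scores` is a plain list of per-class scores (for example logits
    converted with .tolist()). Indices not listed in the table map to
    "unknown" instead of raising an error.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL.get(best, "unknown")


# Hypothetical scores, for illustration only: class 2 is the largest.
print(decode([0.1, 0.2, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```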

How to use roberta-large-financial-news-topics-en with HuggingFace

Load roberta-large-financial-news-topics-en and its sub-word tokenizer:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/roberta-large-financial-news-topics-en")
model = AutoModelForSequenceClassification.from_pretrained("Jean-Baptiste/roberta-large-financial-news-topics-en")


##### Process text sample (from wikipedia)

from transformers import pipeline

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
pipe("Melcor REIT (TSX: MR.UN) today announced results for the third quarter ended September 30, 2022. Revenue was stable in the quarter and year-to-date. Net operating income was down 3% in the quarter at $11.61 million due to the timing of operating expenses and inflated costs including utilities like gas/heat and power")

[{'label': 'quaterly financial release', 'score': 0.8829097151756287}]
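
Since the training labels were never manually reviewed, it may be useful to treat low-confidence predictions with caution. The sketch below post-processes a result in the format shown above; `top_label`, the 0.5 threshold, and the choice of "other" as the fallback are all assumptions made for illustration, not recommendations from the model author.

```python
def top_label(predictions, min_score=0.5, fallback="other"):
    """Pick the best label from pipeline output for a single input.

    `predictions` is a list of {'label': str, 'score': float} dicts.
    If the best score is below `min_score` (an arbitrary threshold),
    `fallback` is returned instead.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else fallback


# The result shown above, reused as sample input.
sample = [{"label": "quaterly financial release", "score": 0.8829097151756287}]

print(top_label(sample))                 # quaterly financial release
print(top_label(sample, min_score=0.9))  # other
```

A suitable threshold depends on the application and would need to be tuned on held-out data.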

Model performance

Overall F1 score (macro average)

precision recall f1
0.7533 0.7629 0.7499
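
For readers unfamiliar with macro averaging, the sketch below shows one common way to compute such figures: precision, recall, and F1 are computed per class and then averaged with equal weight per class. The function `macro_scores` and the toy labels are hypothetical; the evaluation script behind the reported numbers is not given in this document, so this only illustrates the metric.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, f1) over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy data, for illustration only (not taken from the model's evaluation).
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"{precision:.4f} {recall:.4f} {f1:.4f}")  # 0.8333 0.7500 0.7333
```

Under this definition the macro F1 is the mean of the per-class F1 values, not the harmonic mean of the macro precision and macro recall. That would be consistent with the reported F1 (0.7499) being lower than both the reported precision and recall.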