Model:
Jean-Baptiste/roberta-large-financial-news-sentiment-en
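This model was trained on the financial_news_sentiment_mixte_with_phrasebank_75 dataset, a customized version of the phrasebank dataset that keeps only sentences validated by at least 75% of annotators. In addition, I manually added about 2,000 validated Canadian financial news articles, so the model is more specifically trained for Canadian news. The final result is an overall F1 score of 93.25%, and 83.6% on Canadian news.
The training data is classified as follows: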
class | Description |
---|---|
0 | negative |
1 | neutral |
2 | positive |
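The class table above can be read as an index-to-label mapping. The following is a minimal sketch of how three raw class scores could be decoded into a label and a probability; the helper `decode_logits` and the example scores are hypothetical, and it assumes the model's output order follows the class indices in the table (checking `model.config.id2label` on the loaded model would confirm this).

```python
import math

# Class ids and names as listed in the table above.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode_logits(logits):
    """Return (label, probability) for the highest-scoring class.

    Hypothetical helper: applies a numerically stable softmax to the raw
    scores and looks up the winning index in ID2LABEL.
    """
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one score per class")
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up scores for illustration only, not real model output.
label, prob = decode_logits([2.0, 0.5, -1.0])
print(label, round(prob, 3))
```

In practice the `pipeline` call does this decoding itself; the sketch only makes the mapping explicit.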
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")
model = AutoModelForSequenceClassification.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")

# Process text sample (from wikipedia)
from transformers import pipeline

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
pipe("Melcor REIT (TSX: MR.UN) today announced results for the third quarter ended September 30, 2022. Revenue was stable in the quarter and year-to-date. Net operating income was down 3% in the quarter at $11.61 million due to the timing of operating expenses and inflated costs including utilities like gas/heat and power")

# Output:
# [{'label': 'negative', 'score': 0.9399105906486511}]
```
Overall F1 score (macro average)
precision | recall | f1 |
---|---|---|
0.9355 | 0.9299 | 0.9325 |
By entity
entity | precision | recall | f1 |
---|---|---|---|
negative | 0.9605 | 0.9240 | 0.9419 |
neutral | 0.9538 | 0.9459 | 0.9498 |
positive | 0.8922 | 0.9200 | 0.9059 |
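As a quick consistency check, a macro average is the unweighted mean of the per-class values, so the overall row can be recomputed from the per-entity table. The sketch below does this; the names `PER_CLASS` and `macro_average` are illustrative, and because the inputs are already rounded to four decimals, the results should agree with the reported overall figures only to within rounding.

```python
# Per-class (precision, recall, f1) copied from the table above.
PER_CLASS = {
    "negative": (0.9605, 0.9240, 0.9419),
    "neutral": (0.9538, 0.9459, 0.9498),
    "positive": (0.8922, 0.9200, 0.9059),
}


def macro_average(per_class):
    """Unweighted mean of each metric across classes."""
    n = len(per_class)
    column_sums = [sum(column) for column in zip(*per_class.values())]
    return tuple(total / n for total in column_sums)


precision, recall, f1 = macro_average(PER_CLASS)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```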