roberta-ticker: a model fine-tuned from RoBERTa to detect financial tickers

Introduction

This model is designed specifically to identify financial tickers in text. It was trained on the following Kaggle dataset: https://www.kaggle.com/omermetinn/tweets-about-the-top-companies-from-2015-to-2020

How to use roberta-ticker with HuggingFace

Load roberta-ticker and its sub-word tokenizer:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/roberta-ticker")
model = AutoModelForTokenClassification.from_pretrained("Jean-Baptiste/roberta-ticker")


##### Process text sample 

from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into whole entity groups
nlp = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")

nlp("I am going to buy 100 shares of cake tomorrow")
[{'entity_group': 'TICKER',
  'score': 0.9612462520599365,
  'word': ' cake',
  'start': 32,
  'end': 36}]
  
nlp("I am going to eat a cake tomorrow")
[]
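
The pipeline returns a list of dictionaries, so a small helper can reduce it to plain ticker strings. The sketch below is illustrative only: `extract_tickers` and its `min_score` threshold are hypothetical names and defaults, not part of the model or of `transformers`. It is exercised here on the sample output shown above rather than on a live model call.

```python
def extract_tickers(entities, min_score=0.5):
    """Return ticker strings from aggregated NER pipeline output.

    `entities` is the list of dicts produced by the pipeline;
    `min_score` is an assumed confidence cutoff, chosen arbitrarily here.
    """
    tickers = []
    for entity in entities:
        # Keep only TICKER entities that clear the confidence threshold.
        if entity.get("entity_group") == "TICKER" and entity["score"] >= min_score:
            # The tokenizer keeps the leading space in 'word', so strip it.
            tickers.append(entity["word"].strip())
    return tickers


# Sample output copied from the first example above.
sample = [{'entity_group': 'TICKER',
           'score': 0.9612462520599365,
           'word': ' cake',
           'start': 32,
           'end': 36}]

print(extract_tickers(sample))
print(extract_tickers([]))
```

With a loaded pipeline, the same helper could be called as `extract_tickers(nlp(text))`. A stricter `min_score` would trade recall for precision.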

Model performance

precision: 0.914157
recall: 0.788824
f1: 0.846878
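
As a quick sanity check, the F1 score is the harmonic mean of precision and recall, so the three figures above can be verified against each other. The snippet below is a minimal sketch of that arithmetic. It assumes the reported F1 was computed from these aggregate precision and recall values, which the card does not state explicitly.

```python
# Values as reported in this model card.
precision = 0.914157
recall = 0.788824
reported_f1 = 0.846878

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"computed f1: {f1:.6f}")
print(f"difference from reported: {abs(f1 - reported_f1):.2e}")
```

The computed value agrees with the reported one to within rounding of the inputs.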

Author:

JB Polle

Dataset size:

947.79 MB