Model: pysentimiento/robertuito-ner
Repository: https://github.com/pysentimiento/pysentimiento/
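This named entity recognition model was trained on the Spanish/English portion of the LinCE NER corpus, a code-switching benchmark. Its base model is RoBERTuito, a RoBERTa model trained on Spanish tweets.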
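If you want to use this model, we recommend using it through the pysentimiento library, because tokenization issues prevent it from working properly with the Hugging Face `pipeline`.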
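A token-classification model labels individual tokens, and those labels must then be merged back into entity spans over the original text. Tokenization is what makes that merging step error-prone. The snippet below is a minimal sketch of the merging step only. It is not pysentimiento's implementation. It assumes BIO-style labels (`B-PER`, `I-PER`, `O`, ...) that are already aligned one-to-one with words. The function name `bio_to_spans` and the word-splitting regex are hypothetical, illustrative choices.

```python
import re

# Split into words and punctuation marks; hyphenated words such as
# "al-nassr" stay whole. This splitting rule is an illustrative assumption.
WORD_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def bio_to_spans(text: str, labels: list[str]) -> list[dict]:
    """Merge per-word BIO labels into entity dicts with character offsets."""
    words = [(m.start(), m.end()) for m in WORD_RE.finditer(text)]
    if len(words) != len(labels):
        raise ValueError(f"got {len(labels)} labels for {len(words)} words")

    spans: list[dict] = []
    current = None  # the entity currently being extended, if any
    for (start, end), label in zip(words, labels):
        prefix, _, ent_type = label.partition("-")
        if prefix == "B" or (
            prefix == "I" and (current is None or current["type"] != ent_type)
        ):
            # Open a new entity; a stray I- tag is treated like B-.
            current = {"type": ent_type, "start": start, "end": end}
            spans.append(current)
        elif prefix == "I":
            # Same-type continuation: extend the open entity to this word.
            current["end"] = end
        else:
            # "O" (or any other tag) closes the open entity.
            current = None

    # Recover each entity's surface text from its character offsets.
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans
```

Treating a stray `I-` tag as the start of a new entity is a common lenient convention. A stricter decoder might drop such tags instead.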
```python
from pysentimiento import create_analyzer

# Create the NER analyzer for Spanish text
ner_analyzer = create_analyzer("ner", lang="es")

ner_analyzer.predict(
    "rindanse ante el mejor, leonel andres messi cuccitini. serresiete no existis, segui en al-nassr"
)
# [{'type': 'PER',
#   'text': 'leonel andres messi cuccitini',
#   'start': 24,
#   'end': 53},
#  {'type': 'PER', 'text': 'serresiete', 'start': 55, 'end': 65},
#  {'type': 'LOC', 'text': 'al-nassr', 'start': 87, 'end': 95}]
```
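In the example output, the `start` and `end` fields for the `PER` entities are character offsets into the input string, so `text[start:end]` reproduces each entity's `text`. A small helper for consuming that output could verify this and group entities by type. This is a sketch that assumes only the list-of-dicts format shown above. `group_entities` and `Entity` are hypothetical names and are not part of pysentimiento.

```python
from collections import defaultdict
from typing import TypedDict


class Entity(TypedDict):
    # Assumed shape of one entity in the analyzer output shown above.
    type: str
    text: str
    start: int
    end: int


def group_entities(text: str, entities: list[Entity]) -> dict[str, list[str]]:
    """Return entity strings grouped by type, after checking each span's offsets."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ent in entities:
        span = text[ent["start"]:ent["end"]]
        if span != ent["text"]:
            # The offsets do not point at the reported text in this input string.
            raise ValueError(
                f"offsets {ent['start']}:{ent['end']} give {span!r}, expected {ent['text']!r}"
            )
        grouped[ent["type"]].append(ent["text"])
    return dict(grouped)


text = "rindanse ante el mejor, leonel andres messi cuccitini. serresiete no existis, segui en al-nassr"
people = [
    {"type": "PER", "text": "leonel andres messi cuccitini", "start": 24, "end": 53},
    {"type": "PER", "text": "serresiete", "start": 55, "end": 65},
]
print(group_entities(text, people))
# {'PER': ['leonel andres messi cuccitini', 'serresiete']}
```

In real use, `entities` would be the list returned by `ner_analyzer.predict(text)` for the same `text`, assuming it returns a list as in the example. A `ValueError` indicates that the offsets do not refer to the string that was passed in.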
The following results are taken from the LinCE leaderboard.
Model | Sentiment | NER | POS |
---|---|---|---|
RoBERTuito | 60.6 | 68.5 | 97.2 |
XLM Large | -- | 69.5 | 97.2 |
XLM Base | -- | 64.9 | 97.0 |
C2S mBERT | 59.1 | 64.6 | 96.9 |
mBERT | 56.4 | 64.0 | 97.1 |
BERT | 58.4 | 61.1 | 96.9 |
BETO | 56.5 | -- | -- |
If you use this model in your research, please cite the pysentimiento, RoBERTuito, and LinCE papers:
```bibtex
@misc{perez2021pysentimiento,
  title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
  author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
  year={2021},
  eprint={2106.09462},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@inproceedings{perez2022robertuito,
  title={RoBERTuito: a pre-trained language model for social media text in Spanish},
  author={P{\'e}rez, Juan Manuel and Furman, Dami{\'a}n Ariel and Alemany, Laura Alonso and Luque, Franco M},
  booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  pages={7235--7243},
  year={2022}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}
```