Model: pysentimiento/robertuito-pos
Repository: https://github.com/pysentimiento/pysentimiento/
A model fine-tuned on the Spanish/English code-switched dataset of the LinCE NER corpus. The base model is RoBERTuito, a RoBERTa model trained on Spanish tweets.
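If you want to use this model, we recommend using it directly from the pysentimiento library, because it does not work properly in a `pipeline` due to tokenization issues.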
```python
from pysentimiento import create_analyzer

pos_analyzer = create_analyzer("pos", lang="es")

pos_analyzer.predict("Quiero que esto funcione correctamente! @perezjotaeme")
# Output:
# [{'type': 'PROPN', 'text': 'Quiero', 'start': 0, 'end': 6},
#  {'type': 'SCONJ', 'text': 'que', 'start': 7, 'end': 10},
#  {'type': 'PRON', 'text': 'esto', 'start': 11, 'end': 15},
#  {'type': 'VERB', 'text': 'funcione', 'start': 16, 'end': 24},
#  {'type': 'ADV', 'text': 'correctamente', 'start': 25, 'end': 38},
#  {'type': 'PUNCT', 'text': '!', 'start': 38, 'end': 39},
#  {'type': 'NOUN', 'text': '@perezjotaeme', 'start': 40, 'end': 53}]
```
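The prediction is a list of dictionaries with `type`, `text`, `start` and `end` keys. As a minimal sketch of how such a result might be post-processed, the snippet below turns it into `(word, tag)` pairs and checks that the character offsets slice back to the token text. The helpers `to_pairs` and `offsets_match` are hypothetical names introduced here for illustration and are not part of pysentimiento. The sample output is hard-coded from the example so the snippet runs without downloading the model.

```python
from collections import Counter

# Input sentence and sample output copied from the usage example.
text = "Quiero que esto funcione correctamente! @perezjotaeme"
tagged = [
    {'type': 'PROPN', 'text': 'Quiero', 'start': 0, 'end': 6},
    {'type': 'SCONJ', 'text': 'que', 'start': 7, 'end': 10},
    {'type': 'PRON', 'text': 'esto', 'start': 11, 'end': 15},
    {'type': 'VERB', 'text': 'funcione', 'start': 16, 'end': 24},
    {'type': 'ADV', 'text': 'correctamente', 'start': 25, 'end': 38},
    {'type': 'PUNCT', 'text': '!', 'start': 38, 'end': 39},
    {'type': 'NOUN', 'text': '@perezjotaeme', 'start': 40, 'end': 53},
]


def to_pairs(tokens):
    """Hypothetical helper: return (word, tag) tuples from the output list."""
    return [(tok["text"], tok["type"]) for tok in tokens]


def offsets_match(sentence, tokens):
    """Hypothetical helper: check that start/end offsets slice back to each token's text."""
    return all(sentence[t["start"]:t["end"]] == t["text"] for t in tokens)


pairs = to_pairs(tagged)
tag_counts = Counter(tag for _, tag in pairs)

print(pairs)
print(offsets_match(text, tagged))
print(tag_counts)
```

Because `start` and `end` index into the original string, the same offsets can be used to highlight or extract tagged spans without re-tokenizing the text.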
Results are taken from the LinCE leaderboard.
| Model | Sentiment | NER | POS |
|---|---|---|---|
| RoBERTuito | 60.6 | 68.5 | 97.2 |
| XLM Large | -- | 69.5 | 97.2 |
| XLM Base | -- | 64.9 | 97.0 |
| C2S mBERT | 59.1 | 64.6 | 96.9 |
| mBERT | 56.4 | 64.0 | 97.1 |
| BERT | 58.4 | 61.1 | 96.9 |
| BETO | 56.5 | -- | -- |
If you use this model in your research, please cite the pysentimiento, RoBERTuito and LinCE papers:
```bibtex
@misc{perez2021pysentimiento,
  title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
  author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
  year={2021},
  eprint={2106.09462},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@inproceedings{ortega2019overview,
  title={Overview of the task on irony detection in Spanish variants},
  author={Ortega-Bueno, Reynier and Rangel, Francisco and Hern{\'a}ndez Far{\i}as, D and Rosso, Paolo and Montes-y-G{\'o}mez, Manuel and Medina Pagola, Jos{\'e} E},
  booktitle={Proceedings of the Iberian languages evaluation forum (IberLEF 2019), co-located with 34th conference of the Spanish Society for natural language processing (SEPLN 2019). CEUR-WS. org},
  volume={2421},
  pages={229--256},
  year={2019}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}
```