Named Entity Recognition model (Spanish/English)

robertuito-ner

Repository: https://github.com/pysentimiento/pysentimiento/

This model was trained on the Spanish/English dataset of the LinCE NER corpus, a benchmark for linguistic code-switching. The base model is RoBERTuito, a RoBERTa model trained on Spanish tweets.

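Usage

If you want to use this model, we recommend using it directly through the pysentimiento library, because it does not work properly with the pipeline due to tokenization issues.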

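from pysentimiento import create_analyzer

# Build the NER analyzer; the underlying model is loaded on first use.
ner_analyzer = create_analyzer("ner", lang="es")

# predict() returns a list of entities with type, text and character offsets.
ner_analyzer.predict(
  "rindanse ante el mejor, leonel andres messi cuccitini. serresiete no existis, segui en al-nassr"
)
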
# [{'type': 'PER',
#   'text': 'leonel andres messi cuccitini',
#   'start': 24,
#   'end': 53},
#  {'type': 'PER', 'text': 'serresiete', 'start': 55, 'end': 65},
#  {'type': 'LOC', 'text': 'al-nassr', 'start': 108, 'end': 116}]
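
The returned entities are plain dictionaries, so they can be post-processed without any extra tooling. The following is a minimal sketch, assuming the result is a list of dicts with the keys type, text, start and end, as in the commented output above. The helper name group_entities_by_type is hypothetical and not part of pysentimiento; the sample records are copied from that output so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities_by_type(entities):
    """Group entity surface forms by predicted type, preserving input order.

    Hypothetical helper (not part of pysentimiento). Assumes each entity is a
    dict with at least the keys "type" and "text".
    """
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


# Sample records copied from the commented output above.
sample_entities = [
    {"type": "PER", "text": "leonel andres messi cuccitini", "start": 24, "end": 53},
    {"type": "PER", "text": "serresiete", "start": 55, "end": 65},
    {"type": "LOC", "text": "al-nassr", "start": 108, "end": 116},
]

print(group_entities_by_type(sample_entities))
# {'PER': ['leonel andres messi cuccitini', 'serresiete'], 'LOC': ['al-nassr']}
```

In practice, the list returned by ner_analyzer.predict(...) could be passed in place of sample_entities, provided it has the same shape.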

Results

Results are taken from the LinCE leaderboard.

Model	Sentiment	NER	POS
RoBERTuito	60.6	68.5	97.2
XLM Large	--	69.5	97.2
XLM Base	--	64.9	97.0
C2S mBERT	59.1	64.6	96.9
mBERT	56.4	64.0	97.1
BERT	58.4	61.1	96.9
BETO	56.5	--	--
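
To compare the rows programmatically, the table can be parsed as tab-separated text. This is a small sketch under stated assumptions: the scores are copied verbatim from the table above, "--" is treated as a missing score, and the helper names parse_scores and best_models are hypothetical.

```python
# Scores copied from the table above; "--" marks a missing result.
LEADERBOARD_TSV = (
    "Model\tSentiment\tNER\tPOS\n"
    "RoBERTuito\t60.6\t68.5\t97.2\n"
    "XLM Large\t--\t69.5\t97.2\n"
    "XLM Base\t--\t64.9\t97.0\n"
    "C2S mBERT\t59.1\t64.6\t96.9\n"
    "mBERT\t56.4\t64.0\t97.1\n"
    "BERT\t58.4\t61.1\t96.9\n"
    "BETO\t56.5\t--\t--\n"
)


def parse_scores(tsv):
    """Return {model: {task: score or None}} from tab-separated text."""
    lines = tsv.strip().splitlines()
    tasks = lines[0].split("\t")[1:]
    rows = {}
    for line in lines[1:]:
        cells = line.split("\t")
        rows[cells[0]] = {
            task: (None if value == "--" else float(value))
            for task, value in zip(tasks, cells[1:])
        }
    return rows


def best_models(rows, task):
    """Return the model names (sorted) that share the top score for a task."""
    scored = {m: s[task] for m, s in rows.items() if s[task] is not None}
    top = max(scored.values())
    return sorted(m for m, value in scored.items() if value == top)


scores = parse_scores(LEADERBOARD_TSV)
for task in ("Sentiment", "NER", "POS"):
    print(task, best_models(scores, task))
# Sentiment ['RoBERTuito']
# NER ['XLM Large']
# POS ['RoBERTuito', 'XLM Large']
```

Ties are returned together rather than broken arbitrarily, which matters for the POS column where two models share the top score.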

Citation

If you use this model in your research, please cite the pysentimiento, RoBERTuito and LinCE papers:

@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
      year={2021},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@inproceedings{perez2022robertuito,
  title={RoBERTuito: a pre-trained language model for social media text in Spanish},
  author={P{\'e}rez, Juan Manuel and Furman, Dami{\'a}n Ariel and Alemany, Laura Alonso and Luque, Franco M},
  booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  pages={7235--7243},
  year={2022}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}

Author:

pysentimiento

Dataset size:

413.71 MB