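Dataset: persian_ner
Task: token classification
Language: fa
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: expert-generated
Annotation creators: expert-generated
Source datasets: original
License: cc-by-4.0

The dataset contains 7,682 Persian sentences, split into 250,015 tokens with their NER labels. It is provided in 3 folds, to be used in turn as training and test sets. The NER tags are in IOB format.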
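The fold rotation described above can be sketched as follows. This is a minimal illustration, not the dataset's own loading code: the `rotate_folds` helper, the `Sentence` type, and the toy sentences are all hypothetical, and the sketch assumes each fold is already available in memory as a list of (tokens, tags) pairs.

```python
from typing import Iterator, List, Tuple

# Hypothetical in-memory representation: one sentence = (tokens, IOB tags).
Sentence = Tuple[List[str], List[str]]


def rotate_folds(
    folds: List[List[Sentence]],
) -> Iterator[Tuple[int, List[Sentence], List[Sentence]]]:
    """Yield (test_fold_index, train, test), with each fold used once as the test set."""
    for i, test in enumerate(folds):
        # Training data is the concatenation of every fold except the held-out one.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield i, train, test


# Toy placeholder folds (not real corpus data), only to show the rotation.
folds = [
    [(["a"], ["O"])],
    [(["b", "c"], ["B-loc", "I-loc"])],
    [(["d"], ["B-pers"]), (["e"], ["O"])],
]

splits = list(rotate_folds(folds))
for index, train, test in splits:
    print(f"test fold {index}: {len(train)} training / {len(test)} test sentences")
```

Scores obtained on the three test folds would typically be averaged; whether the published folds are meant to be combined exactly this way is an assumption here.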
[More Information Needed]
[More Information Needed]
The NER tags correspond to the following list:
"O", "I-event", "I-fac", "I-loc", "I-org", "I-pers", "I-pro", "B-event", "B-fac", "B-loc", "B-org", "B-pers", "B-pro"
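As an illustration of how these tags can be consumed, the sketch below maps integer ids back to tag strings and groups an IOB sequence into entity spans. It assumes that integer label ids index into the list in the order shown above; the `iob_to_spans` helper and the example id sequence are hypothetical. The decoder is deliberately lenient: an `I-` tag that does not continue the current entity is treated as the start of a new one.

```python
from typing import List, Tuple

# Tag list as given in the card; assumed to be indexed by integer label id.
LABELS = [
    "O", "I-event", "I-fac", "I-loc", "I-org", "I-pers", "I-pro",
    "B-event", "B-fac", "B-loc", "B-org", "B-pers", "B-pro",
]
ID2LABEL = dict(enumerate(LABELS))


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return (entity_type, start, end_exclusive) for each entity in an IOB sequence."""
    spans: List[Tuple[str, int, int]] = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "B" or (prefix == "I" and etype != current):
            # A new entity starts here; close the previous one if it is still open.
            if current is not None:
                spans.append((current, start, i))
            start, current = i, etype
        elif prefix == "O":
            # Outside any entity; close the open one, if any.
            if current is not None:
                spans.append((current, start, i))
            start, current = None, None
        # Otherwise the tag is an I- tag continuing the current entity.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


# Hypothetical id sequence: B-loc I-loc O B-pers I-pers I-pers O
tag_ids = [9, 3, 0, 11, 5, 5, 0]
tags = [ID2LABEL[i] for i in tag_ids]
spans = iob_to_spans(tags)
print(spans)  # [('loc', 0, 2), ('pers', 3, 6)]
```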
Training and test splits
[More Information Needed]
[More Information Needed]
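Who are the source language producers? Hanieh Poostchi, Ehsan Zare Borzeshi, Mohammad Abdous, Massimo Piccardi
[More Information Needed]
Who are the annotators? Hanieh Poostchi, Ehsan Zare Borzeshi, Mohammad Abdous, Massimo Piccardi
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
The dataset is published for academic use only.
[More Information Needed]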
Creative Commons Attribution 4.0 International License.
@inproceedings{poostchi-etal-2016-personer,
    title = "{P}erso{NER}: {P}ersian Named-Entity Recognition",
    author = "Poostchi, Hanieh and Zare Borzeshi, Ehsan and Abdous, Mohammad and Piccardi, Massimo",
    booktitle = "Proceedings of {COLING} 2016, the 26th International Conference on Computational Linguistics: Technical Papers",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://www.aclweb.org/anthology/C16-1319",
    pages = "3381--3389",
    abstract = "Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To abridge this gap, in this paper we target the Persian language that is spoken by a population of over a hundred million people world-wide. We first present and provide ArmanPerosNERCorpus, the first manually-annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNNL scores while outperforming two alternatives based on a CRF and a recurrent neural network.",
}
Thanks to @KMFODA for adding this dataset.