Dataset: swedish_ner_corpus
Task: token classification
Language: sv
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: expert-generated
Source datasets: original
License: cc-by-4.0

Webbnyheter 2012 from Språkbanken, semi-manually annotated and adapted for CoreNLP Swedish NER. "Semi-manually" here means that the annotations were bootstrapped from Swedish gazetteers and then manually corrected and reviewed by two independent, native Swedish-speaking annotators. No inter-annotator agreement was calculated.
Swedish
A sample dataset instance is provided below:
{'id': '3', 'ner_tags': [4, 4, 0, 0, 0, 0, 0, 0, 3, 3, 0], 'tokens': ['Margaretha', 'Fahlgren', ',', 'professor', 'i', 'litteraturvetenskap', ',', 'vice-rektor', 'Uppsala', 'universitet', '.']}
Full fields:
{
  "id": { "feature_type": "Value", "dtype": "string" },
  "tokens": {
    "feature_type": "Sequence",
    "feature": { "feature_type": "Value", "dtype": "string" }
  },
  "ner_tags": {
    "feature_type": "Sequence",
    "dtype": "int32",
    "feature": {
      "feature_type": "ClassLabel",
      "dtype": "int32",
      "class_names": [ 0: "0", 1: "LOC", 2: "MISC", 3: "ORG", 4: "PER" ]
    }
  }
}
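To make the integer `ner_tags` easier to read, the following minimal sketch maps them back to the class names listed above and merges adjacent tokens with the same label into entity spans. It uses only the sample instance from this card and needs no external library. The helper names `decode_tags` and `group_entities` are hypothetical, and treating the label "0" as the "outside" (non-entity) tag is an assumption based on how it is used in the sample.

```python
# Label names in index order, copied from the "class_names" list in this card.
LABEL_NAMES = ["0", "LOC", "MISC", "ORG", "PER"]

# The sample instance shown in this card.
sample = {
    "id": "3",
    "ner_tags": [4, 4, 0, 0, 0, 0, 0, 0, 3, 3, 0],
    "tokens": ["Margaretha", "Fahlgren", ",", "professor", "i",
               "litteraturvetenskap", ",", "vice-rektor", "Uppsala",
               "universitet", "."],
}


def decode_tags(instance):
    """Pair each token with the name of its integer tag."""
    return [(token, LABEL_NAMES[tag])
            for token, tag in zip(instance["tokens"], instance["ner_tags"])]


def group_entities(instance, outside="0"):
    """Merge runs of adjacent tokens that share a non-outside label.

    Assumption: the tag set has no B-/I- prefixes, so two adjacent
    entities of the same type cannot be told apart and are merged.
    """
    entities = []
    current_tokens, current_label = [], None
    for token, label in decode_tags(instance):
        if label != outside and label == current_label:
            # Same entity continues.
            current_tokens.append(token)
            continue
        if current_tokens:
            # The previous entity ended; store it.
            entities.append((" ".join(current_tokens), current_label))
        if label != outside:
            current_tokens, current_label = [token], label
        else:
            current_tokens, current_label = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


entities = group_entities(sample)
print(entities)
```

Because the labels carry no begin/inside markers, this grouping is only a heuristic: it works for the sample above but would merge two neighbouring entities of the same type into one span.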
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
The original dataset, which consists of news from Swedish newspapers' websites, was provided by Språkbanken.
https://github.com/klintan/swedish-ner-corpus/blob/master/LICENSE
Thanks to @abhishekkrthakur for adding this dataset.