Dataset: DFKI-SLT/few-nerd
Task:
Language: English
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Annotation creators: expert-generated
Source datasets: extended|wikipedia
License:
This script is for loading the Few-NERD dataset from https://ningding97.github.io/fewnerd/ .
Few-NERD is a large-scale, fine-grained, manually annotated named entity recognition dataset containing 8 coarse-grained types, 66 fine-grained types, 188,200 sentences, 491,711 entities, and 4,601,223 tokens. Three benchmark tasks are built on it: one supervised (Few-NERD (SUP)) and two few-shot (Few-NERD (INTRA) and Few-NERD (INTER)).
NER tags use the IO tagging scheme. The original data uses a two-column, CoNLL-style format (one token and its tag per line), with empty lines separating sentences. DOCSTART markers are not provided because the sentences are randomly ordered.
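The two-column format described above can be read with a few lines of code. This is a minimal sketch, not the official loader: it assumes each non-empty line is a token followed by whitespace and a tag, and that fine-grained tag strings have the form `coarse-fine` (for example `person-actor`). The helper names `read_conll_io` and `coarse_type` are hypothetical, and the tag strings in `sample` are illustrative.

```python
from typing import Iterable, Iterator, List, Tuple


def read_conll_io(lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (tokens, tags) for each sentence in 2-column CoNLL-style text.

    Assumes every non-empty line is "<token><whitespace><tag>" and that an
    empty line ends a sentence.
    """
    tokens: List[str] = []
    tags: List[str] = []
    for line in lines:
        if not line.strip():  # empty line: sentence boundary
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
            continue
        token, tag = line.rstrip("\n").rsplit(None, 1)
        tokens.append(token)
        tags.append(tag)
    if tokens:  # last sentence when the text lacks a trailing empty line
        yield tokens, tags


def coarse_type(tag: str) -> str:
    """Map an assumed "coarse-fine" tag such as "person-actor" to "person"; "O" stays "O"."""
    return tag.split("-", 1)[0]


# Illustrative input: two sentences separated by an empty line.
sample = "It\tO\nstarred\tO\nHicks\tperson-artist/author\n\nEdmund\tperson-actor\nPayne\tperson-actor\n"
for tokens, tags in read_conll_io(sample.splitlines()):
    print(list(zip(tokens, [coarse_type(t) for t in tags])))
```

Splitting from the right keeps the tag intact even if a tokenizer ever emitted a token containing internal whitespace.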
For more details see https://ningding97.github.io/fewnerd/ and https://aclanthology.org/2021.acl-long.248/ .
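To load one of the three benchmark tasks with the Hugging Face `datasets` library, a minimal sketch might look like the following. It assumes the Hub repository `DFKI-SLT/few-nerd` exposes one configuration per task named `supervised`, `intra` and `inter`; those names and the helper `load_few_nerd` are assumptions to verify against the repository, not a documented API of this card.

```python
# Assumption: one Hub configuration per benchmark task, named as below.
TASK_TO_CONFIG = {"SUP": "supervised", "INTRA": "intra", "INTER": "inter"}


def load_few_nerd(task: str = "SUP"):
    """Return the DatasetDict for one Few-NERD task (hypothetical helper)."""
    config = TASK_TO_CONFIG[task.upper()]  # raises KeyError for unknown tasks
    # Imported here so the mapping above is usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("DFKI-SLT/few-nerd", config)
```

For example, `ds = load_few_nerd("INTER")` followed by `ds["train"][0]` should return a record with the fields shown in the example further down. Depending on the installed `datasets` version and on whether the repository still relies on a loading script, `load_dataset` may need extra arguments such as `trust_remote_code=True`.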
The dataset is in English.
Size of downloaded dataset files:
Size of the generated dataset:
Total amount of disk used: 366.8 MB
An example from the 'train' split looks as follows:
{
'id': '1',
'tokens': ['It', 'starred', 'Hicks', "'s", 'wife', ',', 'Ellaline', 'Terriss', 'and', 'Edmund', 'Payne', '.'],
'ner_tags': [0, 0, 7, 0, 0, 0, 7, 7, 0, 7, 7, 0],
'fine_ner_tags': [0, 0, 51, 0, 0, 0, 50, 50, 0, 50, 50, 0]
}
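Because the tags follow the IO scheme, an entity is a maximal run of consecutive tokens carrying the same non-outside tag. The sketch below applies that rule to the example above; it assumes that tag `0` is the outside (`O`) label, which is consistent with the example, and `io_to_spans` is a hypothetical helper name.

```python
from typing import List, Tuple


def io_to_spans(tags: List[int], outside: int = 0) -> List[Tuple[int, int, int]]:
    """Group maximal runs of the same non-outside tag into (start, end, tag) spans.

    `end` is exclusive. Under the IO scheme two adjacent entities of the same
    type cannot be told apart, so they come back as a single span.
    """
    spans: List[Tuple[int, int, int]] = []
    start = None
    for i, tag in enumerate(tags + [outside]):  # sentinel closes a trailing run
        if start is not None and tag != tags[start]:
            spans.append((start, i, tags[start]))
            start = None
        if start is None and tag != outside:
            start = i
    return spans


tokens = ['It', 'starred', 'Hicks', "'s", 'wife', ',', 'Ellaline', 'Terriss', 'and', 'Edmund', 'Payne', '.']
ner_tags = [0, 0, 7, 0, 0, 0, 7, 7, 0, 7, 7, 0]
fine_ner_tags = [0, 0, 51, 0, 0, 0, 50, 50, 0, 50, 50, 0]

for start, end, tag in io_to_spans(fine_ner_tags):
    print(tag, " ".join(tokens[start:end]))
# 51 Hicks
# 50 Ellaline Terriss
# 50 Edmund Payne
```

Here "Ellaline Terriss" and "Edmund Payne" are only recovered as two entities because the token "and" separates them; had they been adjacent, the IO tags alone would merge them.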
The data fields are the same across all splits. The number of sentences in each split is:
| Task | Train | Dev | Test |
|---|---|---|---|
| SUP | 131767 | 18824 | 37648 |
| INTRA | 99519 | 19358 | 44059 |
| INTER | 130112 | 18817 | 14007 |
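Summing the rows of the table is a quick consistency check. This short script only uses the counts copied from the table; the `SPLITS` name is illustrative.

```python
# Sentence counts copied from the table above: (train, dev, test).
SPLITS = {
    "SUP": (131767, 18824, 37648),
    "INTRA": (99519, 19358, 44059),
    "INTER": (130112, 18817, 14007),
}

for task, counts in SPLITS.items():
    total = sum(counts)
    shares = ", ".join(f"{c / total:.1%}" for c in counts)
    print(f"{task}: {total} sentences (train/dev/test = {shares})")
```

SUP totals 188,239 sentences, consistent with the roughly 188,200 sentences quoted in the description, split about 70/10/20. INTRA and INTER each total 162,936 sentences but divide them very differently between train, dev, and test.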
@inproceedings{ding-etal-2021-nerd,
title = "Few-{NERD}: A Few-shot Named Entity Recognition Dataset",
author = "Ding, Ning and
Xu, Guangwei and
Chen, Yulin and
Wang, Xiaobin and
Han, Xu and
Xie, Pengjun and
Zheng, Haitao and
Liu, Zhiyuan",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.248",
doi = "10.18653/v1/2021.acl-long.248",
pages = "3198--3213",
}