Dataset:

turkish_shrinked_ner

Tasks:

Token classification

Sub-tasks:

named-entity-recognition

Languages:

Turkish

Multilinguality:

monolingual

Size:

100K<n<1M

Language creators:

expert-generated

Annotation creators:

machine-generated

Source datasets:

extended|other-turkish_ner

License:

cc-by-4.0

Dataset Card for turkish_shrinked_ner

Dataset Summary

A reduced, processed version of the turkish_ner dataset with 48 entity types.

Original turkish_ner dataset: an automatically annotated Turkish corpus for named entity recognition and text categorization, built using large-scale gazetteers. The constructed gazetteers contain approximately 300K entities with thousands of fine-grained entity types under 25 different domains.

The reduced entity types are: academic, academic_person, aircraft, album_person, anatomy, animal, architect_person, capital, chemical, clothes, country, culture, currency, date, food, genre, government, government_person, language, location, material, measure, medical, military, military_person, nation, newspaper, organization, organization_person, person, production_art_music, production_art_music_person, quantity, religion, science, shape, ship, software, space, space_person, sport, sport_name, sport_person, structure, subject, tech, train, vehicle
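The entity types listed above can be turned into a label vocabulary for a token-classification model. The following is a minimal sketch, not the dataset's own label definition: this card does not state the tagging scheme, so the BIO prefixes (`B-`, `I-`, and `O`), the label ordering, and the names `ENTITY_TYPES`, `build_bio_labels`, `LABEL2ID`, and `ID2LABEL` are all assumptions made for illustration.

```python
# The 48 entity types listed in the Dataset Summary.
ENTITY_TYPES = [t.strip() for t in """
academic, academic_person, aircraft, album_person, anatomy, animal,
architect_person, capital, chemical, clothes, country, culture, currency,
date, food, genre, government, government_person, language, location,
material, measure, medical, military, military_person, nation, newspaper,
organization, organization_person, person, production_art_music,
production_art_music_person, quantity, religion, science, shape, ship,
software, space, space_person, sport, sport_name, sport_person, structure,
subject, tech, train, vehicle
""".split(",")]


def build_bio_labels(entity_types):
    """Return ["O", "B-x", "I-x", ...] for each entity type x.

    ASSUMPTION: BIO tagging with this ordering. The actual dataset may
    use a different scheme or a different label order.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
LABEL2ID = {label: index for index, label in enumerate(LABELS)}
ID2LABEL = {index: label for label, index in LABEL2ID.items()}
```

Under these assumptions the vocabulary has one `O` label plus two labels per entity type. If the published dataset exposes its own label names, those should take precedence over this reconstruction.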

Supported Tasks and Leaderboards

[Needs More Information]

Languages

Turkish

Dataset Structure

Data Instances

[Needs More Information]

Data Fields

[Needs More Information]

Data Splits

The dataset provides only a training split; there are no validation or test splits.
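Because only a training split is available, anyone who wants held-out data has to carve it out themselves. The sketch below shows one simple way to do that with the Python standard library. The function name `train_validation_split`, the 10% default fraction, and the seed are hypothetical choices, not something prescribed by the dataset authors.

```python
import random


def train_validation_split(examples, validation_fraction=0.1, seed=42):
    """Split a list of examples into (train, validation) lists.

    ASSUMPTION: a uniform random split is acceptable. The fraction and
    seed are illustrative defaults, not values from the dataset card.
    """
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")

    # Shuffle indices with a private RNG so the split is reproducible
    # and does not disturb the global random state.
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)

    n_validation = int(len(indices) * validation_fraction)
    validation_indices = set(indices[:n_validation])

    # Preserve the original order of examples within each part.
    train = [ex for i, ex in enumerate(examples) if i not in validation_indices]
    validation = [ex for i, ex in enumerate(examples) if i in validation_indices]
    return train, validation
```

Fixing the seed keeps the held-out set stable across runs, which matters when comparing models trained at different times.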

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

Behcet Senturk

Licensing Information

Creative Commons Attribution 4.0 International

Citation Information

[Needs More Information]

Contributions

Thanks to @bhctsntrk for adding this dataset.

Author:

Anonymous

Dataset size:

20.77 KB