Datasets:

pythainlp/thainer-corpus-v2

Language:

th

License:

cc-by-3.0

Dataset Card for "thainer-corpus-v2"

Thai Named Entity Recognition Corpus

Home Page: https://pythainlp.github.io/Thai-NER/version/2

Training script and split data: https://zenodo.org/record/7761354

You can download the .conll files for training a named entity recognition model from https://zenodo.org/record/7761354.
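The card does not describe the layout of the .conll files. The sketch below assumes a common CoNLL convention: one token per line, the token in the first tab-separated column, the NER tag in the last column, and blank lines between sentences. Tab splitting (rather than whitespace splitting) is deliberate, because Thai text can contain space characters as tokens. The function name `read_conll` is hypothetical; check the column layout against the actual Zenodo files before relying on it.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of (token, tag) pairs.

    Assumed layout (not confirmed by the dataset card): token in the first
    tab-separated column, NER tag in the last column, blank line between
    sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        # Split on tabs only, so a space character can itself be a token.
        columns = line.split("\t")
        current.append((columns[0], columns[-1]))
    # Flush the last sentence when the file does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences
```

For a downloaded file, pass the open file object directly, e.g. `with open(path, encoding="utf-8") as f: sentences = read_conll(f)`, where `path` is whatever file you extracted from the Zenodo archive.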

Size

  • Train: 3,938 docs
  • Validation: 1,313 docs
  • Test: 1,313 docs

Some of the data comes from crowdsourcing carried out between December 2018 and November 2019: https://github.com/wannaphong/thai-ner

Domain

  • News (IT, politics, economy, social)
  • PR (KKU news)
  • General

Source

And more (the complete list of sources has been lost).

Tag

  • DATE - date
  • TIME - time
  • EMAIL - email address
  • LEN - length
  • LOCATION - location
  • ORGANIZATION - company / organization
  • PERSON - person name
  • PHONE - phone number
  • TEMPERATURE - temperature
  • URL - URL
  • ZIP - zip code
  • MONEY - monetary amount
  • LAW - legislation
  • PERCENT - percentage
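The card lists entity types but not how they are encoded per token. Assuming the common BIO scheme (for example `B-PERSON`, `I-PERSON`, `O`), which this card does not state, the following sketch groups a tag sequence into `(entity_type, start, end)` spans with `end` exclusive. The function name `bio_to_spans` is hypothetical. A stray `I-` tag with no preceding `B-` of the same type is treated as the start of a new entity, a common lenient convention.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (entity_type, start, end) spans; end is exclusive.

    Assumes tags look like "B-TYPE", "I-TYPE", or "O" (an assumption,
    not confirmed by the dataset card).
    """
    spans: List[Tuple[str, int, int]] = []
    etype: Optional[str] = None
    start = 0
    # Append a sentinel "O" so the final open entity is always flushed.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the current entity continues
        if etype is not None:
            # Any other tag closes the currently open entity.
            spans.append((etype, start, i))
            etype = None
        if tag.startswith(("B-", "I-")):
            # B- (or a stray I-) opens a new entity at this position.
            etype, start = tag[2:], i
    return spans
```

Given the tokens of the same sentence, the surface text of each entity is `"".join(tokens[start:end])`, since Thai does not separate words with spaces.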

Download: Hugging Face Hub

Cite

Wannaphong Phatthiyaphaibun. (2022). Thai NER 2.0 (2.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7761354

Or, in BibTeX:

@dataset{wannaphong_phatthiyaphaibun_2022_7761354,
  author       = {Wannaphong Phatthiyaphaibun},
  title        = {Thai NER 2.0},
  month        = sep,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {2.0},
  doi          = {10.5281/zenodo.7761354},
  url          = {https://doi.org/10.5281/zenodo.7761354}
}