Dataset:

Thaweewat/hc3-24k-th


Summary

This is a Thai instruction dataset translated from HC3 using Google Cloud Translation. It contains about 24K examples in total: 17K from reddit_eli5, 4K from finance, 1.2K from medicine, 1.2K from open_qa, and 0.8K from wiki_csai.
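As a rough sanity check on the breakdown above, the per-subset counts sum to about 24.2K, which matches the stated total of about 24K. The sketch below computes each subset's share of that sum. It assumes the rounded figures from the summary (in thousands), not exact row counts, and `composition` is a hypothetical helper written only for illustration.

```python
# Approximate subset sizes in thousands of examples, taken from the
# rounded figures in the summary (not exact row counts).
SUBSETS_K = {
    "reddit_eli5": 17.0,
    "finance": 4.0,
    "medicine": 1.2,
    "open_qa": 1.2,
    "wiki_csai": 0.8,
}


def composition(subsets):
    """Return the summed size and each subset's fraction of that sum."""
    total = sum(subsets.values())
    shares = {name: size / total for name, size in subsets.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = composition(SUBSETS_K)
    print(f"total = {total:.1f}K")
    # List subsets from largest to smallest share.
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {share:6.1%}")
```

With these figures, reddit_eli5 accounts for roughly 70% of the examples, so the dataset is dominated by that subset.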

HC3, the first human-ChatGPT comparison corpus, was introduced in this paper:

Code, models and analysis are available on GitHub:

Supported Tasks:

  • Training LLMs
  • Synthetic Data Generation
  • Data Augmentation

Languages: Thai
Version: 1.0