Dataset:

FreedomIntelligence/huatuo_consultation_qa

Language:

zh

Size:

1M<n<10M

Preprint archive:

arxiv:2305.01526

Other:

medical

License:

apache-2.0

Dataset Card for huatuo_consultation_qa

Dataset Summary

We collected data from a medical consultation website that hosts many online consultation records by medical experts. Each record is a QA pair: a patient asks a question and a medical doctor answers it. Basic information about each doctor (name, hospital organization, and department) was also recorded.

We directly crawled patients' questions and doctors' answers as QA pairs, obtaining 32,708,346 pairs. We then removed the QA pairs containing special characters and removed duplicate pairs, which left 25,341,578 QA pairs.
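The cleaning step described above (drop pairs with special characters, then drop duplicates) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the card does not define "special characters", so the allow-list in `ALLOWED` is a guess, and the names `has_special_characters` and `clean_pairs` are hypothetical.

```python
import re

# ASSUMPTION: the card does not say which characters count as "special".
# Here a text is "clean" if it contains only word characters (which include
# CJK in Python 3), whitespace, and common ASCII/Chinese punctuation.
ALLOWED = re.compile(r"[\w\s,.;:!?()%/+\-,。;:!?、()“”‘’《》~…]*")


def has_special_characters(text):
    """Return True if the text contains a character outside the allow-list."""
    return ALLOWED.fullmatch(text) is None


def clean_pairs(pairs):
    """Filter (question, answer) pairs, then drop exact duplicates.

    Hypothetical helper: keeps the first occurrence of each pair and
    preserves the input order.
    """
    seen = set()
    cleaned = []
    for question, answer in pairs:
        key = (question.strip(), answer.strip())
        # Step 1: skip pairs where either side has a disallowed character.
        if has_special_characters(key[0]) or has_special_characters(key[1]):
            continue
        # Step 2: skip pairs already seen (exact-match deduplication).
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(key)
    return cleaned


if __name__ == "__main__":
    sample = [
        ("头痛怎么办?", "建议多休息。"),
        ("头痛怎么办?", "建议多休息。"),  # exact duplicate
        ("发烧★", "多喝水"),              # contains a disallowed symbol
    ]
    print(clean_pairs(sample))
```

Exact-match deduplication is the simplest reading of "removed the repeated pairs"; near-duplicate detection would need a different approach.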

Please note that we cannot provide the answer text directly, so the answer field in this dataset is a URL. If you need text data, you can use the other two parts of our open-source datasets (huatuo_encyclopedia_qa and huatuo_knowledge_graph_qa), or collect the text yourself from the URLs.

Dataset Creation

Source Data

....

Citation

@misc{li2023huatuo26m,
      title={Huatuo-26M, a Large-scale Chinese Medical QA Dataset}, 
      author={Jianquan Li and Xidong Wang and Xiangbo Wu and Zhiyi Zhang and Xiaolong Xu and Jie Fu and Prayag Tiwari and Xiang Wan and Benyou Wang},
      year={2023},
      eprint={2305.01526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}