Dataset:

semaj83/ctmatch_classification

Size:

10K<n<100K

Other:

medical

License:

mit

CTMatch Classification Dataset

This dataset combines 2 labelled datasets. Each record is a triple of topic (a patient description), doc (selected fields of a clinical trial document), and label (one of {0, 1, 2}), stored in jsonl format.
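As a rough illustration of the record layout described above, the following minimal sketch reads topic/doc/label triples from JSONL text using only the Python standard library. It assumes that each line is a JSON object whose keys are literally `topic`, `doc`, and `label`; the sample records and the `read_triples` helper are hypothetical and made up for this example, not taken from the dataset.

```python
import io
import json
from collections import Counter

# Hypothetical sample records (assumption: keys are "topic", "doc", "label").
# Real records contain much longer patient descriptions and trial documents.
SAMPLE_JSONL = """\
{"topic": "A 58-year-old woman with chest pain.", "doc": "Inclusion criteria: adults with angina.", "label": 2}
{"topic": "A 34-year-old man with asthma.", "doc": "Inclusion criteria: adults with type 2 diabetes.", "label": 0}
"""


def read_triples(stream):
    """Yield (topic, doc, label) tuples from a JSONL stream, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # int() accepts labels stored either as numbers or as numeric strings.
        label = int(record["label"])
        if label not in (0, 1, 2):
            raise ValueError(f"unexpected label: {label!r}")
        yield record["topic"], record["doc"], label


triples = list(read_triples(io.StringIO(SAMPLE_JSONL)))
label_counts = Counter(label for _, _, label in triples)
print(len(triples), dict(label_counts))
```

To read an actual file instead of the in-memory sample, the same helper can be given an open file handle, for example `open(path, encoding="utf-8")` with `path` pointing at a downloaded jsonl file.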

(This partly duplicates some of the ir_dataset data also available on HF.)

These have been processed using ctproc, and in this state can be used by various tokenizers for fine-tuning (see ctmatch for examples).

These 2 datasets contain no patient-identifying information and are openly available in raw form:

TREC: http://www.trec-cds.org/2021.html

CSIRO: https://data.csiro.au/collection/csiro:17152

See the repo for more information: https://github.com/semajyllek/ctmatch