数据集:

HuggingFaceH4/oasst1_en

许可:

apache-2.0
中文

Dataset Card for oasst1_en

This dataset is a processed version of OpenAssistant's oasst1 dataset to:

  • Filter all conversations for English.
  • Group all conversation trees such that each row in the dataset corresponds to a single conversation.

See the create_dataset.py script in this repo for the processing details.

Splits

Split Description Size
train The full training split 19034
test The full test split 2115