Dataset:

pszemraj/dolly_hhrlhf-text2text

dolly_hhrlhf-text2text

This is mosaicml/dolly_hhrlhf with the following changes:

  • cleaned up and adapted the prompt column for the text2text-generation task (no special template needed)
  • split the original train set into a 95% train set and an explicit 5% validation set
  • fixed extra spaces before punctuation (as this is not a French dataset)
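
The card does not say how the 95%/5% split was produced, so the following is only a minimal sketch of one way to make such a split reproducibly. The function name `split_train_validation`, the seed value, and the toy rows are hypothetical and are not taken from the dataset's actual preprocessing.

```python
import random


def split_train_validation(rows, validation_fraction=0.05, seed=42):
    """Shuffle rows deterministically and hold out a fraction for validation.

    Assumption: a simple seeded shuffle; the original split method is unknown.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    # Keep at least one validation row so the split is never empty.
    n_val = max(1, round(len(rows) * validation_fraction))
    return rows[n_val:], rows[:n_val]


# Toy stand-in for the original train set (1000 integer "rows").
train, validation = split_train_validation(range(1000))
print(len(train), len(validation))
```

Fixing the seed keeps the validation set stable across runs, which matters if results on it are to be compared later.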

Details on the extra spaces:

Original sentence 1: How can I be healthy ?
Fixed sentence 1: How can I be healthy?
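
A fix of this kind can be approximated with a single regular expression. This is a sketch under assumptions, not the script used to build the dataset: the function name `fix_punctuation_spacing` and the exact set of punctuation marks are illustrative choices.

```python
import re

# Assumption: only spaces or tabs directly before these marks count as errors.
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([?!.,;:])")


def fix_punctuation_spacing(text: str) -> str:
    """Remove spaces or tabs that sit immediately before a punctuation mark."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)


print(fix_punctuation_spacing("How can I be healthy ?"))
# prints: How can I be healthy?
```

A rule this simple also touches text where the space is intentional, such as an ellipsis written as `wait ...` or an emoticon like ` :)`, so a real cleanup pass would likely need extra guards.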