Dataset:

vicgalle/alpaca-gpt4

English

Dataset card for "alpaca-gpt4"

This dataset contains English instruction-following data generated by GPT-4 using Alpaca prompts.

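The dataset was originally shared at https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM (GitHub repository). This version is only a wrapper that makes it compatible with the Hugging Face datasets library.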
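Since this version exists for compatibility with the datasets library, loading it might look like the sketch below. The helper name `load_alpaca_gpt4` is hypothetical, and the `"train"` split name is an assumption that this card does not state.

```python
DATASET_ID = "vicgalle/alpaca-gpt4"


def load_alpaca_gpt4(split: str = "train"):
    """Load the dataset from the Hugging Face Hub.

    Assumption: the data is published under a split named "train".
    Requires the `datasets` package and network access on first use.
    """
    # Imported lazily so that defining this helper does not require the package.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_alpaca_gpt4()` should return a dataset whose records can be read like dictionaries, for example `ds[0]["instruction"]`.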

Dataset structure

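It contains 52K instruction-following examples generated by GPT-4 using the same prompts as Alpaca. The format is identical to the Alpaca data; the only difference is that the output is generated by GPT-4: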

- `instruction`: `str`, describes the task the model should perform. Each of the 52K instructions is unique.
- `input`: `str`, optional context or input for the task. 
- `output`: `str`, the answer to the instruction as generated by `GPT-4`.
- `text`: `str`, all the previous fields concatenated together, plus the same prompt used in Alpaca at the beginning.
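To make the relationship between the fields concrete, here is a minimal sketch of how `text` could be rebuilt from the other three. The with-input template is copied from the example record in this card; the no-input template is an assumption based on the Alpaca prompt format, and `build_text` is a hypothetical helper name.

```python
# Template taken from the `text` field of the example record in this card.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

# Assumption: records with an empty `input` use Alpaca's shorter no-input prompt.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)


def build_text(instruction: str, context: str, output: str) -> str:
    """Concatenate the fields of one record into the `text` format."""
    if context:
        return PROMPT_WITH_INPUT.format(
            instruction=instruction, input=context, output=output
        )
    return PROMPT_NO_INPUT.format(instruction=instruction, output=output)


print(build_text("Identify the odd one out.", "Twitter, Instagram, Telegram", "Telegram"))
```

Keeping the two templates as separate constants makes it easy to check them against real records before relying on them.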

Difference from the original Alpaca dataset

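The original Alpaca dataset used text-davinci-003 to complete the prompts. This dataset uses the same prompts but generates the completions with GPT-4. As a result, the responses are generally of higher quality and greater length. Here is an example: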

Example from Alpaca-GPT4:
{'instruction': 'Identify the odd one out.',
 'input': 'Twitter, Instagram, Telegram',
 'output': 'The odd one out is Telegram. Twitter and Instagram are social media platforms mainly for sharing information, images and videos while Telegram is a cloud-based instant messaging and voice-over-IP service.',
 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nIdentify the odd one out.\n\n### Input:\nTwitter, Instagram, Telegram\n\n### Response:\nThe odd one out is Telegram. Twitter and Instagram are social media platforms mainly for sharing information, images and videos while Telegram is a cloud-based instant messaging and voice-over-IP service.'}
Same example from the original Alpaca:
{'instruction': 'Identify the odd one out.',
 'input': 'Twitter, Instagram, Telegram',
 'output': 'Telegram',
 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nIdentify the odd one out.\n\n### Input:\nTwitter, Instagram, Telegram\n\n### Response:\nTelegram'}
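As a rough illustration of the length difference, the sketch below counts whitespace-separated words in the two `output` values from the example. It covers only this single pair and says nothing about the dataset as a whole; `word_count` is a hypothetical helper.

```python
# Outputs copied from the two example records in this card.
alpaca_gpt4_output = (
    "The odd one out is Telegram. Twitter and Instagram are social media "
    "platforms mainly for sharing information, images and videos while "
    "Telegram is a cloud-based instant messaging and voice-over-IP service."
)
alpaca_output = "Telegram"


def word_count(text: str) -> int:
    """Count whitespace-separated tokens as a crude length measure."""
    return len(text.split())


lengths = {
    "alpaca-gpt4": word_count(alpaca_gpt4_output),
    "alpaca": word_count(alpaca_output),
}
print(lengths)
```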

Licensing information

The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0) license.