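Dataset:
cbt
Subtasks:
multiple-choice-qa
Languages:
en
Multilinguality:
monolingual
Language creators:
found
Annotation creators:
machine-generated
Source datasets:
original
Preprint:
arxiv:1511.02301
License:
gfdl
The Children's Book Test (CBT) is designed to measure directly how well language models can exploit wider linguistic context. The CBT is built from freely available books.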
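This dataset contains four different question configurations: V, P, CN and NE. A separate raw configuration also appears in the split table.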
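[More Information Needed]
The data is in English and consists of text from children's story books by authors such as Lucy Maud Montgomery, Charles Dickens, and Andrew Lang.
An instance from the V configuration: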
{'answer': 'said', 'options': ['christening', 'existed', 'hear', 'knows', 'read', 'remarked', 'said', 'sitting', 'talking', 'wearing'], 'question': "`` They are very kind old ladies in their way , '' XXXXX the king ; `` and were nice to me when I was a boy . ''", 'sentences': ['This vexed the king even more than the queen , who was very clever and learned , and who had hated dolls when she was a child .', 'However , she , too in spite of all the books she read and all the pictures she painted , would have been glad enough to be the mother of a little prince .', 'The king was anxious to consult the fairies , but the queen would not hear of such a thing .', 'She did not believe in fairies : she said that they had never existed ; and that she maintained , though The History of the Royal Family was full of chapters about nothing else .', 'Well , at long and at last they had a little boy , who was generally regarded as the finest baby that had ever been seen .', 'Even her majesty herself remarked that , though she could never believe all the courtiers told her , yet he certainly was a fine child -- a very fine child .', 'Now , the time drew near for the christening party , and the king and queen were sitting at breakfast in their summer parlour talking over it .', 'It was a splendid room , hung with portraits of the royal ancestors .', 'There was Cinderella , the grandmother of the reigning monarch , with her little foot in her glass slipper thrust out before her .', 'There was the Marquis de Carabas , who , as everyone knows , was raised to the throne as prince consort after his marriage with the daughter of the king of the period .', 'On the arm of the throne was seated his celebrated cat , wearing boots .', 'There , too , was a portrait of a beautiful lady , sound asleep : this was Madame La Belle au Bois-dormant , also an ancestress of the royal family .', 'Many other pictures of celebrated persons were hanging on the walls .', "`` You have asked all the right people , my dear ? ''", 'said the king .', "`` Everyone who should be asked , '' answered the queen .", "`` People are so touchy on these occasions , '' said his majesty .", "`` You have not forgotten any of our aunts ? ''", "`` No ; the old cats ! ''", "replied the queen ; for the king 's aunts were old-fashioned , and did not approve of her , and she knew it ."]}
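To make the task format concrete, the sketch below substitutes a candidate into the XXXXX blank and applies a naive heuristic that picks the option occurring most often in the context sentences. The instance is abridged from the one above (fewer sentences and options, for brevity), and the helper names `fill_blank` and `most_frequent_option` are hypothetical. The heuristic is only an illustration of how the fields fit together, not a method taken from this card.

```python
from collections import Counter

PLACEHOLDER = "XXXXX"  # blank marker as it appears in the instance above


def fill_blank(question: str, candidate: str) -> str:
    """Return the question with the blank replaced by one candidate word."""
    return question.replace(PLACEHOLDER, candidate, 1)


def most_frequent_option(sentences, options):
    """Pick the option that occurs most often in the context sentences.

    The text is already tokenized with spaces, so str.split() is enough.
    Ties go to the option that comes first in the list.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    return max(options, key=lambda option: counts[option])


# Abridged copy of the V instance (an assumption made to keep this short).
instance = {
    "answer": "said",
    "options": ["hear", "read", "said", "talking"],
    "question": "`` They are very kind old ladies in their way , '' XXXXX the king ; "
                "`` and were nice to me when I was a boy . ''",
    "sentences": [
        "The king was anxious to consult the fairies , but the queen would not hear of such a thing .",
        "said the king .",
        "`` People are so touchy on these occasions , '' said his majesty .",
    ],
}

guess = most_frequent_option(instance["sentences"], instance["options"])
print(fill_blank(instance["question"], guess))
print("correct:", guess == instance["answer"])
```

On this abridged instance the heuristic selects "said", which matches the answer. That outcome says nothing about how it would fare across the full dataset.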
For the raw configuration, the data fields are as follows:
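For all other configurations, the data fields are as follows:
- sentences: a list of strings holding the context sentences from a book (20 in the instance above).
- question: a string holding the query sentence, with the missing word replaced by XXXXX.
- answer: a string holding the word that fills the blank.
- options: a list of strings holding the candidate answers (10 in the instance above).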
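The instance shown earlier carries four keys: answer, options, question and sentences. A minimal sketch of a structural check for such records follows. The names `CBTInstance` and `validation_errors` are hypothetical, and the two consistency rules (exactly one XXXXX in the question, answer present among the options) are assumptions inferred from that single instance rather than guarantees stated by the dataset.

```python
from typing import List, TypedDict

PLACEHOLDER = "XXXXX"  # blank marker seen in the example instance


class CBTInstance(TypedDict):
    """Shape of one non-raw record, inferred from the example instance."""
    sentences: List[str]
    question: str
    answer: str
    options: List[str]


def validation_errors(instance: dict) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    errors = [
        f"missing field: {key}"
        for key in ("sentences", "question", "answer", "options")
        if key not in instance
    ]
    if errors:
        return errors  # later checks need all four fields
    # Assumption: each question has exactly one blank.
    if instance["question"].count(PLACEHOLDER) != 1:
        errors.append("question should contain exactly one placeholder")
    # Assumption: the gold answer is always one of the candidates.
    if instance["answer"] not in instance["options"]:
        errors.append("answer is not among the options")
    return errors
```

A check like this can be run over a sample of records before training to catch malformed rows early.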
The splits and their corresponding sizes are as follows:
| | train | test | validation |
|---|---|---|---|
| raw | 98 | 5 | 5 |
| V | 105825 | 2500 | 2000 |
| P | 334030 | 2500 | 2000 |
| CN | 120769 | 2500 | 2000 |
| NE | 108719 | 2500 | 2000 |
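The split sizes can be kept in a small lookup for sanity checks after loading. The sketch below copies the numbers from the table. The names `SPLIT_SIZES`, `total_examples` and `held_out_fraction` are hypothetical helpers, not part of any library.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "raw": {"train": 98, "test": 5, "validation": 5},
    "V": {"train": 105825, "test": 2500, "validation": 2000},
    "P": {"train": 334030, "test": 2500, "validation": 2000},
    "CN": {"train": 120769, "test": 2500, "validation": 2000},
    "NE": {"train": 108719, "test": 2500, "validation": 2000},
}


def total_examples(config: str) -> int:
    """Total number of rows across all splits of one configuration."""
    return sum(SPLIT_SIZES[config].values())


def held_out_fraction(config: str) -> float:
    """Share of rows that fall in the test and validation splits."""
    sizes = SPLIT_SIZES[config]
    return (sizes["test"] + sizes["validation"]) / total_examples(config)


for name in SPLIT_SIZES:
    print(f"{name}: {total_examples(name)} rows, "
          f"{held_out_fraction(name):.2%} held out")
```

Comparing these totals with the row counts of a loaded copy is a quick way to notice a truncated download.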
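[More Information Needed]
[More Information Needed]
Who are the source language producers? Children's book authors.
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]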
GNU Free Documentation License v1.3
@misc{hill2016goldilocks,
  title={The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations},
  author={Felix Hill and Antoine Bordes and Sumit Chopra and Jason Weston},
  year={2016},
  eprint={1511.02301},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @gchhablani for adding this dataset.