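arxiv:1609.07843
This is a modified version of https://huggingface.co/datasets/wikitext that returns whole Wikipedia pages rather than individual lines of WikiText. The original README is included below.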
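As a rough illustration of what "pages instead of lines" means, the sketch below regroups line-level rows into page-level strings. It assumes that a page begins at a top-level heading line such as ` = Title = ` and that deeper headings such as ` = = Section = = ` belong to the current page. The helper names `is_page_heading` and `group_lines_into_pages` are hypothetical, and this is not necessarily how this dataset was actually built.

```python
from typing import Iterable, Iterator, List


def is_page_heading(line: str) -> bool:
    """Return True for a top-level heading such as ' = Title = '.

    Assumption: sub-headings use repeated markers (' = = Section = = ')
    and therefore do not start a new page.
    """
    s = line.strip()
    return s.startswith("= ") and s.endswith(" =") and not s.startswith("= =")


def group_lines_into_pages(lines: Iterable[str]) -> Iterator[str]:
    """Concatenate consecutive lines into one string per page."""
    page: List[str] = []
    for line in lines:
        if is_page_heading(line) and page:
            # Emit the previous page unless it holds only blank lines.
            if any(part.strip() for part in page):
                yield "".join(page)
            page = []
        page.append(line)
    # Emit the final page, again skipping blank-only leftovers.
    if any(part.strip() for part in page):
        yield "".join(page)


# Tiny hand-written sample in the line-level layout (not real dataset rows).
sample_lines = [
    "",
    " = Page One = \n",
    "",
    " Body one . \n",
    " = = Section = = \n",
    " More text . \n",
    " = Page Two = \n",
    " Body two . \n",
]
pages = list(group_lines_into_pages(sample_lines))
```

In practice the input lines would come from the `text` column of the original line-level dataset; the sample above is only a stand-in so the sketch runs without a download.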
wikitext-103-raw-v1
This example was too long and was cropped: { "text": "\" The gold dollar or gold one @-@ dollar piece was a coin struck as a regular issue by the United States Bureau of the Mint from..." }
wikitext-103-v1
This example was too long and was cropped: { "text": "\" Senjō no Valkyria 3 : <unk> Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to..." }
wikitext-2-raw-v1
This example was too long and was cropped: { "text": "\" The Sinclair Scientific Programmable was introduced in 1975 , with the same case as the Sinclair Oxford . It was larger than t..." }
wikitext-2-v1
This example was too long and was cropped: { "text": "\" Senjō no Valkyria 3 : <unk> Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to..." }
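The cropped samples are tokenized text: the first one writes "one-dollar" as `one @-@ dollar`. The sketch below shows one way such joiner tokens could be collapsed for display. The `@-@` token appears in the samples here; treating `@,@` and `@.@` the same way is an assumption based on the WikiText tokenization convention, and `join_at_tokens` is a hypothetical helper name. It deliberately leaves `<unk>` and the spacing around ordinary punctuation untouched.

```python
def join_at_tokens(text: str) -> str:
    """Collapse ' @-@ ', ' @,@ ' and ' @.@ ' into '-', ',' and '.'.

    Assumption: these three joiner tokens are the only ones to undo.
    """
    for symbol in "-,.":
        text = text.replace(f" @{symbol}@ ", symbol)
    return text


example = join_at_tokens("The gold dollar or gold one @-@ dollar piece")
```

Whether to apply such a step depends on the use case: language-model benchmarks on WikiText are normally reported on the tokenized text as distributed.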
name | train | validation | test |
wikitext-103-raw-v1 | 1801350 | 3760 | 4358 |
wikitext-103-v1 | 1801350 | 3760 | 4358 |
wikitext-2-raw-v1 | 36718 | 3760 | 4358 |
wikitext-2-v1 | 36718 | 3760 | 4358 |
The dataset is available under the Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0).
@misc{merity2016pointer,
  title={Pointer Sentinel Mixture Models},
  author={Stephen Merity and Caiming Xiong and James Bradbury and Richard Socher},
  year={2016},
  eprint={1609.07843},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @thomwolf, @lewtun, @patrickvonplaten, and @mariamabarham for adding this dataset.