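Dataset: yelp_review_full
Task: text classification
Language: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1509.01626
License: other

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.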
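The reviews are written mainly in English.
A typical data point comprises a text and the corresponding label.
An example from the YelpReviewFull test set looks as follows: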
{ 'label': 0, 'text': 'I got \'new\' tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. \\nI took the tire over to Flynn\'s and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he\'d give me a new tire \\"this time\\". \\nI will never go back to Flynn\'s b/c of the way this guy treated me and the simple fact that they gave me a used tire!' }
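A record of this shape can be handled as a plain dictionary with a `label` and a `text` field. The sketch below is illustrative only: it assumes the integer labels 0 to 4 map in order onto ratings of 1 to 5 stars (the example above only shows label 0), and the helper names `label_to_stars` and `validate_record` are hypothetical, not part of any library.

```python
from typing import Any, Dict

NUM_CLASSES = 5  # ratings run from 1 to 5 stars


def label_to_stars(label: int) -> int:
    """Convert a class label to a star rating.

    Assumption: label 0 means 1 star, ..., label 4 means 5 stars.
    """
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label out of range: {label}")
    return label + 1


def validate_record(record: Dict[str, Any]) -> bool:
    """Check that a record has an in-range integer label and a string text."""
    return (
        isinstance(record.get("label"), int)
        and isinstance(record.get("text"), str)
        and 0 <= record["label"] < NUM_CLASSES
    )


# Shortened version of the example record, for illustration.
example = {
    "label": 0,
    "text": "I got 'new' tires from them and within two weeks got a flat.",
}

if validate_record(example):
    print(label_to_stars(example["label"]), "star(s)")
```

Keeping the label-to-rating conversion in one function makes it easy to adjust if the assumed mapping turns out to differ.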
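The Yelp reviews full star dataset is constructed by randomly taking 130,000 training samples and 10,000 testing samples for each review star rating from 1 to 5. In total there are 650,000 training samples and 50,000 testing samples.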
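The per-class counts and the totals are consistent: 5 ratings times 130,000 gives 650,000 training samples, and 5 times 10,000 gives 50,000 test samples. As a rough illustration of class-balanced random sampling, the sketch below draws the same number of records per label from a toy list. It is not the authors' actual procedure; `balanced_sample`, its `seed` parameter, and the toy data are assumptions made for the example.

```python
import random
from collections import defaultdict

NUM_CLASSES = 5
TRAIN_PER_CLASS = 130_000
TEST_PER_CLASS = 10_000

# Totals stated in the dataset description.
assert TRAIN_PER_CLASS * NUM_CLASSES == 650_000
assert TEST_PER_CLASS * NUM_CLASSES == 50_000


def balanced_sample(records, per_class, seed=0):
    """Randomly draw `per_class` records for every label present in `records`."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)

    sample = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < per_class:
            raise ValueError(f"label {label} has only {len(pool)} records")
        # Sampling without replacement within each class.
        sample.extend(rng.sample(pool, per_class))
    return sample


# Toy data: 50 records, 10 per label (hypothetical, not real reviews).
toy = [{"label": i % NUM_CLASSES, "text": f"review {i}"} for i in range(50)]
subset = balanced_sample(toy, per_class=3)
print(len(subset))
```

Sampling each class separately guarantees an exactly uniform label distribution, which a single draw over the pooled records would not.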
The Yelp reviews full star dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the Yelp Dataset Challenge 2015. It was first used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
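[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]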
You can check the official yelp-dataset-agreement.
Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
Thanks to @hfawaz for adding this dataset.