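Dataset: amazon_polarity
Task: text classification
Language: en
Multilinguality: monolingual
Size: 1M<n<10M
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1509.01626
License: apache-2.0

The Amazon reviews dataset consists of reviews from Amazon. The data span a period of 18 years and include about 35 million reviews up to March 2013. Each review includes product and user information, a rating, and a plaintext review.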
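The text is mainly in English.
A typical data point consists of a title, content, and the corresponding label.
An example from the Amazon polarity test set looks as follows: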
{ 'title':'Great CD', 'content':"My lovely Pat has one of the GREAT voices of her generation. I have listened to this CD for YEARS and I still LOVE IT. When I'm in a good mood it makes me feel better. A bad mood just evaporates like sugar in the rain. This CD just oozes LIFE. Vocals are jusat STUUNNING and lyrics just kill. One of life's hidden gems. This is a desert isle CD in my book. Why she never made it big is just beyond me. Everytime I play this, no matter black, white, young, old, male, female EVERYBODY says one thing ""Who was that singing ?""", 'label':1 }
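The shape of a data point (title, content, label) can be sketched as a small typed dictionary with a sanity check. This is an illustration only: `PolarityRecord` and `is_valid_record` are hypothetical names, and the assumption that the label is the integer 0 or 1 is inferred from the example above and the two-class setup, not stated explicitly by the card.

```python
from typing import Any, TypedDict


class PolarityRecord(TypedDict):
    """Hypothetical type for one data point: title, content, and label."""

    title: str
    content: str
    label: int  # assumed to be 0 or 1 (two polarity classes)


def is_valid_record(record: Any) -> bool:
    """Return True if `record` has the three expected fields with plausible types."""
    if not isinstance(record, dict):
        return False
    label = record.get("label")
    return (
        isinstance(record.get("title"), str)
        and isinstance(record.get("content"), str)
        and type(label) is int  # excludes bool, which is a subclass of int
        and label in (0, 1)
    )
```

A check like this could be run over a few records after loading, to catch schema surprises before training.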
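The Amazon reviews polarity dataset is constructed by treating ratings 1 and 2 as negative and ratings 4 and 5 as positive. Samples with rating 3 are ignored. Each class has 1,800,000 training samples and 200,000 test samples.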
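The labeling rule can be sketched in a few lines. This is a minimal illustration, not the authors' actual preprocessing code: the function name `rating_to_polarity` is hypothetical, and the label ids (0 for negative, 1 for positive) are an assumption consistent with the positive example carrying label 1.

```python
from typing import Optional

NEGATIVE = 0  # assumed label id for ratings 1 and 2
POSITIVE = 1  # assumed label id for ratings 4 and 5

# Split sizes implied by the per-class counts stated in the card.
TRAIN_PER_CLASS = 1_800_000
TEST_PER_CLASS = 200_000
NUM_CLASSES = 2


def rating_to_polarity(rating: int) -> Optional[int]:
    """Map a 1-5 star rating to a polarity label, or None if the sample is dropped."""
    if rating in (1, 2):
        return NEGATIVE
    if rating in (4, 5):
        return POSITIVE
    if rating == 3:
        return None  # neutral reviews are ignored
    raise ValueError(f"rating must be between 1 and 5, got {rating!r}")


train_total = NUM_CLASSES * TRAIN_PER_CLASS  # 3,600,000 training samples
test_total = NUM_CLASSES * TEST_PER_CLASS    # 400,000 test samples
```

With two balanced classes, the stated per-class counts give 3.6 million training and 400,000 test samples, which is consistent with the 1M<n<10M size category.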
The Amazon reviews polarity dataset was constructed by Xiang Zhang (xiang.zhang@nyu.edu). It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
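[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]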
Apache License 2.0
McAuley, Julian, and Jure Leskovec. "Hidden factors and hidden topics: understanding rating dimensions with review text." In Proceedings of the 7th ACM conference on Recommender systems, pp. 165-172. 2013.
Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015)
Thanks to @hfawaz for adding this dataset.