Dataset:

fewshot-goes-multilingual/cs_mall-product-reviews

Source datasets:

original

Annotations creators:

found

Language creators:

found

Size:

10K<n<100K

Multilinguality:

monolingual

Languages:

cs

Dataset Card for Mall.cz Product Reviews (Czech)

Dataset Description

The dataset contains user reviews from the Czech e-shop mall.cz. Each review contains its text, a sentiment label (positive/negative/neutral), and its language (mostly Czech, occasionally Slovak), detected automatically using lingua-py. In total (train + validation + test), the dataset has 30,000 reviews. The data is balanced across the three sentiment classes.

The train set has 8,000 positive, 8,000 neutral, and 8,000 negative reviews. The validation and test sets each have 1,000 positive, 1,000 neutral, and 1,000 negative reviews.
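As a quick sanity check of the split sizes stated above, the following sketch sums the per-class counts and confirms that every split is balanced. The counts are copied from this card; the helper names `split_totals` and `is_balanced` are hypothetical and not part of any dataset tooling.

```python
from __future__ import annotations

# Per-split class counts as stated in this card.
SPLIT_COUNTS = {
    "train": {"positive": 8000, "neutral": 8000, "negative": 8000},
    "validation": {"positive": 1000, "neutral": 1000, "negative": 1000},
    "test": {"positive": 1000, "neutral": 1000, "negative": 1000},
}


def split_totals(counts: dict[str, dict[str, int]]) -> dict[str, int]:
    """Return the number of reviews in each split (hypothetical helper)."""
    return {split: sum(per_class.values()) for split, per_class in counts.items()}


def is_balanced(per_class: dict[str, int]) -> bool:
    """A split is balanced if every sentiment class has the same count."""
    return len(set(per_class.values())) == 1


totals = split_totals(SPLIT_COUNTS)
print(totals)  # {'train': 24000, 'validation': 3000, 'test': 3000}
print(sum(totals.values()))  # 30000
print(all(is_balanced(c) for c in SPLIT_COUNTS.values()))  # True
```

So the train split holds 24,000 reviews and the validation and test splits 3,000 each, which adds up to the 30,000 total.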

Dataset Features

Each sample contains:

  • review_id : unique string identifier of the review.
  • rating_str : string representation of the rating - "pozitivní" / "neutrální" / "negativní"
  • rating_int : integer representation of the rating (1=positive, 0=neutral, -1=negative)
  • comment_language : language of the review (mostly "cs", occasionally "sk")
  • comment : the text of the review
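The two rating fields encode the same label. The sketch below models one sample with the field names listed above and checks that `rating_str` and `rating_int` agree. The `Review` dataclass, the `to_class_index` helper (which shifts -1/0/1 to 0/1/2, as many classifiers expect non-negative class indices), and the example values are all hypothetical illustrations, not part of the dataset's own tooling.

```python
from __future__ import annotations

from dataclasses import dataclass

# Mapping between the two rating fields, as documented in this card.
RATING_STR_TO_INT = {"pozitivní": 1, "neutrální": 0, "negativní": -1}
RATING_INT_TO_STR = {v: k for k, v in RATING_STR_TO_INT.items()}


@dataclass
class Review:
    """One sample; field names follow the card's feature list (hypothetical class)."""
    review_id: str
    rating_str: str
    rating_int: int
    comment_language: str
    comment: str

    def is_consistent(self) -> bool:
        """Check that rating_str and rating_int encode the same label."""
        return RATING_STR_TO_INT.get(self.rating_str) == self.rating_int


def to_class_index(rating_int: int) -> int:
    """Shift -1/0/1 to 0/1/2 for classifiers that need non-negative indices (hypothetical helper)."""
    if rating_int not in RATING_INT_TO_STR:
        raise ValueError(f"unexpected rating_int: {rating_int}")
    return rating_int + 1


sample = Review(
    review_id="example-1",        # hypothetical identifier
    rating_str="negativní",
    rating_int=-1,
    comment_language="cs",
    comment="Hypothetical review text",
)
print(sample.is_consistent())             # True
print(to_class_index(sample.rating_int))  # 0
```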

Dataset Source

The data is a processed adaptation of the Mall CZ corpus. The adaptation is label-balanced and adds the automatically detected language of each review.
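The card does not say how label balancing was performed. One common approach, assumed here purely for illustration and not necessarily what the authors did, is to randomly downsample every class to the same size. The function name `balance_by_downsampling` and its parameters are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict


def balance_by_downsampling(rows, label_key, per_class, seed=0):
    """Return `per_class` randomly chosen rows for every label (hypothetical helper).

    Raises ValueError if some label has fewer than `per_class` rows.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) < per_class:
            raise ValueError(f"label {label!r} has only {len(group)} rows")
        balanced.extend(rng.sample(group, per_class))  # sample without replacement
    rng.shuffle(balanced)  # avoid label-ordered output
    return balanced


# Toy, imbalanced input: 5 positive, 3 neutral, 4 negative rows.
labels = [1] * 5 + [0] * 3 + [-1] * 4
rows = [{"review_id": f"r{i}", "rating_int": lab} for i, lab in enumerate(labels)]
balanced = balance_by_downsampling(rows, "rating_int", per_class=3)
print(sorted(Counter(r["rating_int"] for r in balanced).items()))
# [(-1, 3), (0, 3), (1, 3)]
```

Downsampling discards data from the larger classes but keeps every retained review genuine; oversampling or class weighting are alternatives when the smallest class is too small.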