Dataset:

IlyaGusev/stihi_ru

Language:

ru

Size:

1M<n<10M

Stihi.ru dataset

Description

Summary: A subset of Taiga, uploaded here for convenience. Additional cleaning was performed.

Script: create_stihi.py

Point of Contact: Ilya Gusev

Languages: Russian.

Usage

Prerequisites:

pip install datasets zstandard jsonlines pysimdjson

Dataset iteration:

from datasets import load_dataset
dataset = load_dataset('IlyaGusev/stihi_ru', split="train", streaming=True)
for example in dataset:
    print(example["text"])
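Because the stream covers more than a million records, it is often handy to look at only the first few. The sketch below is one way to do that; `first_texts` is a hypothetical helper (not part of the dataset or of `datasets`), and the in-memory records are stand-ins so the snippet runs without a download. Only the "text" field is taken from the example above.

```python
from itertools import islice


def first_texts(examples, limit=3, field="text"):
    """Return the `field` value of the first `limit` records.

    `examples` can be any iterable of dict-like records, such as the
    streaming dataset created above. Iteration stops after `limit`
    records, so the rest of the stream is never read.
    """
    return [example[field] for example in islice(examples, limit)]


# Stand-in records (placeholders, not real dataset content).
sample = [{"text": "first poem"}, {"text": "second poem"}, {"text": "third poem"}]
for text in first_texts(sample, limit=2):
    print(text)
```

To apply it to the real data, pass the streaming `dataset` object from the iteration example in place of `sample`.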

Personal and Sensitive Information

The dataset is not anonymized, so it may contain individuals' names. Information about the original authors is included where possible.