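Dataset:

onestop_english

Tasks:

Text-to-text generation

Text classification

Sub-tasks:

multi-class-classification text-simplification

Languages:

Multilinguality:

monolingual

Size:

n<1K

Language creators:

found

Annotation creators:

found

Source datasets:

original

License:

cc-by-sa-4.0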

Dataset Card for OneStopEnglish corpus

Dataset Summary

OneStopEnglish is a corpus of texts written at three reading levels. Its usefulness has been demonstrated through two applications: automatic readability assessment and automatic text simplification.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

Data Instances

An instance example:

{
  "text": "When you see the word Amazon, what’s the first thing you think...",
  "label": 0
}

Note that each instance contains the full text of the document.

Data Fields

text : Full document text.
label : Reading level of the document: ele/int/adv (Elementary/Intermediate/Advanced).
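
As an illustration of how the two fields fit together, the following minimal sketch decodes the integer label of an instance into a reading level. It assumes the integer labels follow the order in which the levels are listed above (0 = ele, 1 = int, 2 = adv); the card does not state this mapping explicitly, and the names `LABEL_NAMES`, `LABEL_DESCRIPTIONS`, and `describe_label` are hypothetical helpers, not part of the dataset.

```python
# Assumption: integer labels follow the order the card lists the levels in.
LABEL_NAMES = ["ele", "int", "adv"]
LABEL_DESCRIPTIONS = {
    "ele": "Elementary",
    "int": "Intermediate",
    "adv": "Advanced",
}


def describe_label(label: int) -> str:
    """Return the human-readable reading level for an integer label."""
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"unexpected label: {label!r}")
    return LABEL_DESCRIPTIONS[LABEL_NAMES[label]]


# The instance shown under "Data Instances" (text truncated as in the card).
instance = {
    "text": "When you see the word Amazon, what’s the first thing you think...",
    "label": 0,
}
level = describe_label(instance["label"])
```

If the dataset is loaded through a library that exposes its own label names, those names should take precedence over this assumed mapping.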

Data Splits

The OneStopEnglish dataset has a single train split.

Split	Number of instances
train	567
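
Because only a train split is provided, evaluation requires holding out part of the data. The sketch below shows one possible way to do this with a stratified split that keeps label proportions roughly equal. The function `stratified_split`, the 20% test fraction, and the toy data are all illustrative assumptions; the toy labels simply cycle through 0, 1, 2 and do not reflect the real class distribution, which the card does not state.

```python
import random
from collections import defaultdict


def stratified_split(instances, test_fraction=0.2, seed=0):
    """Split instances into (train, test), sampling each label separately."""
    by_label = defaultdict(list)
    for inst in instances:
        by_label[inst["label"]].append(inst)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy stand-in data: 567 placeholder documents, not the real corpus.
toy = [{"text": f"document {i}", "label": i % 3} for i in range(567)]
train, test = stratified_split(toy)
```

Splitting per label rather than over the whole list avoids a test set that under-represents one reading level, which matters for a corpus of this size.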

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

Creative Commons Attribution-ShareAlike 4.0 International License

Citation Information

[More Information Needed]

Contributions

Thanks to @purvimisal for adding this dataset.

Author:

Anonymous

Dataset size:

12.52 KB