Dataset:

voidful/EQG-RACE-PLUS

Dataset Card for "QGG-RACE Dataset"

Table of Contents

Dataset Description
Dataset Summary
Supported Tasks and Leaderboards
Languages
Dataset Structure
Data Instances
Data Fields
Data Splits
Dataset Creation
Curation Rationale
Source Data
Annotations
Personal and Sensitive Information
Considerations for Using the Data
Social Impact of Dataset
Discussion of Biases
Other Known Limitations
Additional Information
Dataset Curators
Licensing Information
Citation Information
Contributions

Dataset Summary

QGG-RACE Dataset is a subset of RACE, containing three types of questions: Factoid, Cloze, and Summarization.

Dataset Download: GitHub Release
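As an alternative to the GitHub release, the data may also be loadable through the Hugging Face `datasets` library. The sketch below assumes that the repository ID shown at the top of this page (`voidful/EQG-RACE-PLUS`) can be passed to `load_dataset` without extra configuration; this card does not confirm that or the split names. The helper name `load_qgg_race` is hypothetical.

```python
def load_qgg_race(repo_id: str = "voidful/EQG-RACE-PLUS"):
    """Load the dataset from the Hugging Face Hub (needs network access).

    Assumption: the repository is loadable with the default configuration.
    """
    # Imported lazily so this helper can be defined without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


# Example usage (not executed here because it downloads data):
# ds = load_qgg_race()
# print(ds)  # shows the available splits and their sizes
```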

Data Statistics:

Type	Example	Train	Dev	Test
Cloze	Yingying is Wangwang's _ .	43167	2405	2462
Factoid	What can Mimi do?	18405	1030	944
Summarization	According to this passage we know that _ .	3004	175	184

Supported Tasks and Leaderboards

Question Generation
Reading Comprehension
Text Summarization

Languages

The dataset is in English.

Dataset Structure

Data Instances

An example data instance from the dataset is shown below:

{
    "answers": [
        "D",
        "A",
        "B",
        "C"
    ],
    "options": [
        [
            "States",
            "Doubts",
            "Confirms",
            "Removes"
        ],
        [
            "shows the kind of male birds females seek out.",
            "indicates the wandering albatross is the most faithful.",
            "is based on Professor Stutchbury's 20 years' research.",
            "suggests that female birds select males near their home."
        ],
        [
            "young birds' quality depends on their feather.",
            "some male birds care for others' young as their own.",
            "female birds go to find males as soon as autumn comes.",
            "female birds are responsible for feeding the hungry babies."
        ],
        [
            "A book about love-birds.",
            "Birds' living habits and love life",
            "The fact that birds don't love their mates forever.",
            "The factors that influence birds to look for another mate."
        ]
    ],
    "questions": [
        "What does the underline word \"dispels\" mean?",
        "The book The Private Lives of Birds _ .",
        "According to the passage, we can infer that _ .",
        "What is the passage mainly about?"
    ],
    "article": "Birds are not as loyal to their partners as you might think ...",
    "id": "high11327.txt",
    "factoid_questions": [
        "What does the underline word \"dispels\" mean?"
    ],
    "cloze_questions": [
        "The book The Private Lives of Birds _ ."
    ],
    "summarization_questions": [
        "According to the passage, we can infer that _ ."
    ]
}
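In the instance above, `questions`, `options`, and `answers` are parallel lists, and each answer is a letter label. A minimal sketch of resolving each letter to its option text follows; the record is a trimmed copy of the example, and the helper name `answer_texts` is hypothetical.

```python
# Trimmed copy of the example instance (two of its four questions).
record = {
    "questions": [
        'What does the underline word "dispels" mean?',
        "What is the passage mainly about?",
    ],
    "options": [
        ["States", "Doubts", "Confirms", "Removes"],
        [
            "A book about love-birds.",
            "Birds' living habits and love life",
            "The fact that birds don't love their mates forever.",
            "The factors that influence birds to look for another mate.",
        ],
    ],
    "answers": ["D", "C"],
}


def answer_texts(record):
    """Return (question, correct option text) pairs for one record."""
    pairs = []
    for question, options, letter in zip(
        record["questions"], record["options"], record["answers"]
    ):
        index = ord(letter) - ord("A")  # "A" -> 0, "B" -> 1, ...
        pairs.append((question, options[index]))
    return pairs


pairs = answer_texts(record)
```

Here the first pair resolves the answer "D" to the option "Removes".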

Data Fields

id: Unique identifier for the example.
article: The main text passage.
questions: List of questions related to the passage.
options: List of answer options for each question.
answers: Letter labels (A to D) of the correct option for each question.
factoid_questions: List of factoid questions.
cloze_questions: List of cloze questions.
summarization_questions: List of summarization questions.
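The three type-specific fields hold question strings that also appear in `questions`, so a question's type can be looked up by membership. The sketch below uses a trimmed copy of the example instance; the function name `question_type` is hypothetical. In that instance not every question appears in a type list, so the lookup may return `None`.

```python
# Trimmed copy of the example instance (article, options, answers omitted).
record = {
    "questions": [
        'What does the underline word "dispels" mean?',
        "The book The Private Lives of Birds _ .",
        "According to the passage, we can infer that _ .",
        "What is the passage mainly about?",
    ],
    "factoid_questions": ['What does the underline word "dispels" mean?'],
    "cloze_questions": ["The book The Private Lives of Birds _ ."],
    "summarization_questions": ["According to the passage, we can infer that _ ."],
}

TYPE_FIELDS = (
    ("factoid", "factoid_questions"),
    ("cloze", "cloze_questions"),
    ("summarization", "summarization_questions"),
)


def question_type(record, question):
    """Return the type label for a question, or None if it is in no type list."""
    for label, field in TYPE_FIELDS:
        if question in record.get(field, []):
            return label
    return None


types = [question_type(record, q) for q in record["questions"]]
```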

Data Splits

Train: Contains 65,576 examples.
Dev: Contains 3,610 examples.
Test: Contains 3,590 examples.
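As a quick consistency check, the per-type counts from the Data Statistics table can be summed and compared with the split sizes listed here. The sketch assumes both sets of figures count the same unit (questions); all numbers are copied from this card.

```python
# Per-type counts from the Data Statistics table: (train, dev, test).
table = {
    "Cloze": (43167, 2405, 2462),
    "Factoid": (18405, 1030, 944),
    "Summarization": (3004, 175, 184),
}

# Split sizes as listed under Data Splits.
listed = {"train": 65576, "dev": 3610, "test": 3590}

# Sum each column of the table and label it with the split name.
totals = dict(zip(listed, (sum(column) for column in zip(*table.values()))))

# Splits whose table total differs from the listed size: (table total, listed).
mismatches = {
    name: (totals[name], listed[name])
    for name in listed
    if totals[name] != listed[name]
}
```

The dev and test totals match the listed sizes. The train column sums to 64,576 rather than the listed 65,576, so one of the two train figures probably contains a typo.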

Dataset Creation

Curation Rationale

The QGG-RACE dataset was created as a subset of RACE, focusing on three types of questions: Factoid, Cloze, and Summarization. It is intended to facilitate research in question generation and reading comprehension.

Source Data

Initial Data Collection and Normalization

The QGG-RACE dataset is derived from the RACE dataset.

Who are the source language producers?

The source language producers are the authors of the RACE dataset.

Annotations

Annotation process

The dataset is annotated with questions and their corresponding answer options.

Who are the annotators?

The annotators are the authors of the RACE dataset.

Personal and Sensitive Information

The dataset does not contain any personal or sensitive information.

Considerations for Using the Data

Social Impact of Dataset

The QGG-RACE dataset can be used for research in question generation and reading comprehension, leading to improvements in these fields.

Discussion of Biases

The dataset may inherit some biases from the RACE dataset as it is a subset of it.

Other Known Limitations

No other known limitations.

Additional Information

Dataset Curators

The QGG-RACE dataset is curated by the authors of the QGG-RACE dataset GitHub repository.

Licensing Information

The dataset is released under the CC BY 4.0 License.

Citation Information

No citation information is available for the QGG-RACE dataset.

Contributions

Thanks to @p208p2002 for creating the QGG-RACE dataset.

Author:

voidful

Dataset size:

78.13 MB