Dataset:

voidful/EQG-RACE-PLUS

Dataset Card for "QGG-RACE Dataset"

Table of Contents

  • Dataset Description
  • Dataset Summary
  • Supported Tasks and Leaderboards
  • Languages
  • Dataset Structure
  • Data Instances
  • Data Fields
  • Data Splits
  • Dataset Creation
  • Curation Rationale
  • Source Data
  • Annotations
  • Personal and Sensitive Information
  • Considerations for Using the Data
  • Social Impact of Dataset
  • Discussion of Biases
  • Other Known Limitations
  • Additional Information
  • Dataset Curators
  • Licensing Information
  • Citation Information
  • Contributions

Dataset Summary

QGG-RACE Dataset is a subset of RACE, containing three types of questions: Factoid, Cloze, and Summarization.

Dataset Download: GitHub Release

Data Statistics:

Type           Example                                     Train   Dev    Test
Cloze          Yingying is Wangwang's _ .                  43167   2405   2462
Factoid        What can Mimi do?                           18405   1030   944
Summarization  According to this passage we know that _ .  3004    175    184
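
The per-type counts above imply a strong imbalance between question types. A minimal sketch of that arithmetic, using only the numbers in the statistics table; the variable names are illustrative.

```python
# Question counts per type, copied from the statistics table: (train, dev, test).
counts = {
    "Cloze": (43167, 2405, 2462),
    "Factoid": (18405, 1030, 944),
    "Summarization": (3004, 175, 184),
}

# Total questions of each type across all three splits.
totals = {name: sum(splits) for name, splits in counts.items()}
grand_total = sum(totals.values())

# Fraction of all questions that belong to each type.
shares = {name: total / grand_total for name, total in totals.items()}

for name in counts:
    print(f"{name}: {totals[name]} questions, {shares[name]:.1%} of all questions")
```

Anyone training a question generator on all three types at once may want to account for this skew, for example by sampling per type.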

Supported Tasks and Leaderboards

  • Question Generation
  • Reading Comprehension
  • Text Summarization

Languages

The dataset is in English.

Dataset Structure

Data Instances

An example data instance from the dataset is shown below:

{
    "answers": [
        "D",
        "A",
        "B",
        "C"
    ],
    "options": [
        [
            "States",
            "Doubts",
            "Confirms",
            "Removes"
        ],
        [
            "shows the kind of male birds females seek out.",
            "indicates the wandering albatross is the most faithful.",
            "is based on Professor Stutchbury's 20 years' research.",
            "suggests that female birds select males near their home."
        ],
        [
            "young birds' quality depends on their feather.",
            "some male birds care for others' young as their own.",
            "female birds go to find males as soon as autumn comes.",
            "female birds are responsible for feeding the hungry babies."
        ],
        [
            "A book about love-birds.",
            "Birds' living habits and love life",
            "The fact that birds don't love their mates forever.",
            "The factors that influence birds to look for another mate."
        ]
    ],
    "questions": [
        "What does the underline word \"dispels\" mean?",
        "The book The Private Lives of Birds _ .",
        "According to the passage, we can infer that _ .",
        "What is the passage mainly about?"
    ],
    "article": "Birds are not as loyal to their partners as you might think ...",
    "id": "high11327.txt",
    "factoid_questions": [
        "What does the underline word \"dispels\" mean?"
    ],
    "cloze_questions": [
        "The book The Private Lives of Birds _ ."
    ],
    "summarization_questions": [
        "According to the passage, we can infer that _ ."
    ]
}
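
The lists in an instance appear to line up by position: the i-th entries of questions, options, and answers describe the same question. A minimal sketch of resolving each answer to its option text, assuming the answer labels "A" to "D" index into the four-item option lists as the example suggests. The helper name `resolve_answers` is hypothetical, and the instance is abbreviated to the first and last question of the example.

```python
# Abbreviated copy of the example instance (first and last question only).
instance = {
    "questions": [
        "What does the underline word \"dispels\" mean?",
        "What is the passage mainly about?",
    ],
    "options": [
        ["States", "Doubts", "Confirms", "Removes"],
        [
            "A book about love-birds.",
            "Birds' living habits and love life",
            "The fact that birds don't love their mates forever.",
            "The factors that influence birds to look for another mate.",
        ],
    ],
    "answers": ["D", "C"],
}


def resolve_answers(instance):
    """Hypothetical helper: pair each question with its correct option text."""
    resolved = []
    for question, options, label in zip(
        instance["questions"], instance["options"], instance["answers"]
    ):
        index = ord(label) - ord("A")  # assumption: "A" -> 0, ..., "D" -> 3
        resolved.append((question, options[index]))
    return resolved


pairs = resolve_answers(instance)
for question, answer_text in pairs:
    print(f"{question} -> {answer_text}")
```

Pairs of this form (passage, answer text, question) are a common input shape for answer-aware question generation.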

Data Fields

  • id: Unique identifier for the example.
  • article: The main text passage.
  • questions: List of questions related to the passage.
  • options: One list of answer options per question, in the same order as questions.
  • answers: Letter label (A to D) of the correct option for each question, in the same order as questions.
  • factoid_questions: List of factoid questions.
  • cloze_questions: List of cloze questions.
  • summarization_questions: List of summarization questions.
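
The three type-specific fields can be used to label each entry of questions. A minimal sketch, assuming the type lists hold verbatim copies of strings from questions, as in the example instance. The helper name `question_types` is hypothetical, and a question found in none of the lists is labeled None.

```python
# Questions and type lists copied from the example instance.
instance = {
    "questions": [
        "What does the underline word \"dispels\" mean?",
        "The book The Private Lives of Birds _ .",
        "According to the passage, we can infer that _ .",
        "What is the passage mainly about?",
    ],
    "factoid_questions": ["What does the underline word \"dispels\" mean?"],
    "cloze_questions": ["The book The Private Lives of Birds _ ."],
    "summarization_questions": ["According to the passage, we can infer that _ ."],
}

# Type label -> name of the field that lists questions of that type.
TYPE_FIELDS = {
    "factoid": "factoid_questions",
    "cloze": "cloze_questions",
    "summarization": "summarization_questions",
}


def question_types(instance):
    """Hypothetical helper: map each question string to its type label."""
    labels = {}
    for question in instance["questions"]:
        labels[question] = None  # stays None if no type list contains it
        for label, field in TYPE_FIELDS.items():
            if question in instance.get(field, []):
                labels[question] = label
                break
    return labels


labels = question_types(instance)
for question, label in labels.items():
    print(f"{label}: {question}")
```

In the example instance the fourth question appears in none of the three lists, which is why the sketch keeps an explicit None label instead of assuming every question has a type.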

Data Splits

  • Train: Contains 65,576 examples.
  • Dev: Contains 3,610 examples.
  • Test: Contains 3,590 examples.

Dataset Creation

Curation Rationale

The QGG-RACE dataset was created as a subset of RACE, focusing on three types of questions: Factoid, Cloze, and Summarization. It is intended to facilitate research in question generation and reading comprehension.

Source Data

Initial Data Collection and Normalization

The QGG-RACE dataset is derived from the RACE dataset.

Who are the source language producers?

The source language producers are the authors of the RACE dataset.

Annotations

Annotation process

The dataset is annotated with questions and their corresponding answer options.

Who are the annotators?

The annotators are the authors of the RACE dataset.

Personal and Sensitive Information

The dataset does not contain any personal or sensitive information.

Considerations for Using the Data

Social Impact of Dataset

The QGG-RACE dataset can be used for research in question generation and reading comprehension, leading to improvements in these fields.

Discussion of Biases

The dataset may inherit biases from the RACE dataset, of which it is a subset.

Other Known Limitations

No other known limitations.

Additional Information

Dataset Curators

The QGG-RACE dataset is curated by the authors of the QGG-RACE dataset GitHub repository.

Licensing Information

The dataset is released under the CC BY 4.0 License.

Citation Information

No citation information is available for the QGG-RACE dataset.

Contributions

Thanks to @p208p2002 for creating the QGG-RACE dataset.