Dataset:

LLukas22/cqadupstack

Dataset Card for "cqadupstack"

Dataset Summary

This is a preprocessed version of CQADupStack, packaged so that it can be consumed easily through Hugging Face. The original dataset can be found here.

CQADupStack is a benchmark dataset for community question-answering (cQA) research. It contains threads from twelve StackExchange subforums, annotated with duplicate-question information, and comes with predefined training, development, and test splits for both retrieval and classification experiments.

Dataset Structure

Data Instances

An example from the 'train' split looks as follows.

{
    "question": "Very often, when some unknown company is calling me, in couple of seconds I see its name and logo on standard ...",
    "answer": "You didn't explicitely mention it, but from the context I assume you're using a device with Android 4.4 (Kitkat). With that ...",
    "title": "Why Dialer shows contact name and image, when contact is not in my address book?",
    "forum_tag": "android"
}
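
For retrieval experiments, a record like the one above is often turned into a (query, passage) pair. The sketch below is one possible way to do that, not something this dataset card prescribes: the helper name `to_retrieval_pair` and the choice to join `title` and `question` with a newline are assumptions made for illustration. The record is the truncated example from this card.

```python
from typing import Dict, Tuple


def to_retrieval_pair(record: Dict[str, str]) -> Tuple[str, str]:
    """Build a (query, passage) pair from one record.

    Hypothetical helper: the title and question body are joined into the
    query, and the answer is used as the passage.
    """
    query = f"{record['title']}\n{record['question']}".strip()
    passage = record["answer"].strip()
    return query, passage


# The example record from this card (strings are truncated in the card itself).
example = {
    "question": "Very often, when some unknown company is calling me, in couple of seconds I see its name and logo on standard ...",
    "answer": "You didn't explicitely mention it, but from the context I assume you're using a device with Android 4.4 (Kitkat). With that ...",
    "title": "Why Dialer shows contact name and image, when contact is not in my address book?",
    "forum_tag": "android",
}

query, passage = to_retrieval_pair(example)
print(query)
print(passage)
```

Whether the title alone or the title plus the body makes the better query depends on the retrieval model, so the join step is worth treating as a tunable choice.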

Data Fields

The data fields are the same among all splits.

  • question: a string feature.
  • answer: a string feature.
  • title: a string feature.
  • forum_tag: a categorical string feature.
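
The field list above can be checked programmatically before records are fed into a pipeline. The following is a minimal sketch in plain Python, assuming each row arrives as a dict of strings (which is how rows typically look when loaded with the `datasets` library, for example via `load_dataset("LLukas22/cqadupstack")`; that call is not run here). The names `EXPECTED_FIELDS`, `validate_record`, and `count_by_forum` are hypothetical, and the sample rows are made-up placeholders, not real dataset entries.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Field names as listed in this card.
EXPECTED_FIELDS = ("question", "answer", "title", "forum_tag")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if it looks fine)."""
    problems = []
    for name in EXPECTED_FIELDS:
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], str):
            problems.append(f"field is not a string: {name}")
    return problems


def count_by_forum(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count records per forum_tag value."""
    return Counter(record["forum_tag"] for record in records)


# Placeholder rows for illustration only; the second tag value is an assumption.
rows = [
    {"question": "q1", "answer": "a1", "title": "t1", "forum_tag": "android"},
    {"question": "q2", "answer": "a2", "title": "t2", "forum_tag": "android"},
    {"question": "q3", "answer": "a3", "title": "t3", "forum_tag": "unix"},
]

valid_rows = [row for row in rows if not validate_record(row)]
counts = count_by_forum(valid_rows)
print(counts)
```

Counting by `forum_tag` is a quick way to see how the subforums are balanced before training or evaluating per forum.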

Additional Information

Licensing Information

This dataset is distributed under the Apache 2.0 licence.