Datasets:

GEM/conversational_weather

Languages:

en

Multilinguality:

unknown

Language Creators:

unknown

Annotations Creators:

none

Source Datasets:

original

Dataset Card for GEM/conversational_weather

Link to Main Data Card

You can find the main data card on the GEM Website.

Dataset Summary

The purpose of this dataset is to assess how well a model can learn a template-like structure in a very low-data setting. The task is to produce a response to a weather-related query, where the reply is further specified by the data attributes and discourse structure in the input. The output contains both the lexicalized text and discourse markers for attributes (e.g., [__ARG_TEMP__ 34 ]).

You can load the dataset via:

import datasets
data = datasets.load_dataset('GEM/conversational_weather')

The data loader can be found here.

paper

ACL Anthology

authors

Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani, Michael White, Rajen Subba (Facebook Conversational AI)

Dataset Overview

Where to find the Data and its Documentation

Download

Github

Paper

ACL Anthology

BibTeX
@inproceedings{balakrishnan-etal-2019-constrained,
  title = "Constrained Decoding for Neural {NLG} from Compositional Representations in Task-Oriented Dialogue",
  author = "Balakrishnan, Anusha  and
    Rao, Jinfeng  and
    Upasani, Kartikeya  and
    White, Michael  and
    Subba, Rajen",
  booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
  month = jul,
  year = "2019",
  address = "Florence, Italy",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/P19-1080",
  doi = "10.18653/v1/P19-1080",
  pages = "831--844"
}
Contact Name

Kartikeya Upasani

Contact Email

kart@fb.com

Has a Leaderboard?

no

Languages and Intended Use

Multilingual?

no

Covered Languages

English

License

cc-by-nc-4.0: Creative Commons Attribution Non Commercial 4.0 International

Intended Use

This dataset is intended to help develop conversational agents that exhibit human-like properties such as matching the framing of the response with the query or contrasting relevant data attributes.

Primary Task

Data-to-Text

Communicative Goal

Producing a text that is a response to a weather query as per the discourse structure and data attributes specified in the input meaning representation.

Credit

Curation Organization Type(s)

industry

Curation Organization(s)

Facebook

Dataset Creators

Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani, Michael White, Rajen Subba (Facebook Conversational AI)

Funding

Facebook

Who added the Dataset to GEM?

Vipul Raheja (Grammarly)

Dataset Structure

Data Fields
  • gem_id (string): GEM-formatted row id
  • id (string): Row id in the original data
  • user_query (string): Natural language weather query from humans
  • tree_str_mr (string): Synthetically-added user context (datetime and location) in the form of a tree-structured MR
  • response (string): A tree-structured annotation of the response.
Example Instance
{'gem_id': 'weather-train-11',
 'id': '1108963',
 'synthetic_user_context': '[__DG_INFORM__ [__ARG_TASK__ get_forecast ] '
                           '[__ARG_TEMP__ 37 ] [__ARG_TEMP_UNIT__ fahrenheit ] '
                           '[__ARG_CLOUD_COVERAGE__ partly cloudy ] '
                           '[__ARG_DATE_TIME__ [__ARG_COLLOQUIAL__ currently ] '
                           '] [__ARG_LOCATION__ [__ARG_CITY__ Oakland ] '
                           '[__ARG_COUNTRY__ United States ] [__ARG_REGION__ '
                           'California ] ] ] [__DG_INFORM__ [__ARG_TASK__ '
                           'get_forecast ] [__ARG_TEMP_SUMMARY__ mid 40s ] '
                           '[__ARG_DATE_TIME_RANGE__ [__ARG_COLLOQUIAL__ This '
                           'afternoon ] ] [__ARG_LOCATION__ [__ARG_CITY__ '
                           'Oakland ] [__ARG_COUNTRY__ United States ] '
                           '[__ARG_REGION__ California ] ] ] [__DG_INFORM__ '
                           '[__ARG_TASK__ get_forecast ] '
                           '[__ARG_CLOUD_COVERAGE__ mostly sunny ] '
                           '[__ARG_DATE_TIME_RANGE__ [__ARG_COLLOQUIAL__ This '
                           'afternoon ] ] [__ARG_LOCATION__ [__ARG_CITY__ '
                           'Oakland ] [__ARG_COUNTRY__ United States ] '
                           '[__ARG_REGION__ California ] ] ]',
 'tree_str_mr': "[__DG_INFORM__ It's [__ARG_DATE_TIME__ [__ARG_COLLOQUIAL__ "
                'currently ] ] [__ARG_CLOUD_COVERAGE__ partly cloudy ] and '
                '[__ARG_TEMP__ __ARG_TEMP__ ] [__ARG_TEMP_UNIT__ '
                '__ARG_TEMP_UNIT__ ] [__ARG_LOCATION__ in [__ARG_CITY__ '
                '__ARG_CITY__ ] , [__ARG_REGION__ __ARG_REGION__ ] , '
                '[__ARG_COUNTRY__ __ARG_COUNTRY__ ] ] . ] [__DG_INFORM__ '
                '[__ARG_DATE_TIME_RANGE__ [__ARG_COLLOQUIAL__ This afternoon ] '
                "] , it'll be [__ARG_CLOUD_COVERAGE__ mostly sunny ] ] "
                '[__DG_INFORM__ with temperatures in the [__ARG_TEMP_SUMMARY__ '
                'mid <number>  ] ]',
 'user_query': 'Show weather forecast for Oakland, CA. '}
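Both the MRs and the annotated responses use the same bracketed notation: `[` followed by a label opens a node, `]` closes it, and any other token is leaf text. The minimal parser below is a sketch written for this card, not part of the GEM data loader; `Node` and `parse_tree` are hypothetical names, and it assumes every `[` is directly followed by its label and that brackets are balanced, as in the example instance above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """One bracketed node, e.g. [__ARG_CITY__ Oakland ]."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def parse_tree(text: str) -> List[Union[Node, str]]:
    """Parse a bracketed MR or response string into top-level items.

    Hypothetical helper: assumes every '[' is directly followed by a label
    token such as __DG_INFORM__ and that the brackets are balanced.
    """
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    root: List[Union[Node, str]] = []
    stack = [root]  # stack of child lists; the top is the node being filled
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "[":
            node = Node(label=tokens[i + 1])
            stack[-1].append(node)
            stack.append(node.children)
            i += 2  # skip the bracket and its label
            continue
        if tok == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            stack.pop()
        else:
            stack[-1].append(tok)  # plain leaf text
        i += 1
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return root


mr = ("[__DG_INFORM__ [__ARG_TASK__ get_weather_attribute ] "
      "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ 05:04 PM ] ] ]")
forest = parse_tree(mr)
print(forest[0].label)                        # __DG_INFORM__
print([c.label for c in forest[0].children])  # ['__ARG_TASK__', '__ARG_SUNSET_TIME_DATE_TIME__']
```

A small stack of child lists is enough here because the notation is a plain bracketed tree with no escaping.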
Data Splits
  • Standard Splits: Train/Validation/Test
  • Additional Split: Disc_Test (a more challenging subset of the test set that contains discourse relations)
Splitting Criteria

The test set contains 3,121 examples, of which 1.1K (35%) have unique MRs that have never been seen in the training set.
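For reference, 1,100 / 3,121 ≈ 0.35, which matches the stated 35%. The card does not say how MRs were compared; the sketch below assumes, purely for illustration, that two MRs count as "the same" when their label structure matches after leaf values are removed. `mr_skeleton` and `unseen_fraction` are hypothetical names.

```python
import re
from typing import Iterable


def mr_skeleton(mr: str) -> str:
    """Keep only opening labels ('[__ARG_CITY__') and closing brackets."""
    return " ".join(re.findall(r"\[\S+|\]", mr))


def unseen_fraction(train_mrs: Iterable[str], test_mrs: Iterable[str]) -> float:
    """Share of test MRs whose skeleton never occurs among the training MRs."""
    seen = {mr_skeleton(mr) for mr in train_mrs}
    test = [mr_skeleton(mr) for mr in test_mrs]
    if not test:
        return 0.0
    return sum(s not in seen for s in test) / len(test)


train = ["[__DG_INFORM__ [__ARG_TEMP__ 37 ] ]"]
test = ["[__DG_INFORM__ [__ARG_TEMP__ 50 ] ]",      # same structure, new value
        "[__DG_INFORM__ [__ARG_CITY__ Oakland ] ]"]  # new structure
print(unseen_fraction(train, test))  # 0.5
```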

{'gem_id': 'weather-train-13333', 'data_id': '1260610', 'user_query': 'Sundown', 'tree_str_mr': '[__DG_INFORM__ [__ARG_TASK__ get_weather_attribute ] [__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ 05:04 PM ] ] ]', 'response': '[__DG_INFORM__ The sun will go down at [__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ __ARG_TIME__ ] ] ]'}
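In the record above, the response keeps the placeholder `__ARG_TIME__` where the MR holds the value `05:04 PM`. The following is a minimal relexicalization sketch, not the procedure used by the dataset authors. It assumes brackets are whitespace-separated and keeps only the first value seen per label, so MRs that repeat a label with different values would need a position-aware alignment instead; it also does not detokenize punctuation such as `Oakland ,`. The names `LEAF` and `relexicalize` are hypothetical.

```python
import re
from typing import Dict

# Innermost MR nodes: a label followed by plain text with no nested brackets.
LEAF = re.compile(r"\[(__\w+__) ([^\[\]]+?) \]")


def relexicalize(response: str, mr: str) -> str:
    """Replace placeholder tokens in an annotated response with MR values
    and drop the bracket/label tokens, returning plain text."""
    values: Dict[str, str] = {}
    for label, value in LEAF.findall(mr):
        values.setdefault(label, value.strip())  # keep the first value per label
    words = []
    for tok in response.split():
        if tok.startswith("[") or tok == "]":
            continue  # structural token, not part of the surface text
        words.append(values.get(tok, tok))
    return " ".join(words)


mr = ("[__DG_INFORM__ [__ARG_TASK__ get_weather_attribute ] "
      "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ 05:04 PM ] ] ]")
response = ("[__DG_INFORM__ The sun will go down at "
            "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ __ARG_TIME__ ] ] ]")
print(relexicalize(response, mr))  # The sun will go down at 05:04 PM
```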

Dataset in GEM

Rationale for Inclusion in GEM

Why is the Dataset in GEM?

The dataset was curated to develop a weather bot that exhibits human-like properties such as matching the framing of the response with the query or contrasting relevant data attributes.

The dataset offers rich tree-based meaning representations that give fine-grained control over the response, e.g., by specifying which two attributes are to be contrasted. The natural language input queries are also provided to model the coherence of the response with respect to the input. The output response is annotated with the input meaning components using special bracketing tokens, which enables new techniques such as constrained decoding to improve the quality of output responses.

Similar Datasets

no

Ability that the Dataset measures

Adequately expressing CONTRAST and JUSTIFY discourse relations with appropriate grouping of arguments; adequately generalizing to many combinations of arguments.

GEM-Specific Curation

Modified for GEM?

yes

GEM Modifications

data points removed

Modification Details

The original repo contained a challenge set, disc_test.tsv, which is a subset of the test set consisting of discourse relations (CONTRAST and JUSTIFY), but it also contained JOIN relations. This discrepancy has been rectified in the GEM version, and the rectified version has been added to the challenge_sets.
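One plausible way to reproduce this fix is to keep only challenge examples whose MR contains a CONTRAST or JUSTIFY relation and no JOIN relation. The sketch below assumes these relations appear as the labels `__DS_CONTRAST__`, `__DS_JUSTIFY__` and `__DS_JOIN__`, and that rows are dicts with a `tree_str_mr` field as listed under Data Fields; the exact filtering applied for GEM is not documented here, and `filter_disc_test` is a hypothetical name.

```python
import re
from typing import Dict, List, Set

# Assumed label names for the discourse relations (not confirmed by this card).
KEEP = {"__DS_CONTRAST__", "__DS_JUSTIFY__"}
DROP = {"__DS_JOIN__"}


def labels(mr: str) -> Set[str]:
    """All node labels in a bracketed MR, e.g. {'__DG_INFORM__', ...}."""
    return set(re.findall(r"\[(\S+)", mr))


def filter_disc_test(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows with CONTRAST/JUSTIFY relations and without JOIN."""
    kept = []
    for row in rows:
        found = labels(row["tree_str_mr"])
        if (found & KEEP) and not (found & DROP):
            kept.append(row)
    return kept


rows = [
    {"tree_str_mr": "[__DS_CONTRAST__ [__DG_INFORM__ ] [__DG_INFORM__ ] ]"},
    {"tree_str_mr": "[__DS_JOIN__ [__DG_INFORM__ ] [__DG_INFORM__ ] ]"},
    {"tree_str_mr": "[__DS_JUSTIFY__ [__DS_JOIN__ [__DG_INFORM__ ] ] ]"},
]
print(len(filter_disc_test(rows)))  # 1
```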

Additional Splits?

no

Getting Started with the Task

Previous Results

Previous Results

Measured Model Abilities

Adequately expressing CONTRAST and JUSTIFY discourse relations with appropriate grouping of arguments; adequately generalizing to many combinations of arguments.

Metrics

BLEU, Other: Other Metrics

Other Metrics

Tree accuracy: It measures whether the tree structure in the prediction matches that of the input MR exactly (modulo repeated arguments that need only appear once).
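The sketch below compares the label/bracket skeleton of each prediction with that of its input MR. It illustrates the idea under two stated assumptions and is not the authors' evaluation script: labels in the hypothetical `IGNORED` set (here `__ARG_TASK__`, which appears in MRs but is not bracketed in the example responses) are dropped from the MR before comparison, and the exception for repeated arguments is not modelled, so this version is stricter than the metric described above.

```python
from typing import Iterable, List, Sequence

# Hypothetical: labels present in MRs but not expected in the response tree.
IGNORED = frozenset({"__ARG_TASK__"})


def skeleton(tree: str, ignored: Iterable[str] = IGNORED) -> List[str]:
    """Sequence of opening labels and ']' tokens, without leaf text.

    Assumes brackets are whitespace-separated, e.g. '[__ARG_TIME__ 05:04 PM ]'.
    """
    ignored_set = set(ignored)
    out: List[str] = []
    skip_depth = 0  # > 0 while inside an ignored subtree
    for tok in tree.split():
        if tok.startswith("["):
            if skip_depth or tok[1:] in ignored_set:
                skip_depth += 1
            else:
                out.append(tok)
        elif tok == "]":
            if skip_depth:
                skip_depth -= 1
            else:
                out.append(tok)
        # any other token is leaf text and does not affect the structure
    return out


def tree_accuracy(mrs: Sequence[str], predictions: Sequence[str]) -> float:
    """Fraction of predictions whose skeleton exactly matches their MR's."""
    if len(mrs) != len(predictions):
        raise ValueError("need one prediction per MR")
    if not mrs:
        return 0.0
    hits = sum(skeleton(m) == skeleton(p) for m, p in zip(mrs, predictions))
    return hits / len(mrs)


mr = ("[__DG_INFORM__ [__ARG_TASK__ get_weather_attribute ] "
      "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ 05:04 PM ] ] ]")
good = ("[__DG_INFORM__ The sun will go down at "
        "[__ARG_SUNSET_TIME_DATE_TIME__ [__ARG_TIME__ __ARG_TIME__ ] ] ]")
bad = "[__DG_INFORM__ The sun sets soon ]"
print(tree_accuracy([mr, mr], [good, bad]))  # 0.5
```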

Proposed Evaluation

Automatic metrics are evaluated on the raw model predictions (which have de-lexicalized fields):

  • Tree accuracy: Measures whether the tree structure in the prediction matches that of the input MR exactly.
  • BLEU-4: A word overlap metric commonly used for evaluating NLG systems.
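For orientation, corpus-level BLEU-4 can be sketched in plain Python: clipped n-gram precisions for n = 1..4 are combined by a geometric mean and multiplied by a brevity penalty. This is a simplified illustration (single reference, whitespace tokenization, no smoothing), not the scorer used in the paper or by GEM; established implementations such as sacreBLEU tokenize differently and may report different numbers. `corpus_bleu4` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of the n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses: List[str], references: List[str]) -> float:
    """Corpus BLEU-4 on a 0-100 scale, one reference per hypothesis."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # hypothesis n-gram counts for n = 1..4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, 5):
            # Counter '&' keeps the minimum count, i.e. clipped matches.
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_prec)


ref = "the sun will go down at __ARG_TIME__"
print(corpus_bleu4([ref], [ref]))             # 100.0
print(corpus_bleu4(["it will rain"], [ref]))  # 0.0
```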

The authors also performed human evaluation studies by asking annotators to evaluate the quality of responses produced by different models. Annotators provided binary ratings on the following dimensions:
  • Grammaticality: Measures fluency of the responses.
  • Correctness: Measures semantic correctness of the responses.

Previous results available?

no

Dataset Curation

Original Curation

Original Curation Rationale

The dataset was curated to develop a weather bot that exhibits human-like properties such as matching the framing of the response with the query or contrasting relevant data attributes. To achieve this, the dataset contains rich tree-structured meaning representations that are specified using several data arguments and discourse acts, the input natural language queries, and annotations for the responses.

Communicative Goal

Producing a text that is a response to a weather query as per the discourse structure and data attributes specified in the input meaning representation.

Sourced from Different Sources

no

Language Data

How was Language Data Obtained?

Crowdsourced, Machine-generated

Where was it crowdsourced?

Other crowdworker platform

Topics Covered

The dataset focuses on the weather domain. Weather was the first successful case of NLG put into production, back in the 1980s (Reiter & Dale, 1997), and the domain offers significant complexity for NLG: weather forecast summaries in particular can be very long and require reasoning over several disjoint pieces of information.

Data Validation

validated by crowdworker

Data Preprocessing

Please refer to Appendix D of the original paper for details.

Was Data Filtered?

hybrid

Filter Criteria

Please refer to Appendix C of the original paper for details.

Structured Annotations

Additional Annotations?

none

Annotation Service?

no

Consent

Any Consent Policy?

no

Justification for Using the Data

Annotation was done as work for hire and contains no PII.

Private Identifying Information (PII)

Contains PII?

no PII

Justification for no PII

Data is simulated and not specific to annotator.

Maintenance

Any Maintenance Plan?

no

Broader Social Context

Previous Work on the Social Impact of the Dataset

Usage of Models based on the Data

no

Impact on Under-Served Communities

Addresses needs of underserved Communities?

no

Discussion of Biases

Any Documented Social Biases?

unsure

Are the Language Producers Representative of the Language?

Grammatical evaluations performed with the data to date have used norms from informal Standard American English. These prescriptive notions of grammaticality potentially serve to perpetuate systemic power imbalances as they’re conveyed by language.

Since the data only contains informal Standard American English, its use to train a model may not be appropriate depending on the potential use case.

Considerations for Using the Data

PII Risks and Liability

Potential PII Risk

Annotation was done as work for hire and contains no PII. Annotated data is simulated and not specific to annotator.

Licenses

Known Technical Limitations

Unsuited Applications

An imperfect model used to convey actual weather data could mislead users about weather conditions.