Datasets:

cdsc

Language Creators:

other

Annotations Creators:

expert-generated

Source Datasets:

original

Languages:

pl

Multilinguality:

monolingual

Size Categories:

10K<n<100K

Dataset Card for CDSC

Dataset Summary

The Polish CDSCorpus consists of 10K Polish sentence pairs that are human-annotated for semantic relatedness and entailment. The dataset may be used to evaluate compositional distributional semantics models of Polish. It was presented at ACL 2017. Please refer to Wróblewska and Krasnowska-Kieraś (2017) for a detailed description of the resource.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Polish

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

  • pair_ID: ID of the sentence pair
  • sentence_A: first sentence
  • sentence_B: second sentence

For the cdsc-e domain:

  • entailment_judgment: one of 'NEUTRAL', 'CONTRADICTION' or 'ENTAILMENT'

For the cdsc-r domain:

  • relatedness_score: float representing the semantic relatedness of the two sentences
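
The fields listed above can be sketched as a small typed record. This is only an illustration: the `CdscPair` class and the `parse_pair` helper are hypothetical names, not part of the dataset or of any library. The sketch assumes that `pair_ID` is an integer and that `entailment_judgment` arrives as a plain string, which the card does not state, and the sample sentences are invented rather than taken from the corpus.

```python
from dataclasses import dataclass
from typing import Optional

# Label set taken from the field description above.
ENTAILMENT_LABELS = ("NEUTRAL", "CONTRADICTION", "ENTAILMENT")


@dataclass(frozen=True)
class CdscPair:
    """Hypothetical container for one sentence pair (not shipped with the dataset)."""

    pair_ID: int  # assumption: integer ID; the card only says "ID"
    sentence_A: str
    sentence_B: str
    entailment_judgment: Optional[str] = None  # filled for cdsc-e rows
    relatedness_score: Optional[float] = None  # filled for cdsc-r rows


def parse_pair(row: dict) -> CdscPair:
    """Build a CdscPair from a plain dict and check the documented label set."""
    label = row.get("entailment_judgment")
    if label is not None and label not in ENTAILMENT_LABELS:
        raise ValueError(f"unexpected entailment label: {label!r}")
    score = row.get("relatedness_score")
    return CdscPair(
        pair_ID=int(row["pair_ID"]),
        sentence_A=row["sentence_A"],
        sentence_B=row["sentence_B"],
        entailment_judgment=label,
        relatedness_score=None if score is None else float(score),
    )


# Invented example row in the cdsc-e layout (not a real corpus entry).
example = parse_pair(
    {
        "pair_ID": 1,
        "sentence_A": "Kot śpi na kanapie.",
        "sentence_B": "Zwierzę śpi.",
        "entailment_judgment": "ENTAILMENT",
    }
)
print(example.entailment_judgment, example.relatedness_score)
```

Keeping both task-specific fields optional lets one record type cover rows from either domain. A loader that exposes the entailment label as an integer class index would need a mapping step before this check.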

Data Splits

The data is split into train, dev, and test sets.

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

The dataset is provided for research purposes only. Please check the dataset license for additional information.

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

CC BY-NC-SA 4.0

Citation Information

[More Information Needed]

Contributions

Thanks to @abecadel for adding this dataset.