Datasets:

persiannlp/parsinlu_entailment

Languages:

fa

Multilinguality:

monolingual

Size Categories:

1K<n<10K

Language Creators:

expert-generated

Annotations Creators:

expert-generated

ArXiv:

arxiv:2012.06154

Dataset Card for ParsiNLU (Textual Entailment)

Dataset Summary

A Persian textual entailment task (deciding whether sent1 entails sent2). The sentence pairs are partially translated from the SNLI dataset and partially written by expert annotators.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The text in the dataset is in Persian (fa).

Dataset Structure

Data Instances

Here is an example from the dataset:

{
  "sent1": "سالها است که کنگره در تلاش است تا اثربخشی مدیریت اطلاعات و فناوری را در دولت فدرال افزایش دهد.",
  "sent2": "کنگره بودجه ویژه ای برای مدیریت اطلاعات و فناوری در دولت فدرال دارد.",
  "label": "n",
  "category": "translation-train"
}
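As a minimal sketch, the shape of an instance can be checked in Python before any further processing. The key set below is taken from the example above; the helper names `is_well_formed` and `load_entailment` are hypothetical. The loader assumes the Hugging Face `datasets` library is installed and that the identifier `persiannlp/parsinlu_entailment` resolves on the Hub. It is defined here but not called, and the split names other than `train` are not confirmed by this card.

```python
# Keys taken from the example instance shown in this card.
EXPECTED_KEYS = {"sent1", "sent2", "label", "category"}
# Label codes as listed under Data Fields.
VALID_LABELS = {"e", "c", "n"}


def is_well_formed(record):
    """Return True if `record` has exactly the expected keys and a known label."""
    return set(record) == EXPECTED_KEYS and record["label"] in VALID_LABELS


def load_entailment(split="train"):
    """Hypothetical helper: fetch one split from the Hugging Face Hub."""
    # Imported lazily so the check above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("persiannlp/parsinlu_entailment", split=split)


# The example instance from this card, copied verbatim.
example = {
    "sent1": "سالها است که کنگره در تلاش است تا اثربخشی مدیریت اطلاعات و فناوری را در دولت فدرال افزایش دهد.",
    "sent2": "کنگره بودجه ویژه ای برای مدیریت اطلاعات و فناوری در دولت فدرال دارد.",
    "label": "n",
    "category": "translation-train",
}

print(is_well_formed(example))
```

Depending on the installed `datasets` version, loading may require extra arguments; the library documentation is the authority there.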

Data Fields

  • sent1: the first sentence.
  • sent2: the second sentence.
  • category: whether the sentence pair is translated from MNLI (values starting with translation-) or written by native speakers (values starting with natural-).
  • label: e if sent2 is entailed by sent1; c if sent2 contradicts sent1; n if the relation between the two sentences is neutral.
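The field conventions above can be decoded with a small helper. This is a sketch under assumptions: the long label names are illustrative expansions of the one-letter codes, the field is read under the `category` key as it appears in the example instance, and the part after the hyphen is assumed to name the split (only `translation-train` is shown in this card). The function name `describe` is hypothetical.

```python
# Illustrative expansions of the one-letter label codes described above.
LABEL_NAMES = {"e": "entailment", "c": "contradiction", "n": "neutral"}


def describe(record):
    """Expand the label code and split the category into origin and suffix."""
    # "translation-train" -> origin "translation", suffix "train".
    # Treating the suffix as a split name is an assumption.
    origin, _, suffix = record["category"].partition("-")
    return {
        "label_name": LABEL_NAMES[record["label"]],
        "origin": origin,
        "suffix": suffix,
    }


info = describe({"label": "n", "category": "translation-train"})
print(info)
```

An unknown label code raises `KeyError`, which is deliberate here: it surfaces unexpected values instead of silently passing them through.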

Data Splits

The train/dev/test splits contain 756/271/1751 samples, respectively.
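A quick arithmetic sketch of what these counts imply, using only the numbers stated above (the variable names are illustrative):

```python
# Split sizes as stated in this card.
split_sizes = {"train": 756, "dev": 271, "test": 1751}

total = sum(split_sizes.values())
# Share of the whole dataset held by each split.
fractions = {name: size / total for name, size in split_sizes.items()}

for name, share in fractions.items():
    print(f"{name}: {split_sizes[name]} samples ({share:.1%})")
print(f"total: {total}")
```

The total of 2778 samples falls in the 1K<n<10K size category, and the test split is the largest of the three, which is worth keeping in mind when planning training and evaluation.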

Dataset Creation

Curation Rationale

For details, see the corresponding paper draft (arXiv:2012.06154).

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

CC BY-NC-SA 4.0 License

Citation Information

@article{huggingface:dataset,
    title = {ParsiNLU: A Suite of Language Understanding Challenges for Persian},
    author = {Khashabi, Daniel and Cohan, Arman and Shakeri, Siamak and Hosseini, Pedram and Pezeshkpour, Pouya and Alikhani, Malihe and Aminnaseri, Moin and Bitaab, Marzieh and Brahman, Faeze and Ghazarian, Sarik and others},
    year = {2020},
    journal = {arXiv e-prints},
    eprint = {2012.06154},
}

Contributions

Thanks to @danyaljj for adding this dataset.