Datasets:

psc

Languages:

pl

Multilinguality:

monolingual

Size categories:

1K<n<10K

Language creators:

other

Annotations creators:

expert-generated

Source datasets:

original

Dataset Card for Polish Summaries Corpus

Dataset Summary

The Polish Summaries Corpus contains news articles and their summaries. We used summaries of the same article as positive pairs and sampled the most similar summaries of different articles as negatives.
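The pairing scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the card does not say which similarity measure was used to pick negatives, so a simple token-overlap (Jaccard) score stands in for it, and the names `jaccard`, `build_pairs`, and the article-id keys are hypothetical.

```python
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity. A stand-in, not the measure used by the authors."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_pairs(summaries: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Return (summary_a, summary_b, label) triples.

    `summaries` maps a hypothetical article id to the summaries written for it.
    """
    pairs: List[Tuple[str, str, int]] = []
    for article_id, own in summaries.items():
        # Positives: every unordered pair of summaries of the same article.
        for i in range(len(own)):
            for j in range(i + 1, len(own)):
                pairs.append((own[i], own[j], 1))

        # Candidate negatives: summaries that belong to any other article.
        others = [
            s
            for other_id, group in summaries.items()
            if other_id != article_id
            for s in group
        ]
        # Negatives: for each summary, the most similar summary of another article.
        for s in own:
            if others:
                hardest = max(others, key=lambda o: jaccard(s, o))
                pairs.append((s, hardest, 0))
    return pairs
```

Picking the most similar summary of a different article, rather than a random one, gives negatives that are harder to tell apart from positives.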

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Polish

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

  • extract_text: text to summarise
  • summary_text: summary of extracted text
  • label: 1 if the summary is similar, 0 if it is not
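As a minimal sketch of the record layout listed above, the three fields could be typed and checked like this. The `PscRecord` and `is_well_formed` names are hypothetical, the field types are assumed from the descriptions, and the example values are invented placeholders rather than text from the corpus.

```python
from typing import Any, Mapping, TypedDict


class PscRecord(TypedDict):
    extract_text: str  # text to summarise
    summary_text: str  # summary of the extracted text
    label: int         # 1 = similar, 0 = not similar


def is_well_formed(row: Mapping[str, Any]) -> bool:
    """Check that a row has exactly the three documented fields with the assumed types."""
    return (
        set(row) == {"extract_text", "summary_text", "label"}
        and isinstance(row["extract_text"], str)
        and isinstance(row["summary_text"], str)
        and isinstance(row["label"], int)
    )


# Invented placeholder values, not taken from the corpus.
example: PscRecord = {
    "extract_text": "placeholder source text",
    "summary_text": "placeholder summary",
    "label": 1,
}
```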

Data Splits

The data is split into a train set and a test set. The test set has no labels, so its label column is set to -1.
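Because the test split carries -1 in place of a real label, those rows should be excluded before computing any label-based metric. A minimal sketch, assuming rows are plain dictionaries with a `label` key; `UNLABELED`, `labeled_only`, and `label_counts` are hypothetical helper names.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

UNLABELED = -1  # placeholder value the card describes for the test split


def labeled_only(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only rows that carry a real label (0 or 1)."""
    return [row for row in rows if row["label"] != UNLABELED]


def label_counts(rows: Iterable[Dict[str, Any]]) -> Counter:
    """Count how often each label value occurs, including the -1 placeholder."""
    return Counter(row["label"] for row in rows)
```

Running `label_counts` on a split is a quick way to confirm whether it is the unlabeled one: a split whose only label value is -1 has nothing to evaluate against.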

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

CC BY-SA 3.0

Citation Information

@inproceedings{ogro:kop:14:lrec,
  title = {The {P}olish {S}ummaries {C}orpus},
  author = {Ogrodniczuk, Maciej and Kope{\'c}, Mateusz},
  booktitle = "Proceedings of the Ninth International {C}onference on {L}anguage {R}esources and {E}valuation, {LREC}~2014",
  year = "2014",
}

Contributions

Thanks to @abecadel for adding this dataset.