Datasets:

ghomasHudson/vlsp

Dataset Card for vlsp

Dataset Summary

This dataset follows the methodology of the scientific_papers dataset but is designed specifically for very long documents (more than 10,000 words). The documents were gathered from arxiv.org by searching for theses.
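The length criterion can be sketched as a simple filter. This is a minimal sketch, not the curators' actual pipeline. The names word_count, is_very_long and MIN_WORDS are hypothetical. Counting words by splitting on whitespace is also an assumption, because the card does not say how words were counted.

```python
# Hypothetical filter illustrating the "more than 10,000 words" criterion.
# Assumption: words are counted by splitting on whitespace; the card does not
# state how the original curators counted words.
MIN_WORDS = 10_000


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in `text`."""
    return len(text.split())


def is_very_long(article: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the article has strictly more than `min_words` words."""
    return word_count(article) > min_words


if __name__ == "__main__":
    exactly_threshold = "word " * 10_000  # 10,000 words: does not qualify
    just_over = "word " * 10_001          # 10,001 words: qualifies
    print(is_very_long(exactly_threshold), is_very_long(just_over))  # False True
```

The comparison is strict (`>`) to match "more than 10,000 words"; a different tokenizer would shift which borderline documents pass.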

The dataset has 2 features:

  • article: the body of the document.
  • abstract: the abstract of the document.
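As an illustration, a single record and the way its two features map onto a summarization example can be sketched in Python. The type name VlspRecord and the helper to_summarization_pair are hypothetical, and treating both fields as plain strings is an assumption because the card does not document field types. The example values are made up.

```python
from typing import Tuple, TypedDict


class VlspRecord(TypedDict):
    """One example. Field names come from the card; str types are assumed."""
    article: str   # the body of the document
    abstract: str  # the abstract of the document


def to_summarization_pair(record: VlspRecord) -> Tuple[str, str]:
    """Return (source, target): the article is the input, the abstract the reference summary."""
    return record["article"], record["abstract"]


# Made-up values for illustration only.
example: VlspRecord = {"article": "Placeholder body text.", "abstract": "Placeholder abstract."}
source, target = to_summarization_pair(example)
print(source, "->", target)
```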

Supported Tasks and Leaderboards

Summarization

Languages

English

Dataset Structure

Data Instances

[Needs More Information]

Data Fields

The two fields, article and abstract, are described under Dataset Summary.

Data Splits

Only a test set is provided.

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

[Needs More Information]

Citation Information

[Needs More Information]