Dataset:

mwz/ur_para

Paraphrase Dataset (Urdu)

This dataset contains pairs of paraphrased sentences in Urdu. It is provided in the Parquet format and has a single training split of 393,000 rows.

Dataset Details

  • Columns:
    • sentence1 : The first sentence in a pair of paraphrases (string).
    • sentence2 : The second sentence in a pair of paraphrases (string).
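The two-column layout above can be mirrored in code as a small typed record. The following is a minimal sketch, not part of the dataset's own tooling: the names `ParaphrasePair` and `is_valid_pair` are hypothetical, and the requirement that both sentences be non-empty is an assumption rather than something the dataset card states.

```python
from typing import TypedDict


class ParaphrasePair(TypedDict):
    """One row of the dataset: two strings that paraphrase each other."""
    sentence1: str
    sentence2: str


def is_valid_pair(row: dict) -> bool:
    """Return True if the row has exactly the two documented columns,
    both holding non-empty strings (non-emptiness is an assumption)."""
    if set(row) != {"sentence1", "sentence2"}:
        return False
    return all(
        isinstance(row[key], str) and bool(row[key].strip())
        for key in ("sentence1", "sentence2")
    )
```

A check like this can be useful as a quick filter before training, since it flags rows with missing or blank text.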

Usage

You can use this dataset for various natural language processing tasks such as text similarity, paraphrase identification, and language generation.
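As an illustration of the paraphrase identification use case, here is a rough sketch of loading the data and turning the pairs into labeled examples. It assumes the Hugging Face `datasets` library is installed and that the training split is named `train`; neither is confirmed by the card. The helper names `load_ur_para`, `iter_pairs`, and `make_identification_examples` are hypothetical. Because the dataset contains only positive pairs, negatives are built here by matching each `sentence1` with the `sentence2` of the next row, which is a simple heuristic and may occasionally produce a true paraphrase by chance.

```python
def load_ur_para(split="train"):
    """Load mwz/ur_para from the Hugging Face Hub.

    Needs `pip install datasets` and network access. The split name
    "train" is an assumption based on the card's "training set" wording.
    """
    from datasets import load_dataset  # lazy import: only needed when called
    return load_dataset("mwz/ur_para", split=split)


def iter_pairs(rows):
    """Yield (sentence1, sentence2) tuples from any iterable of row dicts."""
    for row in rows:
        yield row["sentence1"], row["sentence2"]


def make_identification_examples(pairs):
    """Build (sentence_a, sentence_b, label) triples.

    Label 1: the original paraphrase pair.
    Label 0: sentence1 of row i paired with sentence2 of row i + 1
             (wrapping around), assumed not to be a paraphrase.
    """
    pairs = list(pairs)
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to build negatives")
    positives = [(s1, s2, 1) for s1, s2 in pairs]
    negatives = [
        (pairs[i][0], pairs[(i + 1) % len(pairs)][1], 0)
        for i in range(len(pairs))
    ]
    return positives + negatives
```

With these helpers, `make_identification_examples(iter_pairs(load_ur_para()))` would yield a balanced set of positive and negative examples. For real training, random or similarity-based negative sampling is likely a better choice than the fixed one-row shift used here.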