This dataset contains Urdu paraphrases. It is provided in Parquet format and includes a training split of 393,000 rows.
You can use this dataset for natural language processing tasks such as text similarity, paraphrase identification, and language generation.
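
As a rough illustration of how the Parquet file might be loaded and prepared for paraphrase identification, here is a minimal sketch. It rests on several assumptions that this description does not confirm: the file path `train.parquet` and the column names `sentence` and `paraphrase` are hypothetical placeholders, and `load_paraphrases` and `make_identification_pairs` are illustrative helpers, not part of the dataset. Reading Parquet with pandas also requires a Parquet engine such as `pyarrow` to be installed.

```python
import random


def load_paraphrases(path):
    """Read a Parquet file into a pandas DataFrame."""
    # Imported lazily so the pairing helper below works without pandas.
    import pandas as pd
    return pd.read_parquet(path)


def make_identification_pairs(rows, seed=0):
    """Turn (sentence, paraphrase) rows into labeled pairs.

    Each row yields one positive pair (label 1). If there is more than
    one row, it also yields one negative pair (label 0) that matches the
    sentence with the paraphrase taken from a different, random row.
    """
    rows = list(rows)
    rng = random.Random(seed)
    pairs = []
    for i, (sentence, paraphrase) in enumerate(rows):
        pairs.append((sentence, paraphrase, 1))
        if len(rows) > 1:
            # Pick an index other than i in constant time.
            j = rng.randrange(len(rows) - 1)
            if j >= i:
                j += 1
            pairs.append((sentence, rows[j][1], 0))
    return pairs


# Toy rows standing in for real data; the strings are placeholders.
toy_rows = [("s1", "p1"), ("s2", "p2"), ("s3", "p3")]
toy_pairs = make_identification_pairs(toy_rows)

# With the real file (path and column names are assumptions):
# df = load_paraphrases("train.parquet")
# pairs = make_identification_pairs(zip(df["sentence"], df["paraphrase"]))
```

Sampling negatives from other rows is only one simple option. It assumes that paraphrases from different rows are not themselves paraphrases of each other, which may not hold if the dataset contains several rows for the same source sentence.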