Dataset:

tartuNLP/liv4ever

liv4ever v1

This is the Livonian 4-lingual parallel corpus. Livonian is a Uralic / Finnic language with only about 20 fluent speakers and no native speakers (as of 2021). The texts and translations in this corpus were collected from all the digital text resources that the authors could find; scanned and printed materials are left for future work.

The corpus includes parallel data for Livonian-Latvian, Livonian-Estonian and Livonian-English; the data was collected in 2021. After retrieval, it was normalized across the different orthographies of Livonian and manually sentence-aligned where needed. It was collected from the following sources, with sentence counts per language pair:

  • Dictionary - example sentences from the Livonian-Latvian-Estonian dictionary;
    • liv-lv: 10'388,
    • liv-et: 10'378
  • Stalte - the alphabet book by Kōrli Stalte, translated into Estonian and Latvian;
    • liv-lv: 842,
    • liv-et: 685
  • Poetry - the poetry collection book "Ma võtan su õnge, tursk / Ma akūb sīnda vizzõ, tūrska", with Estonian translations;
    • liv-et: 770
  • Vääri - the book by Eduard Vääri about Livonian language and culture;
    • liv-et: 592
  • Satversme - translations of the Latvian Constitution into Livonian, Estonian and English;
    • liv-en: 380,
    • liv-lv: 414,
    • liv-et: 413
  • Facebook - social media posts by the Livonian Institute and Livonian Days with original translations;
    • liv-en: 123,
    • liv-lv: 124,
    • liv-et: 7
  • JEFUL - article abstracts from the Journal of Estonian and Finno-Ugric Linguistics, special issues dedicated to Livonian studies, translated into Estonian and English;
    • liv-en: 36,
    • liv-et: 49
  • Trilium - a book collecting Livonian poetry, with its foreword and afterword translated into Estonian and Latvian;
    • liv-lv: 51,
    • liv-et: 53
  • Songs - material crawled from lyricstranslate.com;
    • liv-en: 54,
    • liv-lv: 54,
    • liv-fr: 31
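
The per-source counts above can be added up to get a rough size for each language pair. The following is a minimal sketch that only does that arithmetic: the numbers are copied from the list, while the names `COUNTS`, `totals_per_pair` and `TOTALS` are illustrative and not part of the corpus or its tooling.

```python
from collections import Counter

# Sentence counts per source and language pair, copied from the list above.
# The dictionary layout is an assumption made for this example only.
COUNTS = {
    "Dictionary": {"liv-lv": 10388, "liv-et": 10378},
    "Stalte": {"liv-lv": 842, "liv-et": 685},
    "Poetry": {"liv-et": 770},
    "Vääri": {"liv-et": 592},
    "Satversme": {"liv-en": 380, "liv-lv": 414, "liv-et": 413},
    "Facebook": {"liv-en": 123, "liv-lv": 124, "liv-et": 7},
    "JEFUL": {"liv-en": 36, "liv-et": 49},
    "Trilium": {"liv-lv": 51, "liv-et": 53},
    "Songs": {"liv-en": 54, "liv-lv": 54, "liv-fr": 31},
}


def totals_per_pair(counts):
    """Sum the sentence counts of every source for each language pair."""
    totals = Counter()
    for per_pair in counts.values():
        totals.update(per_pair)  # adds the counts key by key
    return dict(totals)


TOTALS = totals_per_pair(COUNTS)

# Print one line per language pair, in alphabetical order.
for pair, n in sorted(TOTALS.items()):
    print(f"{pair}: {n}")
```

These totals are plain sums of the listed figures; they do not account for any overlap between sources.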