Model:

mesolitica/t5-base-standard-bahasa-cased


t5-base-standard-bahasa-cased

Pretrained T5 base standard language model for Malay.

Pretraining Corpus

The t5-base-standard-bahasa-cased model was pretrained on multiple tasks. Below is the list of tasks we trained on:

  • Language masking task on bahasa news, bahasa Wikipedia, bahasa Academia.edu, bahasa parliament and translated The Pile.
  • News title prediction on bahasa news.
  • Next sentence prediction on bahasa news, bahasa Wikipedia, bahasa Academia.edu, bahasa parliament and translated The Pile.
  • Question answering on translated Natural QA.
  • Text similarity on translated SNLI and translated MNLI.
  • EN-MS translation.
  • MS-EN translation.
  • Abstractive Summarization.
  • Knowledge Graph triples generation.
  • Paraphrase.

The preparation steps can be reproduced from https://github.com/huseinzol05/malaya/tree/master/pretrained-model/t5/prepare

Pretraining details

Supported prefixes

  • soalan: {string}, for question answering (trained on Natural QA).
  • ringkasan: {string}, for abstractive summarization.
  • tajuk: {string}, for abstractive title generation.
  • parafrasa: {string}, for abstractive paraphrasing.
  • terjemah Inggeris ke Melayu: {string}, for EN-MS translation.
  • terjemah Melayu ke Inggeris: {string}, for MS-EN translation.
  • grafik pengetahuan: {string}, for generating EN knowledge graph triples from MS text.
  • ayat1: {string1} ayat2: {string2}, for semantic similarity.
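
The prefixes above are prepended to the input text before it is passed to the model. The following is a minimal sketch of how that might look with the Hugging Face `transformers` library, not an official usage recipe. The task keys in `PREFIXES`, the helper names `build_input`, `build_similarity_input` and `generate`, the single space after the colon, and the `max_length` default are all assumptions made for illustration; only the prefix strings and the model name come from this card.

```python
# Prefix strings copied from the prefix list above.
# The dictionary keys are hypothetical short names chosen for this sketch.
PREFIXES = {
    "qa": "soalan",
    "summarization": "ringkasan",
    "title": "tajuk",
    "paraphrase": "parafrasa",
    "en-ms": "terjemah Inggeris ke Melayu",
    "ms-en": "terjemah Melayu ke Inggeris",
    "knowledge-graph": "grafik pengetahuan",
}


def build_input(task: str, text: str) -> str:
    """Return `text` with the prefix for `task` prepended, e.g. 'ringkasan: ...'."""
    return f"{PREFIXES[task]}: {text}"


def build_similarity_input(sentence1: str, sentence2: str) -> str:
    """Return the two-sentence input used for the semantic similarity task."""
    return f"ayat1: {sentence1} ayat2: {sentence2}"


def generate(prompt: str,
             model_name: str = "mesolitica/t5-base-standard-bahasa-cased",
             max_length: int = 256) -> str:
    """Run the model on a prefixed prompt and return the decoded output.

    Assumes `transformers`, `sentencepiece` and PyTorch are installed.
    The model is downloaded and loaded on every call, which is slow;
    cache the tokenizer and model yourself for repeated use.
    """
    # Imported lazily so the prompt helpers work without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate(build_input("en-ms", "I like to eat rice."))` would request an EN-MS translation, and `generate(build_similarity_input(a, b))` would request a similarity judgement for two sentences `a` and `b`. The exact output format for each task is not documented in this card, so inspect the returned strings before relying on them.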