Model:

mesolitica/finetune-translation-t5-super-super-tiny-standard-bahasa-cased


finetune-translation-t5-super-super-tiny-standard-bahasa-cased

T5 super super tiny, finetuned on English-to-Malay (EN-MS) and Malay-to-English (MS-EN) translation tasks.

Dataset

  • EN-MS dataset, https://huggingface.co/datasets/mesolitica/en-ms
  • MS-EN dataset, https://huggingface.co/datasets/mesolitica/ms-en
  • NLLB eng_Latn-zsm_Latn, https://github.com/huseinzol05/malay-dataset/tree/master/translation/laser

Finetune details

  • Finetuned on a single RTX 3090 Ti.
  • Scripts at https://github.com/huseinzol05/malaya/tree/master/session/translation/hf-t5

Supported prefix

  • terjemah Inggeris ke Melayu: {string}, for EN-MS translation.
  • terjemah Melayu ke Inggeris: {string}, for MS-EN translation.
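
The prefixes above can be applied roughly as follows. This is a minimal sketch, not the authors' own inference script: the helper names `build_prompt` and `translate`, the `max_length` value, and the assumption that the checkpoint loads through the generic `transformers` Auto classes are all illustrative. Only the model ID and the two prefix strings come from this card.

```python
# Model ID and prefixes are taken from the model card; everything else is illustrative.
MODEL_ID = "mesolitica/finetune-translation-t5-super-super-tiny-standard-bahasa-cased"

PREFIXES = {
    "en-ms": "terjemah Inggeris ke Melayu: ",  # English -> Malay
    "ms-en": "terjemah Melayu ke Inggeris: ",  # Malay -> English
}


def build_prompt(text, direction):
    """Prepend the task prefix for the requested translation direction."""
    if direction not in PREFIXES:
        raise ValueError(f"direction must be one of {sorted(PREFIXES)}")
    return PREFIXES[direction] + text


def translate(text, direction, max_length=256):
    """Translate one string (hypothetical helper; reloads the model on every call)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(text, direction), return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `translate("I like to eat rice.", "en-ms")` downloads the weights on first use. For repeated calls, load the tokenizer and model once and reuse them.
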

Evaluation

    eng_Latn-zsm_Latn,

    {'name': 'BLEU',
     'score': 36.29074311583665,
     '_mean': -1.0,
     '_ci': -1.0,
     '_verbose': '71.2/46.0/30.9/21.0 (BP = 0.950 ratio = 0.951 hyp_len = 20958 ref_len = 22027)',
     'bp': 0.9502722319832295,
     'counts': [14919, 9178, 5858, 3780],
     'totals': [20958, 19961, 18964, 17967],
     'sys_len': 20958,
     'ref_len': 22027,
     'precisions': [71.18522759805325,
      45.97966033765844,
      30.890107572242144,
      21.038570712973787],
     'prec_str': '71.2/46.0/30.9/21.0',
     'ratio': 0.9514686521087756}
    chrF2++ = 61.89
    

    zsm_Latn-eng_Latn,

    {'name': 'BLEU',
     'score': 30.216143755278946,
     '_mean': -1.0,
     '_ci': -1.0,
     '_verbose': '64.9/38.1/24.1/15.3 (BP = 0.978 ratio = 0.978 hyp_len = 23057 ref_len = 23570)',
     'bp': 0.9779964796601237,
     'counts': [14963, 8410, 5082, 3063],
     'totals': [23057, 22060, 21063, 20066],
     'sys_len': 23057,
     'ref_len': 23570,
     'precisions': [64.89569328186668,
      38.12330009066183,
      24.127617148554336,
      15.264626731785109],
     'prec_str': '64.9/38.1/24.1/15.3',
     'ratio': 0.9782350445481545}
    chrF2++ = 56.46
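
The fields in the two dictionaries above fit together as follows, assuming they follow the standard corpus-level BLEU definition: the score is the geometric mean of the four n-gram precisions (`counts` / `totals`) multiplied by the brevity penalty `bp`, which is `exp(1 - ref_len / sys_len)` when the system output is shorter than the reference. The function name `bleu_from_stats` is hypothetical. The numbers are copied from the evaluation output above.

```python
import math


def bleu_from_stats(counts, totals, sys_len, ref_len):
    """Corpus BLEU from matched/total n-gram counts (no smoothing; counts must be > 0)."""
    precisions = [c / t for c, t in zip(counts, totals)]
    # Geometric mean of the n-gram precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    # Brevity penalty: penalise hypotheses shorter than the reference.
    bp = 1.0 if sys_len >= ref_len else math.exp(1.0 - ref_len / sys_len)
    return 100.0 * bp * math.exp(log_mean)


# eng_Latn-zsm_Latn statistics from the evaluation above.
en_ms = bleu_from_stats([14919, 9178, 5858, 3780],
                        [20958, 19961, 18964, 17967],
                        sys_len=20958, ref_len=22027)

# zsm_Latn-eng_Latn statistics from the evaluation above.
ms_en = bleu_from_stats([14963, 8410, 5082, 3063],
                        [23057, 22060, 21063, 20066],
                        sys_len=23057, ref_len=23570)

print(f"EN-MS BLEU: {en_ms:.2f}")
print(f"MS-EN BLEU: {ms_en:.2f}")
```

Both values should agree with the reported `score` fields. Since each direction has a `ratio` below 1, the brevity penalty lowers both scores slightly.
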