Model:

mesolitica/finetune-translation-t5-tiny-standard-bahasa-cased


finetune-translation-t5-tiny-standard-bahasa-cased

Finetuned T5 tiny on EN-MS and MS-EN translation tasks.

Dataset

  • EN-MS dataset, https://huggingface.co/datasets/mesolitica/en-ms
  • MS-EN dataset, https://huggingface.co/datasets/mesolitica/ms-en
  • NLLB eng_Latn-zsm_Latn, https://github.com/huseinzol05/malay-dataset/tree/master/translation/laser
Finetune details

  • Finetuned using a single RTX 3090 Ti.
  • Scripts at https://github.com/huseinzol05/malaya/tree/master/session/translation/hf-t5

Supported prefixes

  • terjemah Inggeris ke Melayu: {string}, for EN-MS translation.
  • terjemah Melayu ke Inggeris: {string}, for MS-EN translation.
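A minimal usage sketch, assuming the checkpoint loads with the standard Hugging Face `transformers` classes `AutoTokenizer` and `T5ForConditionalGeneration` and that PyTorch is installed. The helper names `build_input` and `translate` and the `"en-ms"`/`"ms-en"` keys are hypothetical; only the prefix strings come from the list above. Each input string gets the matching prefix prepended before generation.

```python
# Prefix strings copied from the "Supported prefixes" list; dict keys are hypothetical.
PREFIXES = {
    "en-ms": "terjemah Inggeris ke Melayu: ",
    "ms-en": "terjemah Melayu ke Inggeris: ",
}

MODEL_NAME = "mesolitica/finetune-translation-t5-tiny-standard-bahasa-cased"


def build_input(text: str, direction: str) -> str:
    """Prepend the task prefix for the requested translation direction."""
    try:
        prefix = PREFIXES[direction]
    except KeyError:
        raise ValueError(
            f"unknown direction {direction!r}; expected one of {sorted(PREFIXES)}"
        ) from None
    return prefix + text


def translate(texts, direction, max_new_tokens=256):
    """Translate a list of strings (assumes transformers + PyTorch are installed)."""
    # Imported lazily so build_input works without transformers installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)
    # Pad the batch so inputs of different lengths can be generated together.
    inputs = tokenizer(
        [build_input(t, direction) for t in texts],
        return_tensors="pt",
        padding=True,
    )
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `translate(["I like chicken rice."], "en-ms")` returns a one-element list holding the Malay output. Loading the model inside `translate` keeps the sketch short; for repeated calls, load the tokenizer and model once and reuse them.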
Evaluation

    eng_Latn-zsm_Latn (EN-MS):

    {'name': 'BLEU',
     'score': 41.625536185056305,
     '_mean': -1.0,
     '_ci': -1.0,
     '_verbose': '73.4/50.1/35.7/25.7 (BP = 0.971 ratio = 0.972 hyp_len = 21400 ref_len = 22027)',
     'bp': 0.9711259908305946,
     'counts': [15718, 10223, 6926, 4731],
     'totals': [21400, 20403, 19406, 18409],
     'sys_len': 21400,
     'ref_len': 22027,
     'precisions': [73.44859813084112,
      50.10537666029506,
      35.68999278573637,
      25.699386169808246],
     'prec_str': '73.4/50.1/35.7/25.7',
     'ratio': 0.9715349343986925}
    chrF2++ = 65.70
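The fields in the report are consistent with standard corpus BLEU: each n-gram precision is `counts[n] / totals[n]`, the brevity penalty is `exp(1 - ref_len / sys_len)` when `sys_len < ref_len`, and the score is BP times the geometric mean of the four precisions. The field names match sacrebleu's `BLEUScore` object, but that sacrebleu produced these numbers is an assumption. The sketch below recomputes the EN-MS score from the reported statistics; the function name `corpus_bleu_from_stats` is hypothetical.

```python
import math


def corpus_bleu_from_stats(counts, totals, sys_len, ref_len):
    """Recompute corpus BLEU (0-100) from n-gram match statistics.

    counts[n] = matched (n+1)-grams, totals[n] = hypothesis (n+1)-grams.
    Assumes every count is non-zero, so no smoothing is applied.
    """
    precisions = [100.0 * c / t for c, t in zip(counts, totals)]
    # Brevity penalty: only hypotheses shorter than the references are penalised.
    bp = 1.0 if sys_len >= ref_len else math.exp(1.0 - ref_len / sys_len)
    # Geometric mean of the n-gram precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return bp * math.exp(log_mean), bp, precisions


# Statistics copied from the eng_Latn-zsm_Latn report above.
score, bp, precisions = corpus_bleu_from_stats(
    counts=[15718, 10223, 6926, 4731],
    totals=[21400, 20403, 19406, 18409],
    sys_len=21400,
    ref_len=22027,
)
print(f"BLEU = {score:.2f}, BP = {bp:.3f}")  # → BLEU = 41.63, BP = 0.971
```

The same call with the zsm_Latn-eng_Latn statistics reproduces that direction's score of about 37.26.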
    

    zsm_Latn-eng_Latn (MS-EN):

    {'name': 'BLEU',
     'score': 37.26048464066508,
     '_mean': -1.0,
     '_ci': -1.0,
     '_verbose': '68.3/44.1/30.5/21.4 (BP = 0.995 ratio = 0.995 hyp_len = 23457 ref_len = 23570)',
     'bp': 0.9951942593830536,
     'counts': [16020, 9908, 6547, 4376],
     'totals': [23457, 22460, 21463, 20466],
     'sys_len': 23457,
     'ref_len': 23570,
     'precisions': [68.29517841156158,
      44.1139804096171,
      30.503657457019056,
      21.381803967555946],
     'prec_str': '68.3/44.1/30.5/21.4',
     'ratio': 0.9952057700466695}
    chrF2++ = 61.29
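For reference, chrF2++ is the character n-gram F-score with word n-grams added ("++") and β = 2 ("2"), which weights recall twice as heavily as precision. The sketch below assumes sacrebleu's default orders (character n-grams 1 to 6, word n-grams 1 to 2); the card does not state the settings used. Here P_n and R_n are corpus-level n-gram precision and recall.

```latex
\begin{aligned}
\mathrm{chrP} &= \frac{1}{8}\left(\sum_{n=1}^{6} P^{\mathrm{char}}_{n} + \sum_{n=1}^{2} P^{\mathrm{word}}_{n}\right), \qquad
\mathrm{chrR} = \frac{1}{8}\left(\sum_{n=1}^{6} R^{\mathrm{char}}_{n} + \sum_{n=1}^{2} R^{\mathrm{word}}_{n}\right) \\
\mathrm{chrF2{+}{+}} &= 100 \cdot \frac{(1+\beta^{2})\,\mathrm{chrP}\,\mathrm{chrR}}{\beta^{2}\,\mathrm{chrP} + \mathrm{chrR}}, \qquad \beta = 2
\end{aligned}
```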