
Whisper Medium (Thai): Combined V3

This model is a fine-tuned version trained on augmented versions of mozilla-foundation/common_voice_13_0 (th), google/fleurs, and curated datasets. It achieves the following results on the common-voice-11 evaluation set (not updated):

  • Loss: 0.1475
  • WER: 13.03 (without tokenizer)
  • WER: 8.44 (with the Deepcut tokenizer)
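The two WER figures above differ only in how the Thai text is segmented into tokens before scoring, since Thai is written without spaces. As a minimal pure-Python illustration (the segmentations below are hypothetical examples, not actual Deepcut output), WER is the token-level edit distance divided by the reference length, so a finer segmentation can yield a lower score for the same transcript:

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over token lists / reference length."""
    r, h = reference, hypothesis
    # Standard dynamic-programming edit distance over tokens.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

# The same string scored under two hypothetical segmentations:
ref_coarse = ["สวัสดีครับ"]        # one coarse token
hyp_coarse = ["สวัสดีค่ะ"]
ref_fine = ["สวัสดี", "ครับ"]      # finer, Deepcut-style tokens
hyp_fine = ["สวัสดี", "ค่ะ"]

print(wer(ref_coarse, hyp_coarse))  # 1.0  (the single token is wrong)
print(wer(ref_fine, hyp_fine))      # 0.5  (one of two tokens is wrong)
```

This is why the tokenizer used for evaluation must be reported alongside the WER itself.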

Model description

The model can be used with Hugging Face's transformers library as follows:

import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-medium-th-combined"  # specify the model name
lang = "th"  # change to Thai language

device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,
    device=device,
)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
  language=lang,
  task="transcribe"
)
text = pipe("audio.mp3")["text"]  # transcribe the given MP3 audio file

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 5000
  • mixed_precision_training: Native AMP
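With 500 warmup steps and 5000 total steps, the linear scheduler ramps the learning rate from 0 up to 1e-5 and then decays it linearly back to 0. The following plain-Python sketch reimplements that shape for illustration (it mirrors the behavior of transformers' linear schedule with warmup, but is not the library code itself):

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=5000):
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    if step < warmup_steps:
        # Linear warmup: 0 at step 0, base_lr at step `warmup_steps`.
        return base_lr * step / warmup_steps
    # Linear decay: base_lr at end of warmup, 0 at `total_steps`.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_schedule_lr(250))   # halfway through warmup  -> 5e-06
print(linear_schedule_lr(500))   # peak learning rate      -> 1e-05
print(linear_schedule_lr(2750))  # halfway through decay   -> 5e-06
print(linear_schedule_lr(5000))  # end of training         -> 0.0
```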

Training results

| Training Loss | Epoch | Step | Validation Loss | WER   |
|:-------------:|:-----:|:----:|:---------------:|:-----:|
| 0.0679        | 2.09  | 5000 | 0.1475          | 13.03 |

Framework versions

  • Transformers 4.31.0.dev0
  • Pytorch 2.1.0
  • Datasets 2.13.1
  • Tokenizers 0.13.3

Citation

Cite using BibTeX:

@misc{thonburian_whisper_med,
    author       = { Atirut Boribalburephan, Zaw Htet Aung, Knot Pipatsrisawat, Titipat Achakulvisut },
    title        = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
    year         = 2022,
    url          = { https://huggingface.co/biodatlab/whisper-th-medium-combined },
    doi          = { 10.57967/hf/0226 },
    publisher    = { Hugging Face }
}