
Whisper Medium (Thai): Combined V3

This model is a fine-tuned version trained on augmented versions of mozilla-foundation/common_voice_13_0 (th), google/fleurs, and curated datasets. It achieves the following results on the common-voice-11 evaluation set (not updated):

  • Loss: 0.1475
  • WER: 13.03 (without tokenizer)
  • WER: 8.44 (with the Deepcut tokenizer)
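The two WER figures above differ only in how the Thai text is segmented into tokens before scoring, since Thai is written without spaces. As a minimal pure-Python illustration (the segmentations below are hypothetical examples, not actual Deepcut output), WER is the token-level edit distance divided by the reference length, so a finer segmentation can yield a lower score for the same transcript:

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over token lists / reference length."""
    r, h = reference, hypothesis
    # Standard dynamic-programming edit distance over tokens.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

# The same string scored under two hypothetical segmentations:
ref_coarse = ["สวัสดีครับ"]        # one coarse token
hyp_coarse = ["สวัสดีค่ะ"]
ref_fine = ["สวัสดี", "ครับ"]      # finer, Deepcut-style tokens
hyp_fine = ["สวัสดี", "ค่ะ"]

print(wer(ref_coarse, hyp_coarse))  # 1.0  (the single token is wrong)
print(wer(ref_fine, hyp_fine))      # 0.5  (one of two tokens is wrong)
```

This is why the tokenizer used for evaluation must be reported alongside the WER itself.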

Model description

The model can be used with Hugging Face's transformers library as follows:

import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-medium-th-combined"  # specify the model name
lang = "th"  # change to Thai language

device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,
    device=device,
)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(
  language=lang,
  task="transcribe"
)
text = pipe("audio.mp3")["text"]  # transcribe the given MP3 audio file

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 5000
  • mixed_precision_training: Native AMP
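With 500 warmup steps and 5000 total steps, the linear scheduler ramps the learning rate from 0 up to 1e-5 and then decays it linearly back to 0. The following plain-Python sketch reimplements that shape for illustration (it mirrors the behavior of transformers' linear schedule with warmup, but is not the library code itself):

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=5000):
    """Learning rate at a given optimizer step under linear warmup + linear decay."""
    if step < warmup_steps:
        # Linear warmup: 0 at step 0, base_lr at step `warmup_steps`.
        return base_lr * step / warmup_steps
    # Linear decay: base_lr at end of warmup, 0 at `total_steps`.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_schedule_lr(250))   # halfway through warmup  -> 5e-06
print(linear_schedule_lr(500))   # peak learning rate      -> 1e-05
print(linear_schedule_lr(2750))  # halfway through decay   -> 5e-06
print(linear_schedule_lr(5000))  # end of training         -> 0.0
```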

Training results

| Training Loss | Epoch | Step | Validation Loss | WER   |
|:-------------:|:-----:|:----:|:---------------:|:-----:|
| 0.0679        | 2.09  | 5000 | 0.1475          | 13.03 |

Framework versions

  • Transformers 4.31.0.dev0
  • Pytorch 2.1.0
  • Datasets 2.13.1
  • Tokenizers 0.13.3

Citation

Cite using BibTeX:

@misc{thonburian_whisper_med,
    author       = { Atirut Boribalburephan, Zaw Htet Aung, Knot Pipatsrisawat, Titipat Achakulvisut },
    title        = { Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition },
    year         = 2022,
    url          = { https://huggingface.co/biodatlab/whisper-th-medium-combined },
    doi          = { 10.57967/hf/0226 },
    publisher    = { Hugging Face }
}