Whisper Hindi Small

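This model is a version of Whisper fine-tuned on Hindi data drawn from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint.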

Note: The code used to train this model is available in the whisper-finetune repository and can be reused.

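Usage

To evaluate this model on an entire dataset, use the evaluation code provided in the whisper-finetune repository.

The repository also provides scripts for faster inference using whisper-jax.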
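As background for dataset-level evaluation: ASR models are commonly scored by word error rate (WER). The sketch below is a plain-Python illustration of that metric, not the evaluation code from the whisper-finetune repository. The function name `word_error_rate` is hypothetical, and the repository's scripts may normalize text or aggregate scores differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: no text normalization is applied, and the score is
    computed for a single utterance rather than aggregated over a corpus.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))  # 0.25
```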

To run inference on a single audio file with this model, use the following snippet:

>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-hindi-small", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="hi", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])

For faster inference with Whisper models, the whisper-jax library can be used. Follow the installation steps required for whisper-jax, then use the following snippet:

>>> import jax.numpy as jnp
>>> from whisper_jax import FlaxWhisperForConditionalGeneration, FlaxWhisperPipline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> transcribe = FlaxWhisperPipline("vasista22/whisper-hindi-small", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="hi", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])
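Both snippets above build a callable named `transcribe` that returns a dictionary with a `"text"` entry. Assuming that interface, a small helper can apply it to several files in turn. The helper `transcribe_files` is hypothetical and is not part of transformers or whisper-jax; it is a minimal sketch only.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(
    transcribe: Callable[[str], dict], paths: Iterable[str]
) -> Dict[str, str]:
    """Run `transcribe` on each path and collect the returned text.

    `transcribe` is assumed to behave like the pipeline objects created
    above: it takes a path to an audio file and returns a dict with a
    "text" key.
    """
    results: Dict[str, str] = {}
    for path in paths:
        results[str(path)] = transcribe(str(path))["text"]
    return results


# Usage, with `transcribe` created as in either snippet above:
# texts = transcribe_files(transcribe, ["/path/to/a.wav", "/path/to/b.wav"])
# for path, text in texts.items():
#     print(path, text)
```

Files are processed one at a time here for clarity; this does not attempt any batching across files.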

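Training and Evaluation Data

Training data:

Evaluation data:

Training Hyperparameters

The following hyperparameters were used during training: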

learning_rate: 1.75e-05
train_batch_size: 48
eval_batch_size: 32
seed: 22
optimizer: adamw_bnb_8bit
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 20000
training_steps: 19377 (initially set to 129180 steps)
mixed_precision_training: True
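
To see how the learning rate, warmup, and step counts listed above interact, the sketch below evaluates a linear schedule with warmup. It assumes that `linear` means the common "linear warmup, then linear decay to zero" rule and that the schedule length was the initially configured 129180 steps. Neither assumption is confirmed here, and the function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(
    step: int,
    peak_lr: float = 1.75e-05,   # learning_rate from the list above
    warmup_steps: int = 20000,   # lr_scheduler_warmup_steps
    total_steps: int = 129180,   # assumed: the initially configured step count
) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Training stopped at step 19377, which is below warmup_steps (20000),
# so under these assumptions the rate was still ramping up at that point.
print(linear_schedule_lr(19377))  # roughly 1.70e-05, just under the peak
```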

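Acknowledgement

This work was done at Speech Lab, IIT Madras.

The compute resources for this work were funded by the "Bhashini: National Language Translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), India.

Author: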

Vasista Lodagala

Dataset size:

1.8 GB