Model:

speechbrain/asr-wav2vec2-commonvoice-fr

Task:

Automatic Speech Recognition

Libraries:

speechbrain PyTorch

Datasets:

commonvoice

Language:

French

Other:

wav2vec2 CTC Transformer hf-asr-leaderboard Eval Results

License:

apache-2.0

wav2vec 2.0 with CTC/Attention trained on CommonVoice French (no language model)

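This repository provides all the tools needed to perform end-to-end automatic speech recognition with a system pretrained on CommonVoice (French) within SpeechBrain. For a better experience, we encourage you to learn more about SpeechBrain.

The model's performance is as follows: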

Release	Test CER	Test WER	GPUs
24-08-21	3.19	9.96	2xV100 32GB
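For readers unfamiliar with the two metrics in the table, the sketch below shows the standard definitions: the edit distance between reference and hypothesis, divided by the reference length, expressed as a percentage. It is an illustration only. The function names are hypothetical, spaces are counted as characters in the CER, and the text normalization applied by the SpeechBrain recipe before scoring is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumes a non-empty reference)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, counting spaces as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "bonjour tout le monde"
hypothesis = "bonjour tous le monde"
print(wer(reference, hypothesis))  # one wrong word out of four
print(cer(reference, hypothesis))  # one wrong character out of twenty-one
```

A single wrong character can cost a whole word, which is why the WER in the table is higher than the CER.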

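Pipeline description

This ASR system is composed of two different but linked blocks:

Tokenizer (unigram) that transforms words into subword units and is trained on the training transcriptions (train.tsv) of CommonVoice (FR).
Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model (LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR. The resulting acoustic representation is passed to the CTC greedy decoder.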
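The final decoding step can be illustrated with a minimal sketch of CTC greedy decoding: take the best-scoring label for each frame, collapse consecutive repeats, then drop the blank symbol. This is not SpeechBrain's implementation. The function name, the toy vocabulary, and the choice of index 0 for the blank are assumptions made for the example.

```python
def ctc_greedy_decode(frame_scores, blank_index=0):
    """Greedy CTC decoding over per-frame scores (one list of scores per frame)."""
    # Step 1: pick the highest-scoring label in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Steps 2 and 3: collapse consecutive repeats and remove blanks.
    decoded = []
    previous = None
    for label in best:
        if label != previous and label != blank_index:
            decoded.append(label)
        previous = label
    return decoded


# Toy vocabulary (assumption): index 0 is the CTC blank, the rest are subword units.
vocab = ["<blank>", "\u2581bon", "jour"]
frames = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
    [0.10, 0.20, 0.70],
    [0.90, 0.05, 0.05],
]
token_ids = ctc_greedy_decode(frames)
text = "".join(vocab[i] for i in token_ids).replace("\u2581", " ").strip()
print(token_ids, text)
```

A blank between two identical labels keeps both of them, which is how CTC represents genuinely repeated units.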

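The system is trained on recordings sampled at 16 kHz (single channel). When transcribe_file is called, the code automatically normalizes your audio if needed (i.e., resampling and mono channel selection).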
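To make the two normalization steps concrete, here is a rough pure-Python sketch of channel selection followed by resampling to 16 kHz. It is only an illustration under simplifying assumptions: the helper names are hypothetical, the first channel is kept, and the resampler uses plain linear interpolation without anti-aliasing. The normalizer that SpeechBrain actually applies inside transcribe_file is not shown here and will differ.

```python
def to_mono(frames):
    """Keep the first channel of each multi-channel frame (assumed choice)."""
    return [frame[0] for frame in frames]


def resample_linear(samples, orig_rate, target_rate=16000):
    """Resample by linear interpolation (no low-pass filtering, illustration only)."""
    if orig_rate == target_rate:
        return list(samples)
    n_out = int(len(samples) * target_rate / orig_rate)
    out = []
    for i in range(n_out):
        pos = i * orig_rate / target_rate        # position in the input signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append((1 - frac) * samples[left] + frac * samples[right])
    return out


# Four stereo frames recorded at 32 kHz, reduced to mono at 16 kHz.
stereo_32k = [(0.0, 9.0), (1.0, 9.0), (2.0, 9.0), (3.0, 9.0)]
mono_16k = resample_linear(to_mono(stereo_32k), orig_rate=32000)
print(mono_16k)
```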

Install SpeechBrain

First, install transformers and SpeechBrain with the following command:

pip install speechbrain transformers

We encourage you to read our tutorials and learn more about SpeechBrain.

Transcribing your own audio files (in French)

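from speechbrain.pretrained import EncoderASR

# Download the pretrained model (cached in savedir) and load it.
asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-commonvoice-fr", savedir="pretrained_models/asr-wav2vec2-commonvoice-fr")
# Transcribe the bundled French example; replace the path with your own file.
print(asr_model.transcribe_file('speechbrain/asr-wav2vec2-commonvoice-fr/example-fr.wav'))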

Inference on GPU

To perform inference on the GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
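One possible way to wire this option in is sketched below. The two helper functions are hypothetical and exist only for this example; the from_hparams arguments and the speechbrain.pretrained import path are the ones used in this document. SpeechBrain is imported lazily so the argument helper can be inspected without the library installed.

```python
def build_from_hparams_kwargs(device="cpu"):
    """Assemble keyword arguments for EncoderASR.from_hparams (hypothetical helper)."""
    kwargs = {
        "source": "speechbrain/asr-wav2vec2-commonvoice-fr",
        "savedir": "pretrained_models/asr-wav2vec2-commonvoice-fr",
    }
    if device != "cpu":
        # Ask SpeechBrain to place the model on the requested device.
        kwargs["run_opts"] = {"device": device}
    return kwargs


def load_asr_model(device="cpu"):
    """Load the pretrained model on the given device (requires speechbrain)."""
    # Imported here so that build_from_hparams_kwargs works without SpeechBrain.
    from speechbrain.pretrained import EncoderASR
    return EncoderASR.from_hparams(**build_from_hparams_kwargs(device))


print(build_from_hparams_kwargs("cuda"))
```

Calling load_asr_model("cuda") would then return a model ready for transcribe_file, assuming a CUDA-capable GPU and a matching PyTorch build are available.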

Training

The model was trained with SpeechBrain. To train it from scratch, follow these steps:

Clone SpeechBrain:

git clone https://github.com/speechbrain/speechbrain/

Install it:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run the training:

cd recipes/CommonVoice/ASR/CTC/
python train_with_wav2vec.py hparams/train_fr_with_wav2vec.yaml --data_folder=your_data_folder

You can find our training results (models, logs, etc.) here.

Limitations

The SpeechBrain team does not provide any guarantee regarding the performance achieved by this model when used on other datasets.

Citing SpeechBrain

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }

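SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. It achieves competitive or state-of-the-art performance in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain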

Author:

SpeechBrain

Dataset size:

1.19 GB