Language Identification from Speech Recordings with ECAPA Embeddings on CommonLanguage

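This repository provides all the necessary tools to perform language identification from speech recordings with SpeechBrain. The system uses a model pretrained on the CommonLanguage dataset (45 languages). You can download the dataset here. The provided system can recognize the following 45 languages from short speech recordings: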

Arabic, Basque, Breton, Catalan, Chinese_China, Chinese_Hongkong, Chinese_Taiwan, Chuvash, Czech, Dhivehi, Dutch, English, Esperanto, Estonian, French, Frisian, Georgian, German, Greek, Hakha_Chin, Indonesian, Interlingua, Italian, Japanese, Kabyle, Kinyarwanda, Kyrgyz, Latvian, Maltese, Mongolian, Persian, Polish, Portuguese, Romanian, Romansh_Sursilvan, Russian, Sakha, Slovenian, Spanish, Swedish, Tamil, Tatar, Turkish, Ukrainian, Welsh

For a better experience, we encourage you to learn more about SpeechBrain. The performance of the given model on the test set is:

Release Accuracy (%)
30-06-21 85.0

Pipeline description

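The system is composed of an ECAPA model coupled with statistical pooling. A classifier, trained with categorical cross-entropy loss, is applied on top of that.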
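To make the two building blocks named above concrete, here is a minimal sketch in plain Python. It assumes unweighted statistics pooling (per-dimension mean and standard deviation over time) and a softmax-based categorical cross-entropy. The function names are illustrative, and the pretrained model's actual pooling layer may differ (for example, by weighting frames with attention).

```python
import math


def statistics_pooling(frames):
    """Pool T frame vectors (each of size D) into one vector of size 2*D.

    The result is the per-dimension mean followed by the per-dimension
    standard deviation (population form, an assumption of this sketch).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


def softmax(logits):
    """Turn raw class scores into probabilities (shifted for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


# Two frames with two features each: pooling yields means, then stds.
frames = [[1.0, 2.0], [3.0, 6.0]]
pooled = statistics_pooling(frames)
print(pooled)  # [2.0, 4.0, 1.0, 2.0]

# With three equally scored classes the loss is ln(3), whatever the target.
loss = categorical_cross_entropy([0.0, 0.0, 0.0], 1)
print(loss)
```

Pooling is what lets recordings of different lengths map to an embedding of fixed size, which the classifier can then score against the 45 language classes.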

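The system is trained with recordings sampled at 16 kHz (single channel). When you call classify_file, the code automatically normalizes your audio if needed (i.e., resampling + mono channel selection). If you use encode_batch or classify_batch instead, make sure your input tensor matches the expected sampling rate.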
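Before feeding audio to encode_batch or classify_batch, it can help to check whether a file already matches the expected format. The sketch below uses only Python's standard wave module, so it assumes uncompressed PCM WAV input. The helper names (describe_wav, needs_normalization, write_silence) are illustrative and not part of SpeechBrain.

```python
import os
import tempfile
import wave

EXPECTED_SAMPLE_RATE = 16000  # 16 kHz, as stated in the description
EXPECTED_CHANNELS = 1         # single channel (mono)


def describe_wav(path):
    """Return (sample_rate, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()


def needs_normalization(path):
    """True if the file is not already 16 kHz mono."""
    sample_rate, channels = describe_wav(path)
    return sample_rate != EXPECTED_SAMPLE_RATE or channels != EXPECTED_CHANNELS


def write_silence(path, sample_rate, channels, num_frames=160):
    """Hypothetical demo helper: write a short silent 16-bit WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * channels * num_frames)


# Demo on two synthetic files so the sketch runs without real recordings.
demo_dir = tempfile.mkdtemp()
mono_16k = os.path.join(demo_dir, "mono_16k.wav")
stereo_44k = os.path.join(demo_dir, "stereo_44k.wav")
write_silence(mono_16k, 16000, 1)
write_silence(stereo_44k, 44100, 2)

print(needs_normalization(mono_16k))    # False
print(needs_normalization(stereo_44k))  # True
```

A file flagged by needs_normalization should be resampled and downmixed before being turned into a batch tensor; classify_file handles this step on its own.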

Install SpeechBrain

First, install SpeechBrain with the following command:

pip install speechbrain

We also encourage you to read our tutorials and learn more about SpeechBrain.

Perform Language Identification from Speech Recordings

import torchaudio
from speechbrain.pretrained import EncoderClassifier
classifier = EncoderClassifier.from_hparams(source="speechbrain/lang-id-commonlanguage_ecapa", savedir="pretrained_models/lang-id-commonlanguage_ecapa")
# Italian Example
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/lang-id-commonlanguage_ecapa/example-it.wav')
print(text_lab)

# French Example
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/lang-id-commonlanguage_ecapa/example-fr.wav')
print(text_lab)
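When several recordings need to be labeled, the same classify_file call can be wrapped in a small loop. The following is a sketch rather than an official SpeechBrain utility. It assumes that text_lab is a list whose first element is the predicted language name. FakeClassifier is a hypothetical stand-in that only exists so the example runs without downloading the model.

```python
def classify_files(classifier, paths):
    """Return a {path: predicted_language} dict for the given audio files."""
    predictions = {}
    for path in paths:
        # Same four return values as in the example above.
        out_prob, score, index, text_lab = classifier.classify_file(path)
        # Assumption: text_lab holds one label per input recording.
        predictions[path] = text_lab[0]
    return predictions


class FakeClassifier:
    """Hypothetical stand-in for a loaded classifier. Not part of SpeechBrain."""

    def classify_file(self, path):
        label = "Italian" if path.endswith("-it.wav") else "French"
        return None, None, None, [label]


demo = classify_files(FakeClassifier(), ["example-it.wav", "example-fr.wav"])
print(demo)  # {'example-it.wav': 'Italian', 'example-fr.wav': 'French'}
```

In real use, pass the classifier object created with EncoderClassifier.from_hparams in place of FakeClassifier().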

Inference on GPU

To perform inference on the GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
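As an illustration, the sketch below selects the device at load time and falls back to the CPU when CUDA is unavailable. The helper names build_run_opts and load_classifier are hypothetical. The heavy imports sit inside load_classifier so that the device-selection logic can be read and run on its own.

```python
def build_run_opts(cuda_available):
    """Build the run_opts dict passed to from_hparams."""
    return {"device": "cuda" if cuda_available else "cpu"}


def load_classifier():
    """Load the pretrained model on the GPU when one is available.

    Requires torch and speechbrain, and downloads the model on first call.
    """
    import torch
    from speechbrain.pretrained import EncoderClassifier

    return EncoderClassifier.from_hparams(
        source="speechbrain/lang-id-commonlanguage_ecapa",
        savedir="pretrained_models/lang-id-commonlanguage_ecapa",
        run_opts=build_run_opts(torch.cuda.is_available()),
    )


print(build_run_opts(True))   # {'device': 'cuda'}
print(build_run_opts(False))  # {'device': 'cpu'}
```

Calling load_classifier() then returns a classifier that can be used exactly as in the example above.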

Training

The model was trained with SpeechBrain (a02f860e). To train it from scratch, follow these steps:

  • Clone SpeechBrain:
    git clone https://github.com/speechbrain/speechbrain/

  • Install it:
    cd speechbrain
    pip install -r requirements.txt
    pip install -e .

  • Run training:
    cd recipes/CommonLanguage/lang_id
    python train.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
    

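You can find our training results (models, logs, etc.) here.

Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.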

Referencing ECAPA

@inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
  author    = {Brecht Desplanques and
               Jenthe Thienpondt and
               Kris Demuynck},
  editor    = {Helen Meng and
               Bo Xu and
               Thomas Fang Zheng},
  title     = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation
               in {TDNN} Based Speaker Verification},
  booktitle = {Interspeech 2020},
  pages     = {3830--3834},
  publisher = {{ISCA}},
  year      = {2020},
}
    

Citing SpeechBrain

Please cite SpeechBrain if you use it for your research or business.

@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}