Vocoder with HiFIGAN trained on LJSpeech

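This repository provides all the tools needed to use a HiFIGAN vocoder trained on LJSpeech.

The pre-trained model takes a spectrogram as input and produces a waveform as output. Typically, a vocoder is used after a TTS model that converts input text into a spectrogram.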

The sampling frequency is 22050 Hz.

Install SpeechBrain

pip install speechbrain

Please note that we encourage you to read our tutorials and learn more about SpeechBrain.

Using the Vocoder

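import torch
from speechbrain.pretrained import HIFIGAN

# Load the pre-trained vocoder; files are cached in savedir
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir")

# Random batch of 2 mel spectrograms: 80 mel channels, 298 frames
mel_specs = torch.rand(2, 80, 298)
waveforms = hifi_gan.decode_batch(mel_specs)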
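
To get a rough idea of how much audio a spectrogram of a given length produces, the frame count can be converted into samples and seconds. The sketch below is only an estimate: the hop length of 256 samples per frame is an assumption that this document does not state, and the helper names are hypothetical. The sampling rate of 22050 Hz is the one given above.

```python
SAMPLE_RATE = 22050  # Hz, as stated in this document
HOP_LENGTH = 256     # ASSUMPTION: samples per mel frame, not stated in this document


def expected_num_samples(num_frames: int, hop_length: int = HOP_LENGTH) -> int:
    """Approximate number of waveform samples produced from num_frames mel frames."""
    return num_frames * hop_length


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of num_samples samples at the given sampling rate."""
    return num_samples / sample_rate


# The random batch above uses 298 frames per spectrogram
samples = expected_num_samples(298)
print(samples, round(duration_seconds(samples), 2))
```

If the assumed hop length holds, comparing this estimate with the last dimension of the tensor returned by decode_batch is a quick sanity check on the input shape.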

Using the Vocoder with TTS

import torchaudio
from speechbrain.pretrained import Tacotron2
from speechbrain.pretrained import HIFIGAN

# Initialize TTS (Tacotron2) and Vocoder (HiFIGAN)
tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts")
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")

# Running the TTS
mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")

# Running Vocoder (spectrogram-to-waveform)
waveforms = hifi_gan.decode_batch(mel_output)

# Save the waveform
torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 22050)

Inference on GPU

To perform inference on the GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
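
As a minimal sketch of the option described above, the run_opts dictionary can be built from a flag so that the same script falls back to the CPU when no GPU is present. The helper names make_run_opts and load_vocoder are hypothetical, and the savedir value is just an example. SpeechBrain is imported lazily inside load_vocoder so that the first helper can be used on its own.

```python
def make_run_opts(cuda_available: bool) -> dict:
    """Build the run_opts dict passed to from_hparams."""
    return {"device": "cuda" if cuda_available else "cpu"}


def load_vocoder(cuda_available: bool):
    """Load the pre-trained HiFIGAN vocoder on the selected device."""
    # Imported here so this module can be loaded without SpeechBrain installed
    from speechbrain.pretrained import HIFIGAN

    return HIFIGAN.from_hparams(
        source="speechbrain/tts-hifigan-ljspeech",
        savedir="tmpdir_vocoder",
        run_opts=make_run_opts(cuda_available),
    )


# Example call (downloads the model on first use):
#   import torch
#   hifi_gan = load_vocoder(torch.cuda.is_available())
```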

Training

The model was trained with SpeechBrain. To train it from scratch, follow these steps:

Clone SpeechBrain:

git clone https://github.com/speechbrain/speechbrain/

Install it:

cd speechbrain
pip install -r requirements.txt
pip install -e .

Run training:

cd recipes/LJSpeech/TTS/vocoder/hifi_gan/
python train.py hparams/train.yaml --data_folder /path/to/LJspeech
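
Before launching the training command above, it can help to check that the folder passed to --data_folder looks like an LJSpeech download. The sketch below assumes the standard LJSpeech layout (a metadata.csv file and a wavs/ directory at the top level). Whether the recipe requires exactly this layout is an assumption, and missing_ljspeech_items is a hypothetical helper, not part of SpeechBrain.

```python
import tempfile
from pathlib import Path


def missing_ljspeech_items(data_folder: str) -> list:
    """Return the expected top-level LJSpeech entries that are absent."""
    root = Path(data_folder)
    missing = []
    # ASSUMPTION: standard LJSpeech layout with metadata.csv and wavs/
    if not (root / "metadata.csv").is_file():
        missing.append("metadata.csv")
    if not (root / "wavs").is_dir():
        missing.append("wavs/")
    return missing


# Demonstration on a temporary folder standing in for /path/to/LJspeech
with tempfile.TemporaryDirectory() as tmp:
    print(missing_ljspeech_items(tmp))   # empty folder: both entries missing
    (Path(tmp) / "metadata.csv").write_text("")
    (Path(tmp) / "wavs").mkdir()
    print(missing_ljspeech_items(tmp))   # nothing missing now
```

An empty list means both entries were found. It does not validate the audio files or the metadata contents.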

You can find our training results (models, logs, etc.) here.

Author:

SpeechBrain

Dataset size:

53.25 MB