ESPnet VITS Text-to-Speech (TTS) Model for ONNX

espnet/kan-bayashi_ljspeech_vits exported to ONNX. This model is an ONNX export created with the espnet_onnx library.

Usage with txtai

txtai has a built-in Text-to-Speech (TTS) pipeline that makes this model easy to use.

import soundfile as sf

from txtai.pipeline import TextToSpeech

# Build pipeline
tts = TextToSpeech("NeuML/ljspeech-vits-onnx")

# Generate speech
speech = tts("Say something here")

# Write to file
sf.write("out.wav", speech, 22050)

Usage with ONNX

This model can also be run directly with ONNX, provided the input text is tokenized first. Tokenization can be done with ttstokenizer.

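Note that the txtai pipeline has additional functionality, such as batching large inputs, that would need to be replicated when using this method.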

import onnxruntime
import soundfile as sf
import yaml

from ttstokenizer import TTSTokenizer

# This example assumes the files have been downloaded locally
with open("ljspeech-vits-onnx/config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

# Create model
model = onnxruntime.InferenceSession(
    "ljspeech-vits-onnx/model.onnx",
    providers=["CPUExecutionProvider"]
)

# Create tokenizer
tokenizer = TTSTokenizer(config["token"]["list"])

# Tokenize inputs
inputs = tokenizer("Say something here")

# Generate speech
outputs = model.run(None, {"text": inputs})

# Write to file
sf.write("out.wav", outputs[0], 22050)
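As a rough sketch of replicating one piece of that extra functionality, the helpers below split long text into sentence-based chunks, synthesize each chunk separately and join the waveforms. This is not txtai's implementation: `split_text` and `synthesize_long` are hypothetical helpers, the 200-character chunk limit is an arbitrary assumption, and the regex sentence splitter is deliberately naive.

```python
import re
from typing import Callable, List

import numpy as np


def split_text(text: str, max_chars: int = 200) -> List[str]:
    """Hypothetical helper: group sentences into chunks of roughly max_chars characters."""
    # Naive sentence split: break after . ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when appending this sentence would exceed the limit
        # (a single sentence longer than max_chars still becomes its own chunk)
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence

    if current:
        chunks.append(current)

    return chunks


def synthesize_long(
    text: str, synthesize: Callable[[str], np.ndarray], max_chars: int = 200
) -> np.ndarray:
    """Hypothetical helper: synthesize each chunk and concatenate the 1-D waveforms."""
    chunks = split_text(text, max_chars)
    if not chunks:
        # Nothing to synthesize: return an empty waveform
        return np.zeros(0, dtype=np.float32)
    return np.concatenate([synthesize(chunk) for chunk in chunks])
```

With the session and tokenizer from the example above, the synthesizer could be passed as `lambda chunk: model.run(None, {"text": tokenizer(chunk)})[0]`, assuming `outputs[0]` is a 1-D waveform, and the joined result written with `sf.write("out.wav", audio, 22050)`. Chunks are processed one at a time here for simplicity; true batching would depend on whether the exported model accepts batched inputs, which is not confirmed here.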

How to export

More information on how to export ESPnet models to ONNX can be found here.

Author:

NeuML

Size:

131.55 MB