Model:
speechbrain/soundchoice-g2p
This repository provides all the necessary tools to perform English grapheme-to-phoneme conversion with a pretrained SoundChoice G2P model using SpeechBrain. The model was trained on the LibriG2P training data, derived from LibriSpeech Alignments and Google Wikipedia.
First, install SpeechBrain (and transformers) with the following commands:
pip install speechbrain
pip install transformers
Please note that we encourage you to read our tutorials and learn more about SpeechBrain.
To perform grapheme-to-phoneme conversion with the high-level wrapper, follow the example below:
from speechbrain.pretrained import GraphemeToPhoneme
g2p = GraphemeToPhoneme.from_hparams("speechbrain/soundchoice-g2p")
text = "To be or not to be, that is the question"
phonemes = g2p(text)
The expected output is:
>>> phonemes
['T', 'UW', ' ', 'B', 'IY', ' ', 'AO', 'R', ' ', 'N', 'AA', 'T', ' ', 'T', 'UW', ' ', 'B', 'IY', ' ', 'DH', 'AE', 'T', ' ', 'IH', 'Z', ' ', 'DH', 'AH', ' ', 'K', 'W', 'EH', 'S', 'CH', 'AH', 'N']
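As a quick post-processing sketch (not part of the SpeechBrain API), the flat list above can be collapsed into a readable per-word pronunciation; the ' ' entries in the output mark word boundaries:

```python
# The phoneme list returned above; ' ' entries separate words.
phonemes = ['T', 'UW', ' ', 'B', 'IY', ' ', 'AO', 'R', ' ', 'N', 'AA', 'T', ' ',
            'T', 'UW', ' ', 'B', 'IY', ' ', 'DH', 'AE', 'T', ' ', 'IH', 'Z', ' ',
            'DH', 'AH', ' ', 'K', 'W', 'EH', 'S', 'CH', 'AH', 'N']

# Group phonemes into words at the space tokens, then join each word with '-'.
words, current = [], []
for token in phonemes:
    if token == ' ':
        words.append('-'.join(current))
        current = []
    else:
        current.append(token)
words.append('-'.join(current))

pronunciation = ' '.join(words)
print(pronunciation)
# → T-UW B-IY AO-R N-AA-T T-UW B-IY DH-AE-T IH-Z DH-AH K-W-EH-S-CH-AH-N
```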
To perform G2P conversion on a batch of text, pass an array of strings to the interface:
items = [
    "All's Well That Ends Well",
    "The Merchant of Venice",
    "The Two Gentlemen of Verona",
    "The Comedy of Errors",
]
transcriptions = g2p(items)
The expected output is:
>>> transcriptions
[['AO', 'L', 'Z', ' ', 'W', 'EH', 'L', ' ', 'DH', 'AE', 'T', ' ', 'EH', 'N', 'D', 'Z', ' ', 'W', 'EH', 'L'],
 ['DH', 'AH', ' ', 'M', 'ER', 'CH', 'AH', 'N', 'T', ' ', 'AH', 'V', ' ', 'V', 'EH', 'N', 'AH', 'S'],
 ['DH', 'AH', ' ', 'T', 'UW', ' ', 'JH', 'EH', 'N', 'T', 'AH', 'L', 'M', 'IH', 'N', ' ', 'AH', 'V', ' ', 'V', 'ER', 'OW', 'N', 'AH'],
 ['DH', 'AH', ' ', 'K', 'AA', 'M', 'AH', 'D', 'IY', ' ', 'AH', 'V', ' ', 'EH', 'R', 'ER', 'Z']]
To perform inference on the GPU, add run_opts={"device": "cuda"} when calling the from_hparams method.
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
The model was trained with SpeechBrain (aa018540). To train it from scratch, follow these steps:
git clone https://github.com/speechbrain/speechbrain/
cd speechbrain
pip install -r requirements.txt
pip install -e .
cd recipes/LibriSpeech/G2P
python train.py hparams/hparams_g2p_rnn.yaml --data_folder=your_data_folder
Adjust the hyperparameters as needed by passing additional command-line arguments.
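As a sketch of what such overrides look like: SpeechBrain recipes accept --key=value arguments that override keys in the hparams YAML file. The flag names below (--number_of_epochs, --batch_size) are illustrative assumptions; the authoritative list is the set of keys defined in hparams/hparams_g2p_rnn.yaml.

```shell
# Hypothetical overrides; replace with keys that actually appear in the YAML file.
python train.py hparams/hparams_g2p_rnn.yaml \
    --data_folder=your_data_folder \
    --number_of_epochs=50 \
    --batch_size=32
```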
Please cite SpeechBrain if you use it for your research or business:
@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}
Please also cite the SoundChoice G2P paper on which this pretrained model is based:
@misc{ploujnikov2022soundchoice,
  title={SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation},
  author={Artem Ploujnikov and Mirco Ravanelli},
  year={2022},
  eprint={2207.13703},
  archivePrefix={arXiv},
  primaryClass={cs.SD}
}