Model:

speechbrain/soundchoice-g2p

Language: English

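SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation

This repository provides all the tools needed to perform English grapheme-to-phoneme (G2P) conversion with a pretrained SoundChoice G2P model using SpeechBrain. The model was trained on the LibriG2P training data, which is derived from LibriSpeech Alignments and Google Wikipedia.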

Install SpeechBrain

First, install SpeechBrain and its dependency with the following commands (local installation):

pip install speechbrain
pip install transformers
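
As a quick sanity check after installation, the sketch below reports which versions of the two packages are visible to the current Python environment. It relies only on the standard library; the helper name `installed_version` is an illustrative assumption and not part of SpeechBrain.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two packages installed above.
for name in ("speechbrain", "transformers"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```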

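Note that we encourage you to read our tutorials and learn more about SpeechBrain.

Perform G2P Conversion

Follow the example below to perform grapheme-to-phoneme conversion with the high-level wrapper.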

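# High-level inference wrapper for G2P models.
from speechbrain.pretrained import GraphemeToPhoneme
# Fetch the pretrained model and build the G2P interface.
g2p = GraphemeToPhoneme.from_hparams("speechbrain/soundchoice-g2p")
text = "To be or not to be, that is the question"
# A single string yields one flat list of phoneme symbols.
phonemes = g2p(text)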

The expected output is:

>>> phonemes
['T', 'UW', ' ', 'B', 'IY', ' ', 'AO', 'R', ' ', 'N', 'AA', 'T', ' ', 'T', 'UW', ' ', 'B', 'IY', ' ', 'DH', 'AE', 'T', ' ', 'IH', 'Z', ' ', 'DH', 'AH', ' ', 'K', 'W', 'EH', 'S', 'CH', 'AH', 'N']
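
The output above is a flat list in which a `' '` token marks each word boundary. The sketch below shows one way to regroup such a list into per-word phoneme lists. The helper `group_phonemes_by_word` is a hypothetical name introduced here for illustration, not a SpeechBrain function, and it assumes the separator is always the single-space token shown above.

```python
from typing import List


def group_phonemes_by_word(phonemes: List[str], sep: str = " ") -> List[List[str]]:
    """Split a flat phoneme sequence into one list per word at each separator token."""
    words: List[List[str]] = [[]]
    for token in phonemes:
        if token == sep:
            # A separator closes the current word and starts a new one.
            words.append([])
        else:
            words[-1].append(token)
    # Drop empty groups caused by leading, trailing, or repeated separators.
    return [word for word in words if word]


# First four words of the expected output shown above.
phonemes = ['T', 'UW', ' ', 'B', 'IY', ' ', 'AO', 'R', ' ', 'N', 'AA', 'T']
print(group_phonemes_by_word(phonemes))
# [['T', 'UW'], ['B', 'IY'], ['AO', 'R'], ['N', 'AA', 'T']]
```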

To perform G2P conversion on a batch of texts, pass a list of strings to the interface:

items = [
    "All's Well That Ends Well",
    "The Merchant of Venice",
    "The Two Gentlemen of Verona",
    "The Comedy of Errors"
]
transcriptions = g2p(items)

The expected output is:

>>> transcriptions
[['AO', 'L', 'Z', ' ', 'W', 'EH', 'L', ' ', 'DH', 'AE', 'T', ' ', 'EH', 'N', 'D', 'Z', ' ', 'W', 'EH', 'L'], ['DH', 'AH', ' ', 'M', 'ER', 'CH', 'AH', 'N', 'T', ' ', 'AH', 'V', ' ', 'V', 'EH', 'N', 'AH', 'S'], ['DH', 'AH', ' ', 'T', 'UW', ' ', 'JH', 'EH', 'N', 'T', 'AH', 'L', 'M', 'IH', 'N', ' ', 'AH', 'V', ' ', 'V', 'ER', 'OW', 'N', 'AH'], ['DH', 'AH', ' ', 'K', 'AA', 'M', 'AH', 'D', 'IY', ' ', 'AH', 'V', ' ', 'EH', 'R', 'ER', 'Z']]
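
For batch results it is often convenient to keep each input next to its transcription. The sketch below pairs the two and renders each phoneme list as a readable string. It assumes, as the expected output above suggests, that transcriptions come back in the same order as the inputs; `pair_with_transcriptions` is a hypothetical helper, not part of SpeechBrain.

```python
from itertools import groupby
from typing import Dict, List


def pair_with_transcriptions(
    items: List[str], transcriptions: List[List[str]]
) -> Dict[str, str]:
    """Map each input string to a readable rendering of its phonemes."""
    if len(items) != len(transcriptions):
        raise ValueError("expected one transcription per input string")
    paired: Dict[str, str] = {}
    for text, phonemes in zip(items, transcriptions):
        # Group consecutive non-space tokens into words and join them with "-".
        words = [
            "-".join(group)
            for is_space, group in groupby(phonemes, key=lambda p: p == " ")
            if not is_space
        ]
        paired[text] = " ".join(words)
    return paired


# One item and its transcription, copied from the expected output above.
items = ["The Comedy of Errors"]
transcriptions = [['DH', 'AH', ' ', 'K', 'AA', 'M', 'AH', 'D', 'IY', ' ',
                   'AH', 'V', ' ', 'EH', 'R', 'ER', 'Z']]
print(pair_with_transcriptions(items, transcriptions))
# {'The Comedy of Errors': 'DH-AH K-AA-M-AH-D-IY AH-V EH-R-ER-Z'}
```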

Inference on GPU

To perform inference on the GPU, add run_opts={"device":"cuda"} when calling the from_hparams method.
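
The option above can be chosen at runtime so the same script also works on machines without a GPU. The following is a minimal sketch: `build_run_opts` and `load_g2p` are hypothetical helper names, and the fallback to `{"device": "cpu"}` is an assumption added here for illustration rather than something this model card prescribes.

```python
def build_run_opts(prefer_gpu: bool = True) -> dict:
    """Return run_opts selecting CUDA when it is available, otherwise the CPU."""
    if prefer_gpu:
        try:
            import torch  # imported lazily so the helper works without PyTorch

            if torch.cuda.is_available():
                return {"device": "cuda"}
        except ImportError:
            pass
    return {"device": "cpu"}


def load_g2p(prefer_gpu: bool = True):
    """Load the pretrained G2P model on the selected device."""
    from speechbrain.pretrained import GraphemeToPhoneme

    return GraphemeToPhoneme.from_hparams(
        "speechbrain/soundchoice-g2p",
        run_opts=build_run_opts(prefer_gpu),
    )
```

Calling `load_g2p()` returns an interface that can be used exactly like `g2p` in the examples above.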

Limitations

The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

Training

The model was trained with SpeechBrain (aa018540). To train it from scratch, follow these steps:

  • Clone SpeechBrain:
    git clone https://github.com/speechbrain/speechbrain/

  • Install it:
    cd speechbrain
    pip install -r requirements.txt
    pip install -e .

  • Run training:
    cd recipes/LibriSpeech/G2P
    python train.py hparams/hparams_g2p_rnn.yaml --data_folder=your_data_folder

Adjust the hyperparameters as needed by passing additional arguments.

Referencing SpeechBrain

If you use SpeechBrain for your research or business, please cite it.
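
SpeechBrain recipes accept `--key=value` arguments that override entries in the hyperparameter YAML file, as with `--data_folder` in the command above. The sketch below assembles such a command from a dictionary of overrides. `build_train_command` is a hypothetical helper, and the `seed` key is only an assumed example; check `hparams/hparams_g2p_rnn.yaml` for the keys it actually defines.

```python
import shlex
from typing import Dict, List


def build_train_command(hparams_file: str, overrides: Dict[str, object]) -> List[str]:
    """Build the training command with one --key=value argument per override."""
    command = ["python", "train.py", hparams_file]
    for key, value in overrides.items():
        command.append(f"--{key}={value}")
    return command


command = build_train_command(
    "hparams/hparams_g2p_rnn.yaml",
    {
        "data_folder": "your_data_folder",
        "seed": 1234,  # assumed key, shown only to illustrate the override syntax
    },
)
# Print a shell-ready command line; run it from recipes/LibriSpeech/G2P.
print(shlex.join(command))
```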

    @misc{speechbrain,
      title={{SpeechBrain}: A General-Purpose Speech Toolkit},
      author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
      year={2021},
      eprint={2106.04624},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      note={arXiv:2106.04624}
    }
    

Please also cite the SoundChoice G2P paper, on which this pretrained model is based:

    @misc{ploujnikov2022soundchoice,
      title={SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation},
      author={Artem Ploujnikov and Mirco Ravanelli},
      year={2022},
      eprint={2207.13703},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
    }
    

About SpeechBrain