Dataset:
gigant/romanian_speech_synthesis_0_8_1
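The Romanian speech synthesis (RSS) corpus was recorded in a hemianechoic chamber (anechoic walls and ceiling; floor partially anechoic) at the University of Edinburgh. We used three high-quality professional microphones: a Neumann U89i (large-diaphragm condenser), a Sennheiser MKH 800 (small-diaphragm condenser with a very wide bandwidth) and a DPA 4035 (head-mounted condenser). The current release includes only speech recorded with the Sennheiser MKH 800, but we may release speech recorded with the other microphones in the future. All recordings were made at a 96 kHz sampling frequency with 24 bits per sample and then downsampled to 48 kHz. We used ProTools HD hardware and software for recording, downsampling and bit rate conversion. Recording took place over eight sessions within one month, with about 500 sentences recorded per session. At the start of each session, the speaker listened to a sample of earlier recordings in order to match the voice quality and intonation.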
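The recording settings above can be put into numbers with a short calculation. This is a sketch under stated assumptions: it treats each recording as a single (mono) microphone channel of uncompressed PCM, and because the bit depth after the bit rate conversion is not stated, the 48 kHz figure keeps 24 bits purely so that only the effect of halving the sampling rate is visible. The helper name `pcm_bytes_per_second` is hypothetical.

```python
# Illustrative arithmetic only, based on the settings described above.
# Assumptions: one channel, uncompressed PCM; the 48 kHz case keeps 24 bits
# only for comparison (the released bit depth is not stated in the text).


def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the raw PCM data rate in bytes per second (hypothetical helper)."""
    if bits_per_sample % 8 != 0:
        raise ValueError("bits_per_sample must be a whole number of bytes")
    return sample_rate_hz * (bits_per_sample // 8) * channels


recorded = pcm_bytes_per_second(96_000, 24)   # 288000 bytes/s at 96 kHz
downsampled = pcm_bytes_per_second(48_000, 24)  # 144000 bytes/s at 48 kHz
decimation_factor = 96_000 // 48_000            # 96 kHz -> 48 kHz is a 2:1 reduction
approx_sentences = 8 * 500                      # eight sessions of about 500 sentences

if __name__ == "__main__":
    print(recorded, downsampled, decimation_factor, approx_sentences)
```

Eight sessions of roughly 500 sentences give about 4000 sentences, which lines up with the 3500 training and 500 test sentences mentioned in the paper abstract cited below.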
Language: Romanian
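A typical data point comprises two fields: audio, which holds the audio file path and its decoded data, and sentence.
audio: A dictionary containing the path to the downloaded audio file, the decoded audio array and the sampling rate. Note that when accessing the audio column, e.g. dataset[0]["audio"], the audio file is automatically decoded and resampled to dataset.features["audio"].sampling_rate. Decoding and resampling a large number of audio files can take a significant amount of time, so it is important to query the sample index before the "audio" column: dataset[0]["audio"] should always be preferred over dataset["audio"][0].
sentence: The sentence the speaker was prompted to say.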
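The access-order advice above can be illustrated with a toy model. The class `ToyAudioDataset` below is hypothetical and uses only the standard library; it is not how the Hugging Face `datasets` library is implemented. It simply counts how many files get "decoded", so the cost of row-first versus column-first access becomes visible. Each row it returns follows the field layout described above (an audio dict with path, array and sampling_rate, plus sentence); the 48 kHz default comes from the recording description, and the one-sample array is a placeholder.

```python
# Hypothetical toy model, NOT the Hugging Face `datasets` implementation:
# it only counts how many files a lazily decoded "audio" column would decode.
from typing import Dict, List, Union


class ToyAudioDataset:
    """Table whose "audio" column is "decoded" every time it is read."""

    def __init__(self, paths: List[str], sentences: List[str], sampling_rate: int = 48_000):
        if len(paths) != len(sentences):
            raise ValueError("paths and sentences must have the same length")
        self._paths = list(paths)
        self._sentences = list(sentences)
        self._sampling_rate = sampling_rate
        self.decode_calls = 0  # number of audio files "decoded" so far

    def _decode(self, path: str) -> Dict[str, object]:
        # Stand-in for reading, decoding and resampling one audio file.
        self.decode_calls += 1
        return {"path": path, "array": [0.0], "sampling_rate": self._sampling_rate}

    def __len__(self) -> int:
        return len(self._paths)

    def __getitem__(self, key: Union[int, str]):
        if isinstance(key, int):
            # Row first: only this row's audio is decoded.
            return {"audio": self._decode(self._paths[key]), "sentence": self._sentences[key]}
        if key == "audio":
            # Column first: every file in the column is decoded.
            return [self._decode(p) for p in self._paths]
        if key == "sentence":
            return list(self._sentences)
        raise KeyError(key)


if __name__ == "__main__":
    ds = ToyAudioDataset([f"clip_{i}.wav" for i in range(1000)], ["..."] * 1000)
    ds[0]["audio"]
    print(ds.decode_calls)  # 1
    ds["audio"][0]
    print(ds.decode_calls)  # 1001
```

With the real dataset, the recommended pattern is to load a split with `datasets.load_dataset("gigant/romanian_speech_synthesis_0_8_1", split="train")` (assuming the training split is named "train") and then read `ds[0]["audio"]` rather than `ds["audio"][0]`.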
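The speech material has been divided into a training set and a test set. The training set contains 3180 audio clips with their sentences; the test set contains 536 audio clips with their sentences.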
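After loading, it can be worth confirming that the split sizes match the figures above. The sketch below assumes the splits are named "train" and "test"; the helper `check_split_sizes` and the constant `EXPECTED_SPLIT_SIZES` are hypothetical names. It takes a plain mapping from split name to example count, which for a loaded `DatasetDict` could be built as `{name: len(split) for name, split in dataset_dict.items()}`.

```python
# Expected split sizes, taken from the description above.
# The split names "train" and "test" are an assumption.
EXPECTED_SPLIT_SIZES = {"train": 3180, "test": 536}


def check_split_sizes(actual: dict) -> list:
    """Return human-readable mismatches between actual and expected split sizes.

    `actual` maps split name -> number of examples (hypothetical helper).
    """
    problems = []
    for name, expected in EXPECTED_SPLIT_SIZES.items():
        got = actual.get(name)  # None if the split is missing
        if got != expected:
            problems.append(f"{name}: expected {expected}, got {got}")
    return problems


total = sum(EXPECTED_SPLIT_SIZES.values())          # 3716 clips in total
test_share = EXPECTED_SPLIT_SIZES["test"] / total  # roughly 14.4% held out for testing

if __name__ == "__main__":
    print(total, round(test_share, 3))
    print(check_split_sizes({"train": 3180, "test": 536}))  # []
```

Returning a list of messages rather than raising keeps the check usable both in a notebook and inside a test suite.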
@article{Stan2011442,
  author = {Adriana Stan and Junichi Yamagishi and Simon King and Matthew Aylett},
  title = {The {R}omanian speech synthesis ({RSS}) corpus: Building a high quality {HMM}-based speech synthesis system using a high sampling rate},
  journal = {Speech Communication},
  volume = {53},
  number = {3},
  pages = {442--450},
  note = {},
  abstract = {This paper first introduces a newly-recorded high quality Romanian speech corpus designed for speech synthesis, called ''RSS'', along with Romanian front-end text processing modules and HMM-based synthetic voices built from the corpus. All of these are now freely available for academic use in order to promote Romanian speech technology research. The RSS corpus comprises 3500 training sentences and 500 test sentences uttered by a female speaker and was recorded using multiple microphones at 96 kHz sampling frequency in a hemianechoic chamber. The details of the new Romanian text processor we have developed are also given. Using the database, we then revisit some basic configuration choices of speech synthesis, such as waveform sampling frequency and auditory frequency warping scale, with the aim of improving speaker similarity, which is an acknowledged weakness of current HMM-based speech synthesisers. As we demonstrate using perceptual tests, these configuration choices can make substantial differences to the quality of the synthetic speech. Contrary to common practice in automatic speech recognition, higher waveform sampling frequencies can offer enhanced feature extraction and improved speaker similarity for HMM-based speech synthesis.},
  doi = {10.1016/j.specom.2010.12.002},
  issn = {0167-6393},
  keywords = {Speech synthesis, HTS, Romanian, HMMs, Sampling frequency, Auditory scale},
  url = {http://www.sciencedirect.com/science/article/pii/S0167639310002074},
  year = 2011
}
@gigant added this dataset.