Dataset:
cdminix/libritts-aligned
An identical dataset for the newer LibriTTS-R corpus is available at cdminix/libritts-r-aligned.
This dataset downloads LibriTTS and preprocesses it on your machine, creating alignments with the Montreal Forced Aligner. You need to run pip install alignments phones before using it. The first run can take an hour or two; subsequent runs are much faster.
An example item:

```python
{
    'id': '100_122655_000073_000002.wav',
    'speaker': '100',
    'text': 'the day after, diana and mary quitted it for distant b.',
    'start': 0.0,
    'end': 3.6500000953674316,
    'phones': ['[SILENCE]', 'ð', 'ʌ', '[SILENCE]', 'd', 'eɪ', '[SILENCE]', 'æ', 'f', 't', 'ɜ˞', '[COMMA]', 'd', 'aɪ', 'æ', 'n', 'ʌ', '[SILENCE]', 'æ', 'n', 'd', '[SILENCE]', 'm', 'ɛ', 'ɹ', 'i', '[SILENCE]', 'k', 'w', 'ɪ', 't', 'ɪ', 'd', '[SILENCE]', 'ɪ', 't', '[SILENCE]', 'f', 'ɜ˞', '[SILENCE]', 'd', 'ɪ', 's', 't', 'ʌ', 'n', 't', '[SILENCE]', 'b', 'i', '[FULL STOP]'],
    'phone_durations': [5, 2, 4, 0, 5, 13, 0, 16, 7, 5, 20, 2, 6, 9, 15, 4, 2, 0, 11, 3, 5, 0, 3, 8, 9, 8, 0, 13, 3, 5, 3, 6, 4, 0, 8, 5, 0, 9, 5, 0, 7, 5, 6, 7, 4, 5, 10, 0, 3, 35, 9],
    'audio': '/dev/shm/metts/train-clean-360-alignments/100/100_122655_000073_000002.wav'
}
```
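The phones are IPA symbols, and each entry in phone_durations is the length of the corresponding phone in frames, assuming a hop length of 256 samples, a sample rate of 22050 Hz and a window length of 1024 samples. Pause and punctuation tokens such as [SILENCE], [COMMA] and [FULL STOP] are included in phones, and a phone can have a duration of 0. These settings can be changed with the hop_length, sample_rate and window_length arguments to LibriTTSAlign.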
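Because durations are counted in frames, converting them to seconds only needs the hop length and sample rate: each frame advances hop_length samples, so seconds = frames × hop_length / sample_rate. The sketch below turns per-phone durations into start and end times in plain Python. It assumes the default hop length (256) and sample rate (22050 Hz) stated above, and the helper names frames_to_seconds and phone_timestamps are hypothetical, not part of this dataset or of any library.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

# Assumed defaults from this card; pass other values if the dataset was built differently
HOP_LENGTH = 256     # samples between consecutive frames
SAMPLE_RATE = 22050  # Hz


def frames_to_seconds(frames: int, hop_length: int = HOP_LENGTH, sample_rate: int = SAMPLE_RATE) -> float:
    """Each frame advances hop_length samples, so a span lasts frames * hop_length / sample_rate seconds."""
    return frames * hop_length / sample_rate


def phone_timestamps(
    phones: Sequence[str],
    durations: Sequence[int],
    hop_length: int = HOP_LENGTH,
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[str, float, float]]:
    """Turn per-phone frame durations into (phone, start_s, end_s), relative to the first phone."""
    if len(phones) != len(durations):
        raise ValueError("phones and phone_durations must have the same length")
    ends = list(accumulate(durations))  # cumulative end frame of each phone
    starts = [0] + ends[:-1]            # each phone starts where the previous one ended
    return [
        (phone, frames_to_seconds(s, hop_length, sample_rate), frames_to_seconds(e, hop_length, sample_rate))
        for phone, s, e in zip(phones, starts, ends)
    ]


if __name__ == "__main__":
    # First three phones of the example item above
    for phone, start, end in phone_timestamps(["[SILENCE]", "ð", "ʌ"], [5, 2, 4]):
        print(f"{phone}\t{start:.3f}\t{end:.3f}")
```

As a sanity check on the example item: its 51 durations sum to 314 frames, and 314 × 256 / 22050 ≈ 3.646 s, which agrees with its end value of about 3.65 s.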
A companion data collator, speech-collator (MiniXC/speech-collator), can be used to build training batches from this dataset. Install it with pip install speech-collator and use it as follows:
```python
import json

from datasets import load_dataset
from speech_collator import SpeechCollator
from torch.utils.data import DataLoader

dataset = load_dataset("cdminix/libritts-aligned", split="train")

# Mappings from speaker IDs and phones to integer indices
with open("speaker2idx.json") as f:
    speaker2idx = json.load(f)
with open("phone2idx.json") as f:
    phone2idx = json.load(f)

collator = SpeechCollator(
    speaker2idx=speaker2idx,
    phone2idx=phone2idx,
)
dataloader = DataLoader(dataset, collate_fn=collator.collate_fn, batch_size=8)
```
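A data collator turns a list of variable-length items into one padded batch. The sketch below is not speech-collator's implementation and does not reproduce its output format; it only illustrates the idea in plain Python. The helpers pad_batch and collate_phones are hypothetical, and it assumes that unknown phones map to index 0 and that sequences are right-padded with 0, with a mask marking the real positions.

```python
from typing import Dict, List, Sequence, Tuple


def pad_batch(seqs: Sequence[Sequence[int]], pad_value: int = 0) -> Tuple[List[List[int]], List[List[bool]]]:
    """Right-pad integer sequences to the longest one and return a mask of real (unpadded) positions."""
    max_len = max((len(s) for s in seqs), default=0)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[True] * len(s) + [False] * (max_len - len(s)) for s in seqs]
    return padded, mask


def collate_phones(batch: Sequence[dict], phone2idx: Dict[str, int], unk_idx: int = 0) -> dict:
    """Hypothetical collate step: encode phones via phone2idx (unknown -> unk_idx),
    then pad phone IDs and phone_durations to a common length."""
    phone_ids = [[phone2idx.get(p, unk_idx) for p in item["phones"]] for item in batch]
    durations = [item["phone_durations"] for item in batch]
    ids, mask = pad_batch(phone_ids)
    durs, _ = pad_batch(durations)
    return {"phone_ids": ids, "phone_durations": durs, "mask": mask}
```

Because the unknown index and the padding value are both 0 in this sketch, the mask is what distinguishes padding from genuinely unknown phones.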
You can either download the speaker2idx.json and phone2idx.json files from here or create them yourself using the following code:
```python
import json

from datasets import load_dataset
from speech_collator import create_speaker2idx, create_phone2idx

dataset = load_dataset("cdminix/libritts-aligned", split="train")

# Create speaker2idx and phone2idx (unk_idx=0: index used for unknown speakers/phones)
speaker2idx = create_speaker2idx(dataset, unk_idx=0)
phone2idx = create_phone2idx(dataset, unk_idx=0)

# Save to JSON
with open("speaker2idx.json", "w") as f:
    json.dump(speaker2idx, f)
with open("phone2idx.json", "w") as f:
    json.dump(phone2idx, f)
```
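For intuition about what the two index files contain, here is a rough plain-Python equivalent that maps each distinct speaker or phone to an integer while skipping the index passed as unk_idx. The helper names are hypothetical, and we have not checked how create_speaker2idx and create_phone2idx order their entries or whether they store an explicit unknown token, so treat the numbering as illustrative only; files built this way are not interchangeable with the downloadable ones.

```python
from typing import Dict, Iterable, Tuple


def build_index(symbols: Iterable[str], unk_idx: int = 0) -> Dict[str, int]:
    """Give every distinct symbol a consecutive integer ID, never using unk_idx."""
    mapping: Dict[str, int] = {}
    next_id = 0
    for symbol in sorted(set(symbols)):  # sorted so the numbering is reproducible
        if next_id == unk_idx:
            next_id += 1  # skip the ID reserved for unknown symbols
        mapping[symbol] = next_id
        next_id += 1
    return mapping


def speaker_and_phone_indices(records: Iterable[dict], unk_idx: int = 0) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Hypothetical stand-in for create_speaker2idx / create_phone2idx over dataset items."""
    records = list(records)  # materialise so the items can be iterated twice
    speaker2idx = build_index((r["speaker"] for r in records), unk_idx)
    phone2idx = build_index((p for r in records for p in r["phones"]), unk_idx)
    return speaker2idx, phone2idx
```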
speech-collator can also compute acoustic measures on the fly; the measures argument selects which ones to compute. The following example extracts pitch and energy for each batch:
```python
import json

from datasets import load_dataset
from speech_collator import SpeechCollator
from speech_collator.measures import PitchMeasure, EnergyMeasure
from torch.utils.data import DataLoader

dataset = load_dataset("cdminix/libritts-aligned", split="train")
with open("data/speaker2idx.json") as f:
    speaker2idx = json.load(f)
with open("data/phone2idx.json") as f:
    phone2idx = json.load(f)

# Create SpeechCollator that computes pitch and energy for every batch
speech_collator = SpeechCollator(
    speaker2idx=speaker2idx,
    phone2idx=phone2idx,
    measures=[PitchMeasure(), EnergyMeasure()],
    return_keys=["measures"],
)

# Create DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=8,
    collate_fn=speech_collator.collate_fn,
)
```
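To make frame-level measures concrete, the sketch below computes a simple root-mean-square energy per frame with NumPy, using the same hop length (256) and window length (1024) as the phone durations, and then averages it over each phone. It is a generic, hypothetical illustration (frame_energy and average_per_phone are made-up names) and not the definition used by speech-collator's EnergyMeasure, which may, for example, work on STFT magnitudes or normalise values differently.

```python
import numpy as np


def frame_energy(audio: np.ndarray, hop_length: int = 256, win_length: int = 1024) -> np.ndarray:
    """RMS energy of each analysis window, one value per hop (generic sketch, not EnergyMeasure)."""
    if audio.ndim != 1:
        raise ValueError("expected mono audio")
    # Centre frame t on sample t * hop_length by reflect-padding half a window on both sides
    pad = win_length // 2
    padded = np.pad(audio, (pad, pad), mode="reflect")
    n_frames = 1 + (len(padded) - win_length) // hop_length
    windows = np.lib.stride_tricks.sliding_window_view(padded, win_length)[::hop_length][:n_frames]
    return np.sqrt(np.mean(windows.astype(np.float64) ** 2, axis=1))


def average_per_phone(frame_values: np.ndarray, durations) -> np.ndarray:
    """Average frame-level values over each phone's span; phones with no frames get 0.0."""
    out = []
    start = 0
    for d in durations:
        seg = frame_values[start:start + d]
        out.append(float(seg.mean()) if len(seg) > 0 else 0.0)
        start += d
    return np.array(out)
```

Averaging per phone like this yields one value per entry in phone_durations, which is convenient when a model predicts prosody at the phone level; keeping frame-level values is the alternative when finer detail is needed.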
COMING SOON: detailed documentation on how to use the measures, at MiniXC/speech-collator.
This dataset has the following splits:
A few environment variables can be set.
When using LibriTTS, please cite the following papers:
When using the measures, please cite the following paper (ours):