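Dataset: PolyAI/minds14
Task: automatic speech recognition
Sub-task: keyword-spotting
Multilinguality: multilingual
Size: 10K<n<100K
ArXiv: arxiv:2104.08524
License: cc-by-4.0

MInDS-14 is a training and evaluation resource for the intent detection task on spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, with spoken examples in 14 diverse language varieties.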
MInDS-14 can be downloaded and used as follows:
from datasets import load_dataset

minds_14 = load_dataset("PolyAI/minds14", "fr-FR")  # for French
# to download all data for multilingual fine-tuning, uncomment the following line
# minds_14 = load_dataset("PolyAI/minds14", "all")

# see structure
print(minds_14)

# load audio sample on the fly
audio_input = minds_14["train"][0]["audio"]  # first decoded audio sample
intent_class = minds_14["train"][0]["intent_class"]  # integer intent label of the first sample
intent = minds_14["train"].features["intent_class"].names[intent_class]  # map the label id to its name

# use audio_input and intent_class to fine-tune your model for audio classification
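Fine-tuning an audio-classification model usually needs the label mapping in both directions. The sketch below builds `label2id`/`id2label` dictionaries from a list of class names. In practice that list would come from `minds_14["train"].features["intent_class"].names`; the helper `build_label_maps` and the names used here are hypothetical placeholders, not the real MInDS-14 intents.

```python
from typing import Dict, List, Tuple


def build_label_maps(names: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Return (label2id, id2label) for a list of class names (hypothetical helper).

    Assumes names[i] is the name of integer label i, which is how a
    ClassLabel feature such as `intent_class` orders its names.
    """
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for idx, name in enumerate(names)}
    return label2id, id2label


# Hypothetical placeholder names; the real ones come from
# minds_14["train"].features["intent_class"].names
names = ["intent_a", "intent_b", "intent_c"]
label2id, id2label = build_label_maps(names)
print(id2label[1])           # intent_b
print(label2id["intent_c"])  # 2
```

Dictionaries of this shape are what the `label2id` and `id2label` arguments of a Hugging Face `transformers` model config expect.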
We show detailed information for the example configuration fr-FR of the dataset. All other configurations have the same structure.
fr-FR
An example data instance for the fr-FR configuration looks as follows:
{
    "path": "/home/patrick/.cache/huggingface/datasets/downloads/extracted/3ebe2265b2f102203be5e64fa8e533e0c6742e72268772c8ac1834c5a1a921e3/fr-FR~ADDRESS/response_4.wav",
    "audio": {
        "path": "/home/patrick/.cache/huggingface/datasets/downloads/extracted/3ebe2265b2f102203be5e64fa8e533e0c6742e72268772c8ac1834c5a1a921e3/fr-FR~ADDRESS/response_4.wav",
        "array": array(
            [0.0, 0.0, 0.0, ..., 0.0, 0.00048828, -0.00024414], dtype=float32
        ),
        "sampling_rate": 8000,
    },
    "transcription": "je souhaite changer mon adresse",
    "english_transcription": "I want to change my address",
    "intent_class": 1,
    "lang_id": 6,
}
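The `audio` field pairs a decoded waveform (`array`) with its `sampling_rate`, so the clip length in seconds is the number of samples divided by the sampling rate. The sketch below assumes an instance shaped like the example above, with a plain Python list standing in for the NumPy array; the helper name `describe_instance` and the synthetic 1.5-second clip are hypothetical.

```python
from typing import Any, Dict


def describe_instance(instance: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise one MInDS-14-style instance (hypothetical helper)."""
    audio = instance["audio"]
    num_samples = len(audio["array"])
    # samples / (samples per second) = seconds
    duration_s = num_samples / audio["sampling_rate"]
    return {
        "num_samples": num_samples,
        "duration_s": duration_s,
        "transcription": instance["transcription"],
        "intent_class": instance["intent_class"],
    }


# Synthetic instance: 12,000 zero-valued samples at 8 kHz, i.e. 1.5 s of silence
example = {
    "audio": {"array": [0.0] * 12_000, "sampling_rate": 8000},
    "transcription": "je souhaite changer mon adresse",
    "intent_class": 1,
}
print(describe_instance(example))
```

Because the clips are stored at 8 kHz, models pretrained on 16 kHz audio (for example wav2vec 2.0 checkpoints) typically need the audio resampled first; with the `datasets` library this is commonly done with `minds_14.cast_column("audio", Audio(sampling_rate=16_000))`.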
The data fields are the same across all splits.
Each configuration has only a "train" split, containing about 600 examples.
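Since each configuration ships only a "train" split of roughly 600 examples, you need to carve out your own held-out set. The sketch below performs a seeded, per-intent (stratified) split over a list of integer labels so that every intent appears in both parts; the function name `stratified_split` and the 20% ratio are assumptions for illustration. With the `datasets` library, `Dataset.train_test_split` (which accepts `test_size`, `seed` and a `stratify_by_column` argument for ClassLabel columns such as `intent_class`) is a built-in alternative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(labels: List[int], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), stratified by label.

    Each label contributes round(count * test_fraction) examples to the test
    part, clamped to at least one and never all of them when the label has
    two or more examples; a label with a single example stays in train.
    """
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # seeded for reproducibility
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        if len(indices) >= 2:
            n_test = min(max(n_test, 1), len(indices) - 1)
        else:
            n_test = 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 10 examples each of three intents
labels = [0] * 10 + [1] * 10 + [2] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 24 6
```

The indices returned can then be passed to `Dataset.select` to materialise the two subsets.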
All datasets are licensed under the Creative Commons Attribution 4.0 license (CC-BY-4.0).
@article{DBLP:journals/corr/abs-2104-08524,
  author     = {Daniela Gerz and Pei{-}Hao Su and Razvan Kusztos and Avishek Mondal and Michal Lis and Eshan Singhal and Nikola Mrksic and Tsung{-}Hsien Wen and Ivan Vulic},
  title      = {Multilingual and Cross-Lingual Intent Detection from Spoken Data},
  journal    = {CoRR},
  volume     = {abs/2104.08524},
  year       = {2021},
  url        = {https://arxiv.org/abs/2104.08524},
  eprinttype = {arXiv},
  eprint     = {2104.08524},
  timestamp  = {Mon, 26 Apr 2021 17:25:10 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2104-08524.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
Thanks to @patrickvonplaten for adding this dataset.