Dataset:

imvladikon/hebrew_speech_coursera

Task:

Automatic speech recognition

Language:

Size:

1K<n<10K

Dataset Card for hebrew_speech_coursera

Dataset Summary

This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Hebrew

Dataset Structure

Data Instances

{'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/89efd3a0fa3ead3f0b8e432e8796697a738d4561b24ff91f4fb2cc25d86e9fb0/train/ccef55189b7843d49110228cb0a71bfa115.wav',
  'array': array([-0.01217651, -0.04351807, -0.06278992, ..., -0.00018311,
         -0.00146484, -0.00349426]),
  'sampling_rate': 16000},
 'sentence': 'מצד אחד ובתנועה הציונית הצעירה'}
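A minimal sketch of how a record shaped like the one above might be loaded and inspected. It assumes the Hugging Face `datasets` library is installed, that the repository id from the header resolves to a loadable dataset, and that the audio column decodes to the dictionary layout shown above (this can vary with the installed library version). The helper names `clip_duration_seconds` and `load_first_example` are hypothetical and not part of the dataset.

```python
from typing import Any, Mapping


def clip_duration_seconds(example: Mapping[str, Any]) -> float:
    """Return the clip length in seconds: number of samples / sampling rate."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def load_first_example(split: str = "train") -> Mapping[str, Any]:
    """Stream the first record of a split (needs `datasets` and network access)."""
    from datasets import load_dataset  # third-party dependency, imported lazily

    dataset = load_dataset(
        "imvladikon/hebrew_speech_coursera", split=split, streaming=True
    )
    return next(iter(dataset))


# Offline illustration with a made-up record of the same shape:
# 32000 samples at 16 kHz correspond to a 2-second clip.
fake_example = {
    "audio": {"path": "example.wav", "array": [0.0] * 32000, "sampling_rate": 16000},
    "sentence": "placeholder transcript",
}
print(clip_duration_seconds(fake_example))
```

Streaming is used here only to avoid downloading the full archive before looking at a single record; calling `load_first_example()` is left to the reader because it requires network access.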

Data Fields

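audio: a dictionary holding the path to the audio file (path), the decoded waveform as an array of floats (array), and the sampling rate in Hz (sampling_rate; 16000 in the instance above).

sentence: the transcription of the audio clip.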

Data Splits

                     train    validation
number of samples    20306    5076
hours                28.88    7.23
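The split figures can be turned into a quick sanity check. The sketch below derives the mean clip length and each split's share of the samples. It uses only the numbers in the table; the names `SPLITS` and `mean_clip_seconds` are illustrative.

```python
# Figures copied from the Data Splits table.
SPLITS = {
    "train": {"samples": 20306, "hours": 28.88},
    "validation": {"samples": 5076, "hours": 7.23},
}


def mean_clip_seconds(samples: int, hours: float) -> float:
    """Average clip duration in seconds for a split."""
    return hours * 3600 / samples


total_samples = sum(stats["samples"] for stats in SPLITS.values())
total_hours = sum(stats["hours"] for stats in SPLITS.values())

for name, stats in SPLITS.items():
    share = stats["samples"] / total_samples
    mean_s = mean_clip_seconds(stats["samples"], stats["hours"])
    print(f"{name}: {mean_s:.2f} s per clip, {share:.1%} of samples")

print(f"total: {total_samples} samples, {total_hours:.2f} hours")
```

On these figures the clips average a little over 5 seconds in both splits, and the split is roughly 80/20 between train and validation.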

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@misc{imvladikon2022hebrew_speech_coursera,
  author = {Gurevich, Vladimir},
  title = {Hebrew Speech Recognition Dataset: Coursera},
  year = {2022},
  howpublished = {\url{https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera}},
}

Contributions

[More Information Needed]

Author:

imvladikon

Dataset size:

12.43 GB