Dataset:

DISCOX/DISCO-10K-random

License:

cc-by-4.0

Getting Started

You can download the dataset with the Hugging Face datasets library:

from datasets import load_dataset
ds = load_dataset("DISCOX/DISCO-10K-random")

The dataset contains 10,000 random samples from the DISCO-10M dataset found here.
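
After loading, it can help to check which splits and columns are actually present before relying on them. The helper below is a minimal sketch: `describe_splits` is a hypothetical name, and it only assumes that the object returned by `load_dataset` behaves like a mapping from split names to datasets exposing `num_rows` and `column_names`, as a `DatasetDict` does. The stand-in object and the split name "train" are illustrative assumptions so the sketch runs offline.

```python
from types import SimpleNamespace


def describe_splits(dataset_dict):
    """Return {split_name: (num_rows, column_names)} for a DatasetDict-like mapping."""
    return {
        name: (split.num_rows, list(split.column_names))
        for name, split in dataset_dict.items()
    }


# Stand-in for the real dataset (assumption: used only so this runs without a download).
# With the real data, pass the `ds` returned by load_dataset instead.
fake_ds = {
    "train": SimpleNamespace(
        num_rows=3,
        column_names=["track_name_spotify", "similarity_audio"],
    )
}

summary = describe_splits(fake_ds)
for split_name, (rows, columns) in summary.items():
    print(split_name, rows, columns)
```

With the downloaded dataset, calling `describe_splits(ds)` should list each available split with its row count and feature names.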

Dataset Structure

The dataset contains the following features:

{
 'video_url_youtube',
 'video_title_youtube',
 'track_name_spotify',
 'video_duration_youtube_sec',
 'preview_url_spotify',
 'video_view_count_youtube',
 'video_thumbnail_url_youtube',
 'search_query_youtube',
 'video_description_youtube',
 'track_id_spotify',
 'album_id_spotify',
 'artist_id_spotify',
 'track_duration_spotify_ms',
 'primary_artist_name_spotify',
 'track_release_date_spotify',
 'explicit_content_spotify',
 'similarity_duration',
 'similarity_query_video_title',
 'similarity_query_description',
 'similarity_audio',
 'audio_embedding_spotify',
 'audio_embedding_youtube',
}
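
As an illustration of how the two embedding features might be used together, the sketch below computes a cosine similarity between `audio_embedding_spotify` and `audio_embedding_youtube` for a single row. It assumes both fields are equal-length sequences of floats, which the card does not state, and it does not claim to reproduce how `similarity_audio` was computed. The function names and the toy row are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def embedding_similarity(row):
    """Compare the Spotify and YouTube audio embeddings of one dataset row."""
    return cosine_similarity(
        row["audio_embedding_spotify"],
        row["audio_embedding_youtube"],
    )


# Toy row with made-up values (assumption: real embeddings are longer float vectors).
toy_row = {
    "audio_embedding_spotify": [1.0, 0.0, 1.0],
    "audio_embedding_youtube": [1.0, 0.0, 1.0],
}
toy_score = embedding_similarity(toy_row)
print(round(toy_score, 6))
```

On the real data, the same function could be applied to any row from a loaded split, provided the embedding fields have the assumed shape.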

More details about the dataset can be found here.