Dataset: Vipitis/Shadertoys

Dataset Card for Shadertoys

Dataset Summary

The Shadertoys dataset contains over 44k renderpasses collected from the Shadertoy.com API. Some shader programs contain multiple renderpasses, and each renderpass is a separate data point. To browse a subset of this dataset, see the ShaderEval space. A finer-grained variant of this dataset is Shadertoys-fine.

Supported Tasks and Leaderboards

text-generation: the dataset can be used to train generative language models for code completion tasks. ShaderEval task1 from ShaderEval uses a dataset derived from Shadertoys to test return completion by autoregressive language models.
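To make "return completion" concrete, the following Python sketch cuts a function at its first `return` keyword. The text up to and including the keyword serves as the prompt. The rest of that statement, up to the semicolon, is the reference a model would be asked to complete. The helper `split_at_return` is hypothetical, and this is an assumption about the task format, not ShaderEval's actual preprocessing.

```python
import re


def split_at_return(function_code: str):
    """Hypothetical helper: split shader code into (prompt, reference) at the first `return`.

    Assumption for illustration only; ShaderEval's real preprocessing may differ.
    Comments and string-like content are not handled specially.
    """
    match = re.search(r"\breturn\b", function_code)  # whole word, so `returnValue` is skipped
    if match is None:
        return None  # no return statement, nothing to complete
    prompt = function_code[: match.end()]  # everything up to and including `return`
    rest = function_code[match.end():]
    end = rest.find(";")
    if end == -1:
        return None  # unterminated statement
    reference = rest[: end + 1]  # the expression plus its terminating semicolon
    return prompt, reference


code = "float sdSphere(vec3 p, float r) {\n    return length(p) - r;\n}"
prompt, reference = split_at_return(code)
```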

Languages

  • English (title, description, tags, comments)
  • Shadercode: the shader language used on Shadertoy.com, a subset of GLSL

Dataset Structure

Data Instances

A data point consists of the full shader code of one renderpass, some information from the API, and additional metadata.

{
 'num_passes': 1,
 'has_inputs': False,
 'name': 'Image',
 'type': 'image',
 'code': '<full code>',
 'title': '<title of the shader>',
 'description': '<description of the shader>',
 'tags': ['tag1','tag2','tag3', ... ],
 'license': 'unknown',
 'author': '<username>',
 'source': 'https://shadertoy.com/view/<shaderID>'
}
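The field names and value types in this instance can be checked mechanically. The sketch below is a hypothetical validator, not part of the dataset tooling. The expected Python types are inferred from the example above and are assumptions, not a published schema. The allowed renderpass types come from the 'type' field description.

```python
# Assumed value types, inferred from the example instance (not an official schema).
EXPECTED_FIELDS = {
    "num_passes": int,  # note: isinstance(True, int) is True, so bools also pass this check
    "has_inputs": bool,
    "name": str,
    "type": str,
    "code": str,
    "title": str,
    "description": str,
    "tags": list,
    "license": str,
    "author": str,
    "source": str,
}
RENDERPASS_TYPES = {"buffer", "common", "cubemap", "image", "sound"}


def check_record(record: dict) -> list:
    """Hypothetical validator: return a list of problems (empty if the record looks fine)."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {type(record[field]).__name__}"
            )
    if "type" in record and record["type"] not in RENDERPASS_TYPES:
        problems.append(f"unexpected renderpass type: {record['type']!r}")
    return problems
```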

Data Fields

  • 'num_passes' number of passes the parent shader program has
  • 'has_inputs' whether any inputs, such as textures or audio streams, were used
  • 'name' name of the renderpass, usually Image, Buffer A, Common, etc.
  • 'type' type of the renderpass; one of {'buffer', 'common', 'cubemap', 'image', 'sound'}
  • 'code' the raw code of the whole renderpass, including comments
  • 'title' Name of the Shader
  • 'description' description given for the Shader
  • 'tags' list of tags assigned to the Shader (by its creator); there are more than 10000 unique tags.
  • 'license' license tag, if one could be recognized; this field is currently in development
  • 'author' username of the shader author
  • 'source' URL of the shader, not of the specific renderpass
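Because 'source' points to the shader rather than to the renderpass, rows from the same shader program can be regrouped by the shader ID at the end of that URL. The helpers below (`shader_id`, `group_by_shader`, `incomplete_shaders`) are hypothetical. They also flag groups whose row count differs from 'num_passes'. That can happen, for example, when not all passes of a shader are present in the rows being grouped.

```python
from collections import defaultdict


def shader_id(source_url: str) -> str:
    """Hypothetical helper: take <shaderID> from https://shadertoy.com/view/<shaderID>."""
    return source_url.rstrip("/").rsplit("/", 1)[-1]


def group_by_shader(records) -> dict:
    """Group renderpass rows by the shader they belong to."""
    shaders = defaultdict(list)
    for rec in records:
        shaders[shader_id(rec["source"])].append(rec)
    return dict(shaders)


def incomplete_shaders(groups: dict) -> list:
    """Shader IDs whose number of rows differs from the recorded 'num_passes'."""
    return sorted(
        sid for sid, passes in groups.items() if len(passes) != passes[0]["num_passes"]
    )
```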

Data Splits

Currently available (shuffled):

  • train (85.0%)
  • test (15.0%)
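To illustrate what an 85/15 shuffled split means, here is a minimal sketch using a seeded shuffle. The function name, the seed, and the rounding rule are assumptions. This is not the procedure used to build the published splits.

```python
import random


def shuffled_split(items, test_fraction=0.15, seed=0):
    """Hypothetical helper: shuffle deterministically, then cut off a test portion."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


train, test = shuffled_split(range(100))
```

Splitting by row, as here, can put passes of the same shader into different splits. Grouping rows by 'source' before splitting would keep each shader program within one split.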

Dataset Creation

Data retrieved starting 2022-07-20

Source Data

Initial Data Collection and Normalization

All data was collected via the Shadertoy.com API. For each shader, the items in 'renderpass' were iterated over, and some of the fields from 'info' were added to each item. The code to generate these datasets should be published on the GitHub repository in the near future.
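The collection step can be sketched as a function that turns one parsed Shadertoy API response into rows with the fields listed under Data Fields. The response layout is assumed to match the public Shadertoy API. That is a top-level 'Shader' object holding 'info' and 'renderpass', with 'info' keys 'id', 'name', 'description', 'tags' and 'username'. Two further choices are assumptions: reading 'has_inputs' per pass, and setting a constant 'unknown' license. This is not the published generation code, and `flatten_shader` is a hypothetical name.

```python
def flatten_shader(api_response: dict) -> list:
    """Hypothetical sketch: one row per renderpass, enriched with shader-level 'info' fields."""
    shader = api_response["Shader"]
    info = shader["info"]
    passes = shader["renderpass"]
    rows = []
    for rp in passes:
        rows.append({
            "num_passes": len(passes),
            "has_inputs": bool(rp.get("inputs")),  # assumption: inputs of this pass only
            "name": rp["name"],
            "type": rp["type"],
            "code": rp["code"],
            "title": info["name"],
            "description": info["description"],
            "tags": info["tags"],
            "license": "unknown",  # placeholder; license detection is a separate step
            "author": info["username"],
            "source": f"https://shadertoy.com/view/{info['id']}",
        })
    return rows
```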

Who are the source language producers?

Shadertoy.com contributors who publish shaders as 'public+API'.

Licensing Information

The default license for each shader is CC BY-NC-SA 3.0. However, some shaders might have a different license attached. The dataset currently does not filter by license, but provides a license tag if one is easily recognizable by naive means. To avoid violating any copyrights in downstream use, please check the first comment of each shader program yourself. The main license requires attribution and share-alike. The author of every data point is given in the 'author' column, but this might not cover further attribution within the code itself or to the parents of forked shaders.
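A license tag found "by naive means" could, for instance, come from searching the first comment of a shader for known license phrases. The sketch below does exactly that. The patterns and tag strings are hypothetical and are not the dataset's actual detection rules. A match is only a hint, not a substitute for reading the comment yourself.

```python
import re

# Hypothetical patterns and tag names, for illustration only.
LICENSE_PATTERNS = [
    ("cc-by-nc-sa-3.0", re.compile(
        r"Creative\s+Commons\s+Attribution-NonCommercial-ShareAlike\s+3\.0|CC\s*BY-NC-SA\s*3\.0",
        re.IGNORECASE)),
    ("mit", re.compile(r"\bMIT\s+License\b", re.IGNORECASE)),
    ("cc0-1.0", re.compile(r"\bCC0\b", re.IGNORECASE)),
]


def first_comment(code: str) -> str:
    """Return the leading /* ... */ block or the run of leading // lines (else '')."""
    text = code.lstrip()
    if text.startswith("/*"):
        end = text.find("*/")
        return text[2:end] if end != -1 else text[2:]
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("//"):
            break
        lines.append(stripped[2:].strip())
    return "\n".join(lines)


def naive_license(code: str) -> str:
    """Return the first matching tag, or 'unknown' if no pattern matches the first comment."""
    comment = first_comment(code)
    for tag, pattern in LICENSE_PATTERNS:
        if pattern.search(comment):
            return tag
    return "unknown"
```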