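Dataset: dali-does/clevr-math
Tasks: visual-question-answering
Languages: en
Multilinguality: monolingual
Language creators: machine-generated
Annotation creators: machine-generated
Source datasets: clevr
Preprint: arxiv:2208.05358
License: cc-by-4.0

A dataset for compositional multimodal mathematical reasoning, based on CLEVR.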
Loading the data and preprocessing text with CLIP

    from transformers import CLIPProcessor
    from datasets import load_dataset, DownloadConfig

    dl_config = DownloadConfig(resume_download=True, num_proc=8, force_download=True)

    # Load 'general' instance of dataset
    dataset = load_dataset('dali-does/clevr-math', download_config=dl_config)
    # Load version with only multihop in test data
    dataset_multihop = load_dataset('dali-does/clevr-math', 'multihop', download_config=dl_config)

    model_path = "openai/clip-vit-base-patch32"
    extractor = CLIPProcessor.from_pretrained(model_path)

    def transform_tokenize(e):
        # Convert every image to RGB, then tokenize the questions and
        # preprocess the images in a single processor call
        e['image'] = [image.convert('RGB') for image in e['image']]
        return extractor(text=e['question'], images=e['image'], padding=True)

    # Padding is handled by the processor inside transform_tokenize
    dataset = dataset.map(transform_tokenize, batched=True, num_proc=8)

    # Keep only questions generated from templates whose name starts with 'subtraction'
    dataset_subtraction = dataset.filter(lambda e: e['template'].startswith('subtraction'), num_proc=4)
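The template filter at the end of the snippet above can be tried out without downloading anything. The following is a minimal sketch on hand-written records, using only the standard library. The template names, questions, and labels below are illustrative placeholders, not real samples from the dataset, and `is_subtraction` is a hypothetical helper that mirrors the lambda passed to `filter`.

```python
from collections import Counter

# Hypothetical records that mimic the text fields of the dataset.
# Template names, questions, and labels are made up for illustration.
records = [
    {"template": "addition", "question": "Add 2 red cubes. How many objects exist?", "label": 7},
    {"template": "subtraction", "question": "Subtract all blue spheres. How many objects are left?", "label": 3},
    {"template": "subtraction-multihop", "question": "Subtract all cubes. Subtract all balls. How many objects are left?", "label": 2},
    {"template": "adversarial", "question": "Subtract all red cubes. How many spheres are left?", "label": 4},
]


def is_subtraction(example):
    """Same predicate as the lambda given to dataset.filter in the snippet above."""
    return example["template"].startswith("subtraction")


# A prefix match keeps every template whose name begins with 'subtraction',
# so 'subtraction-multihop' is selected as well as 'subtraction'.
subtraction_only = [r for r in records if is_subtraction(r)]

# Count records per template to see what the filter dropped.
template_counts = Counter(r["template"] for r in records)

print([r["template"] for r in subtraction_only])
print(template_counts)
```

Because the predicate is a prefix match, any template name that merely begins with the same string is also kept; an equality test would be needed to select a single template.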
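A leaderboard will be announced at a later date.
The dataset currently supports English only. Extending it to other languages requires rewriting the CLEVR templates in the target language.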
    import datasets

    # Schema of a single example
    features = datasets.Features(
        {
            "template": datasets.Value("string"),
            "id": datasets.Value("string"),
            "question": datasets.Value("string"),
            "image": datasets.Image(),
            "label": datasets.Value("int64")
        }
    )
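As a rough illustration of what this schema implies for a single row, the sketch below mirrors the five fields in a plain dataclass and checks the types of the text and label fields. `ClevrMathExample` and `check_example` are hypothetical names introduced here, not part of the dataset or of the `datasets` library, and the sample row is invented.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ClevrMathExample:
    """Plain-Python mirror of the declared features (hypothetical helper type)."""
    template: str
    id: str
    question: str
    image: Any  # left untyped here; the library decodes this field to an image object
    label: int


def check_example(row: dict) -> ClevrMathExample:
    """Return a typed record, or raise if a field is missing or has the wrong type."""
    expected = {"template": str, "id": str, "question": str, "label": int}
    for name, expected_type in expected.items():
        if not isinstance(row[name], expected_type):
            raise TypeError(f"{name!r} should be {expected_type.__name__}")
    if "image" not in row:
        raise KeyError("image")
    return ClevrMathExample(
        template=row["template"],
        id=row["id"],
        question=row["question"],
        image=row["image"],
        label=row["label"],
    )


# Invented row; the image is replaced by None to keep the sketch dependency-free.
example = check_example(
    {
        "template": "addition",
        "id": "example_000000",
        "question": "Add 1 red cube. How many objects exist?",
        "image": None,
        "label": 4,
    }
)
print(example.template, example.label)
```

The label is an integer, so a string such as "4" would be rejected by this check.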
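train/val/test
The data is generated with the code provided by the CLEVR dataset, using Blender and templates constructed by the dataset curators.
[More Information Needed]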
Adam Dahlgren Lindström - dali@cs.umu.se
Licensed under Creative Commons Attribution Share Alike 4.0 International (CC-BY 4.0)
[More Information Needed]
@misc{https://doi.org/10.48550/arxiv.2208.05358,
  doi = {10.48550/ARXIV.2208.05358},
  url = {https://arxiv.org/abs/2208.05358},
  author = {Lindström, Adam Dahlgren and Abraham, Savitha Sam},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences, I.2.7; I.2.10; I.2.6; I.4.8; I.1.4},
  title = {CLEVR-Math: A Dataset for Compositional Language, Visual, and Mathematical Reasoning},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}
Thanks to @dali-does for adding this dataset.