Model:

laion/CLIP-ViT-B-32-roberta-base-laion2B-s12B-b32k

Library:

OpenCLIP

Preprint:

arxiv:1910.04867

License:

mit

Model card files

English

Model Card for CLIP ViT-B/32 roberta base - LAION-2B

Model Details

Model Description

A CLIP ViT-B/32 roberta base model trained on LAION-2B, the English subset of LAION-5B ( https://laion.ai/blog/laion-5b/ ), using OpenCLIP ( https://github.com/mlfoundations/open_clip ).

The model was trained by Romain Beaumont on the stability.ai cluster.

Uses

Direct Use

Zero-shot image classification, image and text retrieval, among others.
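As an illustration of what zero-shot classification means for a CLIP-style model, the sketch below scores one image embedding against several caption embeddings using cosine similarity followed by a softmax. It is a minimal, dependency-free sketch and not the model's own code: the toy three-dimensional embeddings, the helper names `l2_normalize` and `zero_shot_probs`, and the scale factor of 100 are all assumptions chosen for the example. Real embeddings would come from the model's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Return one probability per caption for a single image embedding.

    `scale` plays the role of the logit temperature; 100.0 is an assumed
    value for illustration, not a number taken from this model card.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the maximum logit before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical): the image points mostly along the "dog" axis.
captions = ["a photo of a dog", "a photo of a cat"]
caption_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.9, 0.1, 0.0]

probs = zero_shot_probs(image_emb, caption_embs)
best = captions[probs.index(max(probs))]
print(best, probs)
```

The same scoring also covers retrieval: ranking many images against one caption (or many captions against one image) by cosine similarity instead of taking a softmax over class prompts.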

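Downstream Use

Fine-tuning for image classification and other image tasks, linear-probe image classification, guiding and conditioning image generation, among others.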

Training Details

Training Data

This model was trained on the 2-billion-sample English subset of LAION-5B ( https://laion.ai/blog/laion-5b/ ).

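Training Procedure

Trained on laion2B-en for 12B samples seen at a batch size of 32k; see https://wandb.ai/rom1504/open-clip/reports/clip-B-32-roberta-base--VmlldzoyOTM0NDQ3 for details.

The visual tower is a ViT-B/32; the text tower is roberta base, initialized from its pretrained weights.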
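To put the training figures in perspective, the short calculation below converts "12B samples at a batch size of 32k" into an approximate number of optimizer steps and passes over the data. It is back-of-the-envelope arithmetic only. It assumes that "32k" means 32,768 and that the dataset holds exactly 2 billion samples, neither of which is stated precisely in this card; the linked training report is the authoritative source.

```python
import math

SAMPLES_SEEN = 12_000_000_000  # "12B samples" from the training procedure
DATASET_SIZE = 2_000_000_000   # "2 billion samples" (rounded figure from this card)
BATCH_SIZE = 32 * 1024         # assumption: "32k" is read as 32,768

# Number of optimizer steps needed to see all samples at this batch size.
steps = math.ceil(SAMPLES_SEEN / BATCH_SIZE)

# Approximate number of passes over the dataset.
passes = SAMPLES_SEEN / DATASET_SIZE

print(f"about {steps:,} steps, about {passes:.1f} passes over the data")
```

If "32k" instead means 32,000, the step count becomes exactly 375,000; either way the run is on the order of a few hundred thousand steps.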

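Evaluation

Evaluation was done with the code in the LAION CLIP Benchmark suite.

Testing Data, Factors & Metrics

Testing Data

Classification is tested on VTAB+ (a combination of VTAB with additional robustness datasets); retrieval is tested on COCO and Flickr.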

Results

The model achieves the following results:

ImageNet-1k: 61.7% (baseline: 62.9%)
mscoco: 63% (baseline: 60.8%)
flickr30k: 86.7% (baseline: 85.4%)

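Acknowledgements

Thanks to stability.ai for providing the compute used to train this model.

Citation

BibTeX:

In addition to the forthcoming LAION-5B paper ( https://laion.ai/blog/laion-5b/ ), please cite:

OpenAI CLIP paper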

@inproceedings{Radford2021LearningTV,
  title={Learning Transferable Visual Models From Natural Language Supervision},
  author={Alec Radford and Jong Wook Kim and Chris Hallacy and A. Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
  booktitle={ICML},
  year={2021}
}

OpenCLIP software

@software{ilharco_gabriel_2021_5143773,
  author       = {Ilharco, Gabriel and
                  Wortsman, Mitchell and
                  Wightman, Ross and
                  Gordon, Cade and
                  Carlini, Nicholas and
                  Taori, Rohan and
                  Dave, Achal and
                  Shankar, Vaishaal and
                  Namkoong, Hongseok and
                  Miller, John and
                  Hajishirzi, Hannaneh and
                  Farhadi, Ali and
                  Schmidt, Ludwig},
  title        = {OpenCLIP},
  month        = jul,
  year         = 2021,
  note         = {If you use this software, please cite it as below.},
  publisher    = {Zenodo},
  version      = {0.1},
  doi          = {10.5281/zenodo.5143773},
  url          = {https://doi.org/10.5281/zenodo.5143773}
}

How to Get Started with the Model

https://github.com/mlfoundations/open_clip
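For readers who want a starting point, here is a hedged sketch of loading the model through OpenCLIP and running zero-shot classification on one image. The architecture name `roberta-ViT-B-32` and the pretrained tag `laion2b_s12b_b32k` are assumptions based on OpenCLIP's naming for this checkpoint; confirm them with `open_clip.list_pretrained()` before relying on them. The helpers `build_prompts` and `classify`, the prompt template, the scale factor of 100, and the file name `example.jpg` are hypothetical and chosen for illustration. The sketch assumes `torch`, `open_clip_torch`, `transformers`, and `Pillow` are installed.

```python
MODEL_NAME = "roberta-ViT-B-32"      # assumed OpenCLIP architecture name
PRETRAINED_TAG = "laion2b_s12b_b32k"  # assumed OpenCLIP pretrained tag


def build_prompts(labels):
    """Wrap each class label in a simple caption template (an illustrative choice)."""
    return [f"a photo of a {label}" for label in labels]


def classify(image_path, labels):
    """Return a {label: probability} mapping for a single image."""
    # Heavy dependencies are imported lazily so the helpers above stay usable
    # even where torch or open_clip are not installed.
    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms(
        MODEL_NAME, pretrained=PRETRAINED_TAG
    )
    tokenizer = open_clip.get_tokenizer(MODEL_NAME)
    model.eval()

    image = preprocess(Image.open(image_path)).unsqueeze(0)  # add a batch dimension
    text = tokenizer(build_prompts(labels))

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Normalize so the matrix product below yields cosine similarities.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    return dict(zip(labels, probs[0].tolist()))


if __name__ == "__main__":
    # "example.jpg" is a placeholder path; the first call downloads the weights.
    print(classify("example.jpg", ["dog", "cat", "diagram"]))
```

Loading the model once and reusing it across images would be the natural next step for anything beyond a quick check, since `classify` as written reloads the weights on every call.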

Author:

LAION eV

Dataset size:

815.15 MB