Vision Transformer (large-sized model) pre-trained with MSN

Vision Transformer (ViT) model pre-trained using the MSN method. It was introduced in the paper Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, and Nicolas Ballas, and first released in this repository.

Disclaimer: The team releasing MSN did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

The Vision Transformer (ViT) is a transformer encoder model (BERT-like). Images are presented to the model as a sequence of fixed-size patches.
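To make the patch sequence concrete, here is a minimal sketch that splits an image into non-overlapping square patches in row-major order. The 224x224 input size and the 16x16 patch size are assumptions chosen for illustration, not values stated in this card, and `split_into_patches` is a hypothetical helper rather than part of any library.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grid (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order, each flattened into a list.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Assumed sizes for illustration: a 224x224 single-channel image, 16x16 patches.
image = [[row * 224 + col for col in range(224)] for row in range(224)]
patches = split_into_patches(image, patch_size=16)

# Number of patches in the sequence and number of values per patch.
print(len(patches), len(patches[0]))
```

Under these assumed sizes the image becomes a sequence of 14 x 14 = 196 patches. In an actual ViT, each flattened patch would then be linearly projected to the model's hidden size before entering the encoder.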

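MSN presents a joint-embedding architecture that matches the prototypes of masked patches with those of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes.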
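The matching idea can be illustrated with a small, heavily simplified sketch: the embeddings of a masked view and an unmasked view are each softly assigned to a set of prototypes, and a cross-entropy term pulls the two assignments together. The function names, the toy 2-D embeddings, and the temperature values below are assumptions for illustration. The sketch omits the encoders, view generation, batching, and any regularization used in the paper, so it should not be read as the authors' implementation.

```python
import math


def softmax(scores, temperature):
    """Temperature-scaled softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def prototype_assignment(embedding, prototypes, temperature):
    """Soft assignment of one embedding to each prototype."""
    return softmax([cosine(embedding, p) for p in prototypes], temperature)


def matching_loss(masked_embedding, unmasked_embedding, prototypes,
                  anchor_temperature=0.1, target_temperature=0.025):
    """Cross-entropy between the unmasked (target) and masked (anchor) assignments.

    The temperatures are illustrative assumptions; a lower target temperature
    makes the target assignment sharper than the anchor assignment.
    """
    anchor = prototype_assignment(masked_embedding, prototypes, anchor_temperature)
    target = prototype_assignment(unmasked_embedding, prototypes, target_temperature)
    return -sum(t * math.log(a) for t, a in zip(target, anchor))


# Toy 2-D prototypes and embeddings (hypothetical values).
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
unmasked = [0.9, 0.1]
close_loss = matching_loss([0.8, 0.2], unmasked, prototypes)
far_loss = matching_loss([-0.7, 0.3], unmasked, prototypes)

# A masked view that agrees with the unmasked view incurs a smaller loss.
print(close_loss < far_loss)
```

In this toy setting, the masked embedding that points in roughly the same direction as the unmasked one is assigned to the same prototype, so its loss is lower than that of the embedding pointing elsewhere.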

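Through pre-training, the model learns an inner representation of images that can then be used to extract features for downstream tasks. For instance, if you have a dataset of labeled images, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder.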
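A linear classifier on top of frozen features can be sketched as follows. To keep the example runnable offline, random tensors stand in for features that would come from the pre-trained encoder. The feature dimension of 1024, the 10 classes, the batch size, and the learning rate are all assumptions for illustration.

```python
import torch
from torch import nn

# Assumptions for illustration: 1024-dimensional features and 10 classes.
hidden_size, num_labels, batch_size = 1024, 10, 8

torch.manual_seed(0)
# Stand-in for features extracted by the frozen pre-trained encoder
# (one feature vector per image); real features would replace this tensor.
features = torch.randn(batch_size, hidden_size)
labels = torch.randint(0, num_labels, (batch_size,))

# The linear layer is the only trainable component.
classifier = nn.Linear(hidden_size, num_labels)
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

losses = []
for step in range(5):
    optimizer.zero_grad()
    logits = classifier(features)   # shape: (batch_size, num_labels)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(logits.shape)
```

Because only the linear layer receives gradient updates, the encoder's representation stays fixed, which is one common way to evaluate pre-trained features when few labels are available.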

Intended uses & limitations

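You can use the raw model for downstream tasks such as image classification. See the model hub to look for the different versions of MSN pre-trained models that interest you. The model is particularly beneficial when the training set contains only a few labeled samples.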

How to use

Here is how to use this backbone encoder:

from transformers import AutoFeatureExtractor, ViTMSNModel
import torch
from PIL import Image
import requests

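# Download an example image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the preprocessor and the pre-trained encoder.
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/vit-msn-large")
model = ViTMSNModel.from_pretrained("facebook/vit-msn-large")

# Preprocess the image and run a forward pass without tracking gradients.
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-token hidden states from the last encoder layer.
last_hidden_states = outputs.last_hidden_state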

For fine-tuning on image classification, use the ViTMSNForImageClassification class:

from transformers import AutoFeatureExtractor, ViTMSNForImageClassification
import torch
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the preprocessor and the encoder wrapped with an image classification head.
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/vit-msn-large")
model = ViTMSNForImageClassification.from_pretrained("facebook/vit-msn-large")

...

Citation

@article{assran2022masked,
  title={Masked Siamese Networks for Label-Efficient Learning}, 
  author={Assran, Mahmoud and Caron, Mathilde and Misra, Ishan and Bojanowski, Piotr and Bordes, Florian and Vincent, Pascal and Joulin, Armand and Rabbat, Michael and Ballas, Nicolas},
  journal={arXiv preprint arXiv:2204.07141},
  year={2022}
}

Author:

Meta AI

Dataset size:

1.13 GB