Model:
facebook/levit-256
The LeViT-256 model was pre-trained on the ImageNet-1k dataset at a resolution of 224x224. It was introduced by Graham et al. in the paper LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference and first released in this repository.
Disclaimer: The team releasing LeViT did not write a model card for this model, so this model card has been written by the Hugging Face team.
Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:
```python
from transformers import LevitFeatureExtractor, LevitForImageClassificationWithTeacher
from PIL import Image
import requests

# Load a sample image from the COCO 2017 validation set
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Load the pre-trained feature extractor and model
feature_extractor = LevitFeatureExtractor.from_pretrained('facebook/levit-256')
model = LevitForImageClassificationWithTeacher.from_pretrained('facebook/levit-256')

# Preprocess the image and run inference
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits

# model predicts one of the 1000 ImageNet classes
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
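Beyond the single `argmax` class, the same logits can be ranked to inspect the model's top-5 predictions. A minimal sketch of that ranking step, using dummy logits in place of `outputs.logits` so it runs without downloading the model (the class indices set here are illustrative, not real model output):

```python
import torch

# Dummy logits standing in for outputs.logits: batch of 1, 1000 ImageNet classes
logits = torch.zeros(1, 1000)
logits[0, 281] = 5.0  # pretend this class scores highest
logits[0, 285] = 3.0  # pretend this class scores second

# Rank the five highest-scoring class indices
top5 = torch.topk(logits, k=5, dim=-1)
print("Top-5 indices:", top5.indices[0].tolist())
print("Top-1 index:", logits.argmax(-1).item())
```

With the real model, each index would be mapped to a label via `model.config.id2label[idx]`, exactly as in the top-1 example above.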