ConvNeXT (base-sized model)

ConvNeXT model pre-trained on ImageNet-22k and fine-tuned on ImageNet-1k at a resolution of 384x384. It was introduced in the paper A ConvNet for the 2020s by Liu et al. and first released in this repository.

Disclaimer: The team releasing ConvNeXT did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

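ConvNeXT is a pure convolutional model (ConvNet) that is inspired by the design of Vision Transformers and claims to outperform them. The authors started from a ResNet and "modernized" its design by borrowing design choices from the Swin Transformer.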

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

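# Note: recent transformers releases deprecate ConvNextFeatureExtractor
# in favor of ConvNextImageProcessor, which is used the same way here.
from transformers import ConvNextFeatureExtractor, ConvNextForImageClassification
import torch
from datasets import load_dataset

# Load a sample image (a COCO 2017 picture of two cats)
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Load the preprocessor and the classification model from the Hub
feature_extractor = ConvNextFeatureExtractor.from_pretrained("facebook/convnext-base-384-22k-1k")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-base-384-22k-1k")

# Resize and normalize the image, and return PyTorch tensors
inputs = feature_extractor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])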
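
The example above reports only the single highest-scoring class. As a minimal sketch, the snippet below shows one way to turn a vector of logits into the top-k labels with softmax probabilities. The helper name top_k_predictions and the toy logits and labels are assumptions made for illustration; they are not part of the transformers API. With the real model, you could pass logits[0].tolist() and model.config.id2label instead.

```python
import math


def top_k_predictions(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs for one image.

    Hypothetical helper for illustration: `logits` is a flat list of floats,
    `id2label` maps class indices to label strings.
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy stand-ins for logits[0].tolist() and model.config.id2label.
toy_logits = [2.0, 0.5, 1.0, -1.0]
toy_labels = {0: "tabby", 1: "tiger cat", 2: "Egyptian cat", 3: "remote control"}

for label, prob in top_k_predictions(toy_logits, toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

Inspecting several top classes can be useful here because many ImageNet categories are visually close to one another, so the runner-up labels often show how confident the model actually is.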

For more code examples, refer to the documentation.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2201-03545,
  author    = {Zhuang Liu and
               Hanzi Mao and
               Chao{-}Yuan Wu and
               Christoph Feichtenhofer and
               Trevor Darrell and
               Saining Xie},
  title     = {A ConvNet for the 2020s},
  journal   = {CoRR},
  volume    = {abs/2201.03545},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.03545},
  eprinttype = {arXiv},
  eprint    = {2201.03545},
  timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-03545.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Author:

Meta AI

Dataset size:

676.46 MB