
DPT (large-sized model) fine-tuned on ADE20k

DPT is a Dense Prediction Transformer model trained on ADE20k for semantic segmentation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. and first released in this repository.

Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

DPT uses the Vision Transformer (ViT) as a backbone and adds a neck and a head on top for semantic segmentation.
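
As a quick way to see that backbone/neck/head split in practice, the sketch below counts the parameters in each stage. The attribute names `dpt`, `neck`, and `head` reflect the transformers implementation of DPTForSemanticSegmentation but are an assumption worth checking against your installed version:

from transformers import DPTForSemanticSegmentation

model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")

# Count parameters per stage of the backbone -> neck -> head pipeline.
# Attribute names are assumed from the transformers source; verify locally.
for name in ("dpt", "neck", "head"):
    module = getattr(model, name)
    n_params = sum(p.numel() for p in module.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")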

Intended uses & limitations

You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model:

from transformers import DPTFeatureExtractor, DPTForSemanticSegmentation
from PIL import Image
import requests

# Load an example image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")

# Preprocess the image into pixel values with a batch dimension.
inputs = feature_extractor(images=image, return_tensors="pt")

# Forward pass; outputs.logits holds one score map per ADE20k class.
outputs = model(**inputs)
logits = outputs.logits
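
The logits are class score maps that generally do not match the input resolution. As a minimal follow-up sketch (assuming PyTorch is installed and that the config carries the standard `id2label` mapping for the 150 ADE20k classes), you can upsample them to the image size and take a per-pixel argmax:

import torch

# Upsample the logits to the original image size; PIL's image.size is
# (width, height), so it is reversed for interpolate's (height, width).
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bicubic", align_corners=False
)

# Per-pixel predicted class indices.
segmentation = upsampled.argmax(dim=1)[0]

# Map an example pixel's index to a human-readable ADE20k label
# (id2label is an assumption based on the standard config layout).
print(model.config.id2label[int(segmentation[0, 0].item())])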

For more code examples, we refer to the documentation.

BibTeX entry and citation information

@article{DBLP:journals/corr/abs-2103-13413,
  author    = {Ren{\'{e}} Ranftl and
               Alexey Bochkovskiy and
               Vladlen Koltun},
  title     = {Vision Transformers for Dense Prediction},
  journal   = {CoRR},
  volume    = {abs/2103.13413},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.13413},
  eprinttype = {arXiv},
  eprint    = {2103.13413},
  timestamp = {Wed, 07 Apr 2021 15:31:46 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-13413.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}