Model:
Intel/dpt-large-ade
DPT is a Dense Prediction Transformer model trained on ADE20k for semantic segmentation. It was introduced in the paper Vision Transformers for Dense Prediction by Ranftl et al. and first released in this repository.
Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face team.
DPT uses a Vision Transformer (ViT) as its backbone and adds a neck and head on top for semantic segmentation.
You can use the raw model for semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.
Here is how to use this model:
```python
from transformers import DPTFeatureExtractor, DPTForSemanticSegmentation
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
```
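The logits above come out at a reduced spatial resolution, one channel per class. A minimal sketch of turning them into a per-pixel segmentation map, using a dummy tensor in place of `outputs.logits` (the shapes and the 150-class ADE20k label count are assumptions for illustration):

```python
import torch

# Dummy logits standing in for `outputs.logits` from the model above:
# shape (batch, num_labels, height, width); ADE20k has 150 classes.
logits = torch.randn(1, 150, 120, 160)

# Upsample the logits to the original image resolution (here assumed 480x640),
# then take the per-pixel argmax to get a map of class indices.
upsampled = torch.nn.functional.interpolate(
    logits, size=(480, 640), mode="bilinear", align_corners=False
)
seg_map = upsampled.argmax(dim=1)[0]  # (480, 640), values in [0, 150)
```

The resulting `seg_map` can be colorized with an ADE20k palette for visualization.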
For more code examples, please refer to the documentation.
```bibtex
@article{DBLP:journals/corr/abs-2103-13413,
  author     = {Ren{\'{e}} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
  title      = {Vision Transformers for Dense Prediction},
  journal    = {CoRR},
  volume     = {abs/2103.13413},
  year       = {2021},
  url        = {https://arxiv.org/abs/2103.13413},
  eprinttype = {arXiv},
  eprint     = {2103.13413},
  timestamp  = {Wed, 07 Apr 2021 15:31:46 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2103-13413.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```