Mask2Former

Mask2Former model trained on Cityscapes semantic segmentation (tiny-sized version, Swin backbone). It was introduced in the paper Masked-attention Mask Transformer for Universal Image Segmentation and first released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

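Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: it predicts a set of masks together with a class label for each mask. All three tasks are therefore treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA model, MaskFormer, by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention, which boosts performance without introducing additional computation, and (iii) improving training efficiency by computing the loss on subsampled points instead of whole masks.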
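To make point (ii) concrete, here is a minimal NumPy sketch of masked cross-attention: each query attends only to the pixels that fall inside the mask predicted for it by the previous decoder layer. The function name `masked_cross_attention`, the single-head layout, the `1/sqrt(dim)` scaling, and the 0.5 threshold are assumptions made for illustration; this is not the code used by the released model.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, prev_mask_probs, threshold=0.5):
    """Toy single-head masked cross-attention (illustrative only).

    queries:         (num_queries, dim)
    keys, values:    (num_pixels, dim)
    prev_mask_probs: (num_queries, num_pixels) mask probabilities
                     predicted by the previous decoder layer
    """
    # Scaled dot-product attention logits, one row per query.
    logits = queries @ keys.T / np.sqrt(queries.shape[-1])

    # A query may only attend to pixels inside its previously predicted mask.
    allowed = prev_mask_probs >= threshold
    # Guard: a query whose mask is empty falls back to attending everywhere,
    # otherwise its softmax row would be undefined.
    allowed[~allowed.any(axis=-1)] = True

    # Disallowed positions get -inf, so they receive zero attention weight.
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights
```

Because the mask only removes entries from the softmax, the attention has the same cost as ordinary cross-attention, which is in line with the claim that it adds no extra computation.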

Intended uses & limitations

You can use this particular checkpoint for semantic segmentation. See the model hub to look for fine-tuned versions on other tasks that interest you.

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on Cityscapes semantic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-tiny-cityscapes-semantic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-tiny-cityscapes-semantic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra class is "no object") and masks_queries_logits of shape
# `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

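# pass the outputs to the processor for post-processing; the result is a
# `(height, width)` map holding one label id per pixel
# (PIL's image.size is (width, height), so it is reversed for target_sizes)
predicted_semantic_map = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# for visualization, see the demo notebooks ("Resources" section in the Mask2Former docs)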
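
To show how per-query class scores and masks can be turned into a single label map, the following NumPy sketch combines them for one image. It is a simplified illustration under stated assumptions: the helper name `combine_queries` is hypothetical, the last class column is assumed to be the "no object" class, and the resizing to the target size is left out.

```python
import numpy as np


def combine_queries(class_logits, mask_logits):
    """Combine per-query predictions into a semantic map (illustrative only).

    class_logits: (num_queries, num_labels + 1), last column = "no object"
    mask_logits:  (num_queries, height, width)
    returns:      (height, width) array of label ids
    """
    # Softmax over classes, then drop the "no object" column.
    shifted = class_logits - class_logits.max(axis=-1, keepdims=True)
    class_probs = np.exp(shifted)
    class_probs = class_probs / class_probs.sum(axis=-1, keepdims=True)
    class_probs = class_probs[:, :-1]

    # Each mask logit becomes an independent per-pixel probability.
    mask_probs = 1.0 / (1.0 + np.exp(-mask_logits))

    # Per-class score map: sum over queries of class prob * mask prob.
    scores = np.einsum("qc,qhw->chw", class_probs, mask_probs)

    # The label with the highest score wins at every pixel.
    return scores.argmax(axis=0)
```

With the tensors from the example above, `combine_queries(class_queries_logits[0].numpy(), masks_queries_logits[0].numpy())` would give a label map at the resolution of the mask logits rather than at the original image size.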

For more code examples, refer to the documentation.

Author:

Meta AI

Dataset size:

181.33 MB