Mask2Former

Mask2Former model trained on COCO panoptic segmentation (tiny-sized version, Swin backbone). It was first introduced in the accompanying paper and released in this repository.

Disclaimer: The team releasing Mask2Former did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

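Mask2Former addresses instance, semantic, and panoptic segmentation with the same paradigm: it predicts a set of masks and their corresponding labels. All three tasks are therefore treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA model, MaskFormer, in both performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention, which boosts performance without introducing additional computation, and (iii) computing the loss on subsampled points instead of whole masks, which improves training efficiency.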
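To make point (ii) more concrete, the following is a minimal NumPy sketch of masked cross-attention for a single image. It is not the model's actual implementation: the function name `masked_cross_attention`, the 0.5 threshold, and the fallback for empty masks are assumptions made for illustration. The idea it shows is that each query attends only to the image locations that the previous layer's mask prediction marked as foreground.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, prev_mask_logits, threshold=0.5):
    """Illustrative masked attention (hypothetical helper, not the library API).

    queries:          (num_queries, dim)
    keys, values:     (num_pixels, dim)
    prev_mask_logits: (num_queries, num_pixels), mask logits from the previous layer
    """
    dim = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(dim)  # (num_queries, num_pixels)

    # Foreground region of each query: sigmoid(logit) >= threshold.
    foreground = 1.0 / (1.0 + np.exp(-prev_mask_logits)) >= threshold

    # Assumption: if a query's predicted mask is empty, let it attend
    # everywhere so that the softmax below stays well defined.
    empty = ~foreground.any(axis=1, keepdims=True)
    allowed = foreground | empty

    # Background locations get -inf and therefore zero attention weight.
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```

Because the mask only restricts where the softmax places its weight, the attention output has the same shape as in ordinary cross-attention.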

Intended uses & limitations

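You can use this particular checkpoint for panoptic segmentation. See the model hub to look for other fine-tuned versions on a task that interests you.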

How to use

Here is how to use this model:

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation


# load Mask2Former fine-tuned on COCO panoptic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-tiny-coco-panoptic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-tiny-coco-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# the model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# and masks_queries_logits of shape `(batch_size, num_queries, height, width)`
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_panoptic_map = result["segmentation"]
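The post-processed result can also be summarized per segment. The sketch below is a hypothetical helper, `summarize_segments`, and not part of the library. It assumes that `result` also carries a `"segments_info"` list whose entries have `"id"`, `"label_id"`, and `"score"` keys, and that class names are available through a mapping such as `model.config.id2label`.

```python
import numpy as np


def summarize_segments(segmentation, segments_info, id2label):
    """List each predicted segment with its class name and share of the image.

    segmentation:  2-D array of segment ids (the panoptic map)
    segments_info: list of dicts with "id", "label_id" and optionally "score"
    id2label:      mapping from integer label id to class name
    """
    seg = np.asarray(segmentation)
    rows = []
    for info in segments_info:
        area = int((seg == info["id"]).sum())  # number of pixels in this segment
        rows.append(
            {
                "segment_id": info["id"],
                # Fall back to the raw id when no class name is known.
                "label": id2label.get(info["label_id"], str(info["label_id"])),
                "area_fraction": area / seg.size,
                "score": info.get("score"),
            }
        )
    # Largest segments first.
    return sorted(rows, key=lambda row: row["area_fraction"], reverse=True)
```

Under those assumptions, a call could look like `summarize_segments(predicted_panoptic_map.numpy(), result["segments_info"], model.config.id2label)`.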

For more code examples, refer to the documentation.

Author:

Meta AI

Dataset size:

181.45 MB