Segment Everything Everywhere All at Once - Summary

arXiv URL: https://arxiv.org/abs/2304.06718v1

Authors: Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Gao, Yong Jae Lee

Summary:

The paper presents SEEM, a promptable, interactive model for Segmenting Everything Everywhere all at once in an image. It introduces a versatile prompting engine that accepts different types of prompts, including points, boxes, scribbles, masks, text, and referred regions of another image. With a lightweight prompt decoder, the model efficiently handles multiple rounds of interaction and generalizes well to unseen user intents.
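
A minimal sketch of the unified prompting idea (hypothetical code, not the authors' implementation): heterogeneous prompts such as clicks, boxes, text features, and pooled features from a referred region are projected into one joint embedding space and consumed by a single lightweight mask decoder, while image features are computed once and reused across interaction rounds. All names and sizes (PromptEncoder, LightweightMaskDecoder, projection dimensions) are illustrative assumptions.

import torch
import torch.nn as nn

class PromptEncoder(nn.Module):
    """Maps heterogeneous prompts (visual or textual) into a shared embedding space."""
    def __init__(self, dim=256):
        super().__init__()
        self.point_proj = nn.Linear(2, dim)      # (x, y) click coordinates
        self.box_proj = nn.Linear(4, dim)        # (x1, y1, x2, y2) boxes
        self.text_proj = nn.Linear(512, dim)     # e.g. features from a text encoder
        self.visual_proj = nn.Linear(dim, dim)   # features pooled from a referred region

    def forward(self, prompt_type, prompt):
        proj = {"point": self.point_proj, "box": self.box_proj,
                "text": self.text_proj, "visual": self.visual_proj}[prompt_type]
        return proj(prompt)                      # (num_prompts, dim) joint-space embeddings

class LightweightMaskDecoder(nn.Module):
    """Cross-attends prompt embeddings to image features and predicts per-prompt masks."""
    def __init__(self, dim=256):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mask_head = nn.Linear(dim, dim)

    def forward(self, prompt_emb, image_feats):
        # image_feats: (1, H*W, dim) flattened backbone features, computed once per image
        queries, _ = self.cross_attn(prompt_emb.unsqueeze(0), image_feats, image_feats)
        mask_emb = self.mask_head(queries)                           # (1, P, dim)
        return torch.einsum("bpd,bnd->bpn", mask_emb, image_feats)   # per-prompt mask logits

# Multi-round interaction: only the cheap decoder reruns; image features are reused.
encoder, decoder = PromptEncoder(), LightweightMaskDecoder()
image_feats = torch.randn(1, 64 * 64, 256)                      # stand-in for backbone output
clicks = encoder("point", torch.tensor([[0.4, 0.6]]))           # round 1: a single click
masks = decoder(clicks, image_feats)
box = encoder("box", torch.tensor([[0.2, 0.3, 0.8, 0.9]]))      # round 2: refine with a box
masks = decoder(torch.cat([clicks, box]), image_feats)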

Key Insights & Learnings:

  • SEEM is a promptable, interactive model for Segmenting Everything Everywhere all at once in an image.
  • The model introduces a versatile prompting engine that accepts different types of prompts, including points, boxes, scribbles, masks, text, and referred regions of another image.
  • SEEM efficiently handles multiple rounds of interaction via a lightweight prompt decoder.
  • The model generalizes well to unseen user intents.
  • A comprehensive empirical study validates the effectiveness of SEEM on a variety of segmentation tasks.


Terms Mentioned: Segmentation, Prompting, Transformer, Visual understanding, Semantic segmentation

Technologies / Libraries Mentioned: GPT, T5, DETR, CLIP, X-Decoder