Segment Everything Everywhere All at Once - Summary
Arxiv URL: https://arxiv.org/abs/2304.06718v1
Authors: Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Gao, Yong Jae Lee
Summary:
The paper presents SEEM, a promptable, interactive model for Segmenting Everything Everywhere all at once in an image. It introduces a versatile prompting engine that handles different types of prompts, including points, boxes, scribbles, masks, texts, and referred regions of another image. The model efficiently handles multiple rounds of interaction with a lightweight prompt decoder and generalizes well to unseen user intents.
Key Insights & Learnings:
- SEEM is a promptable, interactive model for Segmenting Everything Everywhere all at once in an image.
- The model introduces a versatile prompting engine for different types of prompts, including points, boxes, scribbles, masks, texts, and referred regions of another image.
- SEEM can efficiently handle multiple rounds of interactions with a lightweight prompt decoder.
- The model generalizes strongly to user intents not seen during training.
- A comprehensive empirical study is performed to validate the effectiveness of SEEM on various segmentation tasks.
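The core idea behind the bullets above — projecting heterogeneous prompts (points, boxes, text, etc.) into one shared embedding space and feeding them to a lightweight decoder that can be re-queried across interaction rounds — can be sketched in a toy form. This is a minimal illustration of the concept, not SEEM's actual architecture or API; all names, dimensions, and the thresholding decoder are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16      # shared prompt/visual embedding dimension (assumed for the toy)
H = W = 8   # toy image feature-map size

# Stand-in for frozen image-encoder output: one D-dim feature per pixel.
image_feats = rng.standard_normal((H * W, D))

# Per-prompt-type projections into the shared space, mirroring the idea of
# a joint visual-semantic space for points, boxes, and text prompts.
proj_point = rng.standard_normal((2, D))   # (x, y)            -> D
proj_box   = rng.standard_normal((4, D))   # (x1, y1, x2, y2)  -> D
proj_text  = rng.standard_normal((D, D))   # text embedding    -> D

def encode_prompt(kind, value):
    """Project a raw prompt of any supported type into the shared space."""
    value = np.asarray(value, dtype=float)
    if kind == "point":
        return value @ proj_point
    if kind == "box":
        return value @ proj_box
    if kind == "text":
        return value @ proj_text
    raise ValueError(f"unknown prompt kind: {kind}")

def decode_mask(prompts):
    """Toy 'lightweight decoder': pool the prompt embeddings into one query,
    score every pixel feature by dot product, and threshold into a mask."""
    query = np.mean([encode_prompt(k, v) for k, v in prompts], axis=0)
    scores = image_feats @ query            # (H*W,) per-pixel scores
    return (scores > 0).reshape(H, W)       # binary segmentation mask

# Multi-round interaction: later rounds simply add prompts and re-decode,
# so only the cheap decoder runs again, not the image encoder.
mask1 = decode_mask([("point", [3.0, 4.0])])
mask2 = decode_mask([("point", [3.0, 4.0]),
                     ("box", [1.0, 1.0, 5.0, 6.0])])
```

The design point the sketch tries to capture is that the expensive image features are computed once, while each interaction round only re-runs the small decoder with an updated prompt set.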
Terms Mentioned: Segmentation, Prompting, Transformer, Visual understanding, Semantic segmentation
Technologies / Libraries Mentioned: GPT, T5, DETR, CLIP, X-Decoder