Segment Everything Everywhere All at Once - Summary
Arxiv URL: https://arxiv.org/abs/2304.06718v1
Authors: Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Gao, Yong Jae Lee
Summary: 
The paper presents SEEM, a promptable, interactive model for segmenting everything everywhere all at once in an image. It introduces a versatile prompting engine that accepts many prompt types, including points, boxes, scribbles, masks, text, and referred regions of another image. The model handles multiple rounds of interaction efficiently with a lightweight prompt decoder and generalizes well to unseen user intents.
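The core idea behind the prompting engine is that every prompt type is mapped into one shared embedding space, so a single decoder can consume any mix of prompts. The sketch below is illustrative only: the class names, embedding size, and encoders are invented stand-ins, not SEEM's actual API.

```python
from dataclasses import dataclass
from typing import List

EMB_DIM = 4  # toy embedding size; the real model uses learned, much larger embeddings

@dataclass
class Prompt:
    kind: str       # "point" | "box" | "text" (scribbles/masks omitted for brevity)
    payload: object  # coordinates or a string

def encode_prompt(p: Prompt) -> List[float]:
    """Map any prompt type into the same shared embedding space (toy version)."""
    if p.kind == "point":
        x, y = p.payload
        return [x, y, 0.0, 0.0]
    if p.kind == "box":
        x0, y0, x1, y1 = p.payload
        return [x0, y0, x1, y1]
    if p.kind == "text":
        # stand-in for a real text encoder such as CLIP's
        h = float(sum(ord(c) for c in p.payload) % 100) / 100.0
        return [h, h, h, h]
    raise ValueError(f"unsupported prompt kind: {p.kind}")

def decode(prompts: List[Prompt]) -> List[float]:
    """Toy 'decoder': averages prompt embeddings into one query vector.
    In SEEM this role is played by a learned prompt decoder that attends
    over image features; here we only show the shared-interface shape."""
    embs = [encode_prompt(p) for p in prompts]
    n = len(embs)
    return [sum(e[i] for e in embs) / n for i in range(EMB_DIM)]

# Any combination of prompt types yields one fixed-size query.
query = decode([Prompt("point", (0.2, 0.5)), Prompt("text", "zebra")])
print(len(query))
```

Because all prompt types land in the same space, composing a spatial click with a text label requires no special-case code, which is the property that lets one model cover point, box, scribble, mask, and text prompting at once.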
Key Insights & Learnings:
- SEEM is a promptable, interactive model for segmenting everything everywhere all at once in an image.
- The model introduces a versatile prompting engine that supports many prompt types, including points, boxes, scribbles, masks, text, and referred regions of another image.
- SEEM efficiently handles multiple rounds of interaction with a lightweight prompt decoder.
- The model generalizes well to unseen user intents.
- A comprehensive empirical study validates the effectiveness of SEEM on various segmentation tasks.
 
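The multi-round interaction above can be pictured as a feedback loop: the mask produced in one round is fed back as a memory prompt for the next, so the decoder refines its previous result instead of starting over. The toy sketch below shows only that loop shape; the `segment` function and its pixel-set "mask" are hypothetical simplifications, not the paper's implementation.

```python
def segment(image, prompts, memory=None):
    """Stand-in for a lightweight prompt decoder: returns a 'mask'
    (here, just a set of pixel coordinates) grown from the current
    prompts plus any mask remembered from earlier rounds."""
    mask = set(memory or [])
    for p in prompts:
        mask.add(p)  # pretend each click selects exactly one pixel
    return mask

# Round 1: the user clicks two points.
mask = segment("img", prompts=[(1, 1), (2, 2)])
# Round 2: only the new click is sent; the previous mask rides along as memory.
mask = segment("img", prompts=[(3, 3)], memory=mask)
print(sorted(mask))
```

Keeping the interaction state in a compact memory prompt, rather than re-encoding the image each round, is what makes repeated refinement cheap.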
Terms Mentioned: Segmentation, Prompting, Transformer, Visual understanding, Semantic segmentation
Technologies / Libraries Mentioned: GPT, T5, DETR, CLIP, X-Decoder