Arxiv URL: https://arxiv.org/abs/2304.03284v1
Authors: Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang
SegGPT is a generalist model for segmenting everything in context. It unifies various segmentation tasks into a generalist in-context learning framework that can perform arbitrary segmentation tasks in images or videos via in-context inference. It is evaluated on a broad range of tasks, including few-shot semantic segmentation, video object segmentation, semantic segmentation, and panoptic segmentation, and shows strong capabilities in segmenting in-domain and out-of-domain targets.
Key Insights & Learnings:
- SegGPT is a single model that can perform diverse segmentation tasks automatically.
- Training is formulated as an in-context coloring problem, with a random color mapping drawn for each data sample, so the model must rely on the context rather than on specific colors.
- Via in-context inference, SegGPT can perform arbitrary segmentation tasks in images or videos, covering targets such as object instances, stuff, parts, contours, and text.
- A feature ensemble strategy is proposed to effectively ensemble multiple examples in context.
- SegGPT can serve as a specialist model without updating the model parameters, by tuning a specific prompt for a specialized use case.
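The random color mapping mentioned above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `recolor_sample`, the use of integer label maps, and the uint8 RGB color space are all assumptions made for illustration. The idea shown is only that the prompt mask and the target mask of one sample share a freshly drawn color table, so a color has no fixed meaning across samples.

```python
import numpy as np


def recolor_sample(label_maps, rng):
    """Recolor all label maps of one sample with a shared random color table.

    label_maps: list of 2-D integer arrays (hypothetical prompt and target masks).
    rng: a numpy.random.Generator used to draw the colors.
    Returns a list of (H, W, 3) uint8 arrays, one per input label map.
    """
    # Collect every label that occurs anywhere in this sample.
    labels = np.unique(np.concatenate([m.ravel() for m in label_maps]))
    # Draw one random RGB color per label (assumption: uint8 RGB colors).
    # Colors are drawn independently, so two labels may collide by chance.
    colors = rng.integers(0, 256, size=(labels.size, 3), dtype=np.uint8)
    # Map each pixel's label to its row in the color table.
    return [colors[np.searchsorted(labels, m)] for m in label_maps]


# Usage: the same label receives the same color in prompt and target,
# but a new call (a new sample) draws a new color table.
rng = np.random.default_rng()
prompt_mask = np.array([[0, 1], [1, 2]])
target_mask = np.array([[2, 2], [0, 1]])
prompt_rgb, target_rgb = recolor_sample([prompt_mask, target_mask], rng)
```

Because the color table is redrawn for every sample, the only way to predict the target colors is to read them off the prompt, which is the behavior the coloring formulation is meant to encourage.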
Terms Mentioned: segmentation, computer vision, semantic segmentation, instance segmentation, video object segmentation, panoptic segmentation, in-context learning, ViT, smooth-ℓ1 loss
Technologies / Libraries Mentioned: PyTorch, ADE20K