Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network
In ICCV, 2023


Cong Han¹
Yujie Zhong¹
Dengjie Li¹
Kai Han²
Lin Ma¹


¹ Meituan Inc.
² The University of Hong Kong




Teaser figure.



Abstract

Open-vocabulary semantic segmentation has recently attracted increasing attention, and the best-performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pre-trained visual-language model. However, existing two-stream methods require passing a large number of image crops (up to a hundred) through the visual-language model, which is highly inefficient. To address this problem, we propose a network that needs only a single pass through the visual-language model for each input image. Specifically, we first propose a novel network adaptation approach, termed patch severance, to restrict the harmful interference between the patch embeddings in the pre-trained visual encoder. We then propose classification anchor learning to encourage the network to spatially focus on more discriminative features for classification. Extensive experiments demonstrate that the proposed method achieves outstanding performance, surpassing state-of-the-art methods while being 4 to 7 times faster at inference.
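To make the efficiency argument concrete, the following is a minimal sketch of classifying several proposal masks from the patch embeddings of one encoder pass, instead of encoding one crop per mask. It is not the authors' implementation: patch severance and classification anchor learning are not shown, and average pooling over masked patches is an assumed simplification. All names (`mask_pool`, `classify_segments`) and the toy 2-D embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mask_pool(patch_embeddings, mask):
    """Average the embeddings of the patches selected by a binary mask.

    Assumption for illustration only: a segment is represented by the mean
    of its patch embeddings.
    """
    selected = [emb for emb, keep in zip(patch_embeddings, mask) if keep]
    if not selected:
        raise ValueError("mask selects no patches")
    dim = len(selected[0])
    return [sum(emb[d] for emb in selected) / len(selected) for d in range(dim)]


def classify_segments(patch_embeddings, masks, text_embeddings):
    """Assign each mask the class whose text embedding is most similar.

    The patch embeddings are computed once per image; every mask reuses them,
    so the cost of the visual encoder does not grow with the number of masks.
    """
    texts = [l2_normalize(t) for t in text_embeddings]
    labels = []
    for mask in masks:
        segment = l2_normalize(mask_pool(patch_embeddings, mask))
        # Cosine similarity reduces to a dot product on unit vectors.
        scores = [sum(a * b for a, b in zip(segment, t)) for t in texts]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy example: 4 patches from a single (hypothetical) encoder pass.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]   # two proposal masks over the patches
text = [[1.0, 0.0], [0.0, 1.0]]        # text embeddings for class 0 and class 1
labels = classify_segments(patches, masks, text)
print(labels)
```

In this sketch, adding more proposal masks only adds pooling and dot products; the visual encoder is still run once per image.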




Method Overview

Method overview figure

Results

Results figures


Paper

Paper thumbnail

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

Cong Han, Yujie Zhong, Dengjie Li, Kai Han, and Lin Ma

In ICCV, 2023.

@inproceedings{Han2023ZeroShotSS,
  title = {Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network},
  author = {Cong Han and Yujie Zhong and Dengjie Li and Kai Han and Lin Ma},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year = {2023}
}
    
    



Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2022ZD0118700), the Hong Kong Research Grants Council - Early Career Scheme (Grant No. 27208022), and the HKU Seed Fund for Basic Research.


