SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Official PyTorch implementation of SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Step 1: Create a new environment
conda create -n segman python=3.10
conda activate segman
pip install torch==2.1.2 torchvision==0.16.2
Step 2: Install MMSegmentation v0.30.0 by following the installation guidelines and prepare the segmentation datasets by following the data preparation instructions. The following installation commands work:
pip install -U openmim
mim install mmcv-full
cd segmentation
pip install -v -e .
To support torch>=2.1.0, you also need to replace line 75 of /miniconda3/envs/segman/lib/python3.10/site-packages/mmcv/parallel/_functions.py with the following:
if version.parse(torch.__version__) >= version.parse('2.1.0'):
    streams = [_get_stream(torch.device("cuda", device)) for device in target_gpus]
else:
    streams = [_get_stream(device) for device in target_gpus]
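The patch only matters when the installed torch is 2.1.0 or newer. The same version gate can be sanity-checked without touching mmcv; the helper `needs_patch` below is illustrative (mmcv's own code compares with `packaging`'s `version.parse`, while this dependency-free sketch uses a plain tuple comparison):

```python
def needs_patch(torch_version: str) -> bool:
    # Mimics the version gate in the patched mmcv line: the new
    # _get_stream(torch.device(...)) call is needed for torch >= 2.1.0.
    major, minor = (int(x) for x in torch_version.split(".")[:2])
    return (major, minor) >= (2, 1)

print(needs_patch("2.1.2"))  # True: the version pinned in Step 1 needs the patch
print(needs_patch("2.0.1"))  # False: older torch keeps the original line
```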
Step 3: Install dependencies using the following commands.
To install NATTEN, adjust the wheel tag in the command below to match your PyTorch and CUDA versions.
pip install natten==0.17.3+torch210cu121 -f https://shi-labs.com/natten/wheels/
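The wheel tag encodes the PyTorch and CUDA versions (e.g. `torch210cu121` for torch 2.1.x with CUDA 12.1). A minimal sketch of how such a pip spec can be assembled; `build_natten_spec` is an illustrative helper, not part of NATTEN, and available wheel tags for other versions should be checked against the wheel index:

```python
def build_natten_spec(natten_ver: str, torch_ver: str, cuda_ver: str) -> str:
    # "2.1.2" -> "210": major + minor, patch level dropped and replaced by "0"
    torch_tag = "".join(torch_ver.split(".")[:2]) + "0"
    # "12.1" -> "121": dots removed
    cuda_tag = cuda_ver.replace(".", "")
    return f"natten=={natten_ver}+torch{torch_tag}cu{cuda_tag}"

print(build_natten_spec("0.17.3", "2.1.2", "12.1"))
# natten==0.17.3+torch210cu121
```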
The Selective Scan 2D kernel can be installed with:
cd kernels/selective_scan && pip install .
Install other requirements:
pip install -r requirements.txt
Download the ImageNet-1k pretrained weights here and put them in a folder named pretrained/. Navigate to the segmentation directory:
cd segmentation
Scripts to reproduce our paper results are provided in ./scripts.
Example training script for SegMAN-B on ADE20K:
# Single-gpu
python tools/train.py local_configs/segman/base/segman_b_ade.py --work-dir outputs/EXP_NAME
# Multi-gpu
bash tools/dist_train.sh local_configs/segman/base/segman_b_ade.py <GPU_NUM> --work-dir outputs/EXP_NAME
Download trained weights for the segmentation models from Google Drive. Navigate to the segmentation directory:
cd segmentation
Example for evaluating SegMAN-B on ADE20K:
# Single-gpu
python tools/test.py local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file
# Multi-gpu
bash tools/dist_test.sh local_configs/segman/base/segman_b_ade.py /path/to/checkpoint_file <GPU_NUM>
We provide scripts for pre-training the encoder from scratch.
Step 1: Download ImageNet-1k and use this script to extract it.
Step 2: Start training with
bash scripts/train_segman-s.sh
Our implementation is based on MMSegmentation, NATTEN, VMamba, and SegFormer. We sincerely thank the authors.
@article{SegMAN,
title={SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation},
author={Yunxiang Fu and Meng Lou and Yizhou Yu},
journal={arXiv preprint arXiv:2412.11890},
year={2024}
}