TinyViM

TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba

Huawei Noah’s Ark Lab

🔥 News

2024/11/29: Code is released.
2024/11/27: TinyViM is available on arXiv.

📷 Introduction

We build TinyViM, a series of tiny hybrid vision Mamba models that integrate mobile-friendly convolution with an efficient Laplace mixer. TinyViM achieves strong performance on image classification and on several downstream tasks, including semantic segmentation, object detection and instance segmentation. In particular, TinyViM outperforms convolution-, Transformer- and Mamba-based models of similar scale, and its throughput is about 2-3 times higher than that of other Mamba-based models.
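The introduction names a Laplace mixer but does not describe how it separates frequencies. As a rough illustration of what frequency decoupling can mean, here is a generic Laplacian-pyramid-style split of a single-channel feature map in plain Python. This is an assumption-laden sketch, not the TinyViM implementation: the 2x2 average pooling, the nearest-neighbour upsampling and the helper names (`avg_pool2x2`, `upsample_nearest2x`, `laplace_split`) are all hypothetical choices. The low-frequency map has a quarter of the spatial size, which is one reason a hybrid design might send it to the costlier Mamba branch while cheap convolutions handle the high-frequency residual.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a single-channel H x W feature map

# Hypothetical helpers illustrating a Laplacian-pyramid-style split;
# this is NOT the TinyViM Laplace mixer, only a generic sketch.

def avg_pool2x2(x: Grid) -> Grid:
    """Low-pass filter: average each non-overlapping 2x2 patch (H, W even)."""
    h, w = len(x), len(x[0])
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

def upsample_nearest2x(x: Grid) -> Grid:
    """Nearest-neighbour upsampling by a factor of 2 in both directions."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def laplace_split(x: Grid) -> Tuple[Grid, Grid]:
    """Return (low, high): a half-resolution low-frequency map and a
    full-resolution high-frequency residual, so x = upsample(low) + high."""
    low = avg_pool2x2(x)
    up = upsample_nearest2x(low)
    high = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(x, up)]
    return low, high

if __name__ == "__main__":
    feat = [[1.0, 3.0, 5.0, 7.0],
            [1.0, 3.0, 5.0, 7.0],
            [2.0, 2.0, 8.0, 8.0],
            [2.0, 2.0, 8.0, 8.0]]
    low, high = laplace_split(feat)
    print(low)   # [[2.0, 6.0], [2.0, 8.0]]
    print(high)  # [[-1.0, 1.0, -1.0, 1.0], [-1.0, 1.0, -1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
```

Because the high-frequency part is defined as a residual, the split is lossless up to floating-point rounding: the two parts can be processed by separate branches and summed back.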

🏆 Performance

1️⃣ Classification

| Model | Type | Params (M) | GMACs | Throughput (im/s) | Top-1 (%) |
|---|---|---|---|---|---|
| TinyViM-S | CNN-Mamba | 5.6 | 0.9 | 2563 | 79.2 |
| TinyViM-B | CNN-Mamba | 11.0 | 1.5 | 1851 | 81.2 |
| TinyViM-L | CNN-Mamba | 31.7 | 4.7 | 843 | 83.3 |

2️⃣ Detection & Instance Segmentation

| Model | Head | AP (box) | AP (mask) |
|---|---|---|---|
| TinyViM-B | Mask R-CNN | 42.3 | 38.7 |
| TinyViM-L | Mask R-CNN | 44.5 | 40.7 |

3️⃣ Semantic Segmentation

| Model | Head | Throughput | mIoU (%) |
|---|---|---|---|
| TinyViM-B | FPN | 180 | 41.9 |
| TinyViM-L | FPN | 111 | 44.2 |

📚 Usage

Environment
```
conda create --name tinyvim python=3.9.11 -y
conda activate tinyvim
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install timm==0.5.4
```
Please refer to VMamba for installing selective_scan_cuda.
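Before training, it can help to confirm that the pinned packages and the selective-scan extension are actually visible in the active environment. The stdlib-only sketch below compares installed distribution versions against the pins above and checks whether a module named `selective_scan_cuda` can be found. That module name is taken from the sentence above and is an assumption, since the name produced by your VMamba build may differ; `check_env` is a hypothetical helper, not part of this repository.

```python
import importlib.util
from importlib import metadata
from typing import Dict, Iterable, List, Optional

# Versions pinned in the conda/pip commands above.
PINNED: Dict[str, str] = {
    "torch": "2.0.0",
    "torchvision": "0.15.0",
    "torchaudio": "2.0.0",
    "timm": "0.5.4",
}
# Assumed module name of the compiled selective-scan kernel; adjust it to
# whatever your VMamba build actually installs.
EXTENSIONS = ("selective_scan_cuda",)

def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

def check_env(pinned: Dict[str, str] = PINNED,
              extensions: Iterable[str] = EXTENSIONS) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    for dist, want in pinned.items():
        got = installed_version(dist)
        if got is None:
            problems.append(f"{dist}: not installed (expected {want})")
        elif got.split("+")[0] != want:  # ignore local tags such as "+cu117"
            problems.append(f"{dist}: found {got}, expected {want}")
    for mod in extensions:
        # find_spec only checks that the module can be located, not that
        # its CUDA libraries load correctly at import time.
        if importlib.util.find_spec(mod) is None:
            problems.append(f"{mod}: module not found")
    return problems

if __name__ == "__main__":
    issues = check_env()
    print("\n".join(issues) if issues else "environment looks OK")
```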

For detection and segmentation, please refer to mmdetection-2.28.2 and mmsegmentation-0.30.0, respectively, for environment setup and data preparation.
Train
```
bash train.sh
```
Test
```
bash eval.sh
```

Speed
```
python speed_gpu.py --model TinyViM_S --resolution 224 --batch 2048
```
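The Throughput (im/s) column reports images processed per second. The exact procedure inside speed_gpu.py is not described here, so the following is only a generic sketch of how such a number is commonly measured: run a few warm-up iterations, time a fixed number of forward passes, and divide the total number of images by the elapsed wall-clock time. `measure_throughput`, `forward` and `sync` are hypothetical names. With PyTorch on a GPU one would pass the model call as `forward` and `torch.cuda.synchronize` as `sync`, because CUDA kernels launch asynchronously and the timer would otherwise stop before the work finishes.

```python
import time
from typing import Callable, Optional

def measure_throughput(
    forward: Callable[[], None],
    batch_size: int,
    warmup: int = 10,
    iters: int = 30,
    sync: Optional[Callable[[], None]] = None,
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Return images per second; `forward` processes one batch per call.
    Hypothetical helper, not the logic of speed_gpu.py."""
    sync = sync or (lambda: None)
    for _ in range(warmup):  # warm-up: caches, autotuning, lazy initialisation
        forward()
    sync()                   # make sure warm-up work has finished
    start = timer()
    for _ in range(iters):
        forward()
    sync()                   # wait for queued (e.g. GPU) work before stopping the clock
    elapsed = timer() - start
    return batch_size * iters / elapsed

if __name__ == "__main__":
    def dummy_forward() -> None:
        sum(i * i for i in range(10_000))  # stand-in for model(batch)

    print(f"{measure_throughput(dummy_forward, batch_size=2048):.0f} im/s (dummy workload)")
```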

Detection & Instance Segmentation
```
cd detection
bash train.sh  # training
bash eval.sh   # evaluation
```

Semantic Segmentation
```
cd segmentation
bash train.sh  # training
bash eval.sh   # evaluation
```

🌟 Citation

If you find our work useful, please consider giving a 🌟 and citing it:

```
@misc{tinyvim,
      title={TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba},
      author={Xiaowen Ma and Zhenliang Ni and Xinghao Chen},
      year={2024},
      eprint={2411.17473},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17473},
}
```

💡 Acknowledgment

We thank the following open-source repositories: EfficientFormer, SwiftFormer, RepViT, mmsegmentation and mmdetection.
