TinyViM

TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba

Huawei Noah’s Ark Lab

🔥 News

2024/11/29: Code is released.
2024/11/27: TinyViM is available on arXiv.

📷 Introduction

We build TinyViM, a series of tiny hybrid vision Mamba models, by integrating mobile-friendly convolution with an efficient Laplace mixer. TinyViM achieves strong performance on several downstream tasks, including image classification, semantic segmentation, object detection, and instance segmentation. In particular, TinyViM outperforms convolution-, Transformer-, and Mamba-based models of similar scale, and its throughput is about 2-3 times higher than that of other Mamba-based models.
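
To give a rough intuition for the "frequency decoupling" in the title, here is a minimal sketch of a generic Laplacian-pyramid-style split on a 1D signal. It is an illustration under stated assumptions, not the TinyViM implementation: the function name `split_frequencies`, the pooling factor, and the choice of average pooling with nearest-neighbour upsampling are all hypothetical, and which operators the model applies to each band is not shown here.

```python
def split_frequencies(signal, factor=2):
    """Split a 1D signal into low- and high-frequency parts.

    pooled: block averages of the signal (average pooling).
    low:    pooled values repeated back to the original length
            (nearest-neighbour upsampling).
    high:   residual signal - low, which carries the fine detail.
    """
    if len(signal) % factor != 0:
        raise ValueError("signal length must be divisible by factor")
    pooled = [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal), factor)
    ]
    low = [value for value in pooled for _ in range(factor)]
    high = [s - l for s, l in zip(signal, low)]
    return pooled, low, high


signal = [1.0, 3.0, 2.0, 6.0]
pooled, low, high = split_frequencies(signal)
print(pooled)  # [2.0, 4.0]
print(low)     # [2.0, 2.0, 4.0, 4.0]
print(high)    # [-1.0, 1.0, -2.0, 2.0]
```

Adding `low` and `high` element by element recovers the original signal, so the split loses no information. The pooled sequence is shorter than the input, which is one plausible reason a decoupled design can be cheaper: a costly global operator only has to process the shorter low-frequency sequence.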

🏆 Performance

1️⃣ Classification

| Model | Type | Params (M) | GMACs | Throughput (im/s) | Top-1 (%) |
| --- | --- | --- | --- | --- | --- |
| TinyViM-S | CNN-Mamba | 5.6 | 0.9 | 2563 | 79.2 |
| TinyViM-B | CNN-Mamba | 11.0 | 1.5 | 1851 | 81.2 |
| TinyViM-L | CNN-Mamba | 31.7 | 4.7 | 843 | 83.3 |

2️⃣ Detection & Instance Segmentation

| Model | Head | AP-box | AP-mask |
| --- | --- | --- | --- |
| TinyViM-B | Mask RCNN | 42.3 | 38.7 |
| TinyViM-L | Mask RCNN | 44.5 | 40.7 |

3️⃣ Semantic Segmentation

| Model | Head | Throughput | mIoU |
| --- | --- | --- | --- |
| TinyViM-B | FPN | 180 | 41.9 |
| TinyViM-L | FPN | 111 | 44.2 |

📚 Use example

Environment
```
conda create --name tinyvim python=3.9.11 -y
conda activate tinyvim
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install timm==0.5.4
```
Please refer to VMamba for installing selective_scan_cuda.
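
After installation, a quick sanity check can confirm that the pinned packages are present. The following is a minimal sketch using only the Python standard library; the helper names (`installed_version`, `module_available`, `check_environment`) are hypothetical and not part of this repository, and the distribution names `torch`, `torchvision`, and `timm` are assumed to match what conda and pip register.

```python
from importlib import metadata, util

# Versions pinned by the install commands above.
EXPECTED_VERSIONS = {"torch": "2.0.0", "torchvision": "0.15.0", "timm": "0.5.4"}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name):
    """Return True if a top-level module can be located, without importing it."""
    try:
        return util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def check_environment(expected=EXPECTED_VERSIONS):
    """Map each package name to (expected, installed, matches)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        # Some builds append a local tag such as "+cu117"; ignore it.
        matches = have is not None and have.split("+")[0] == want
        report[name] = (want, have, matches)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {have} [{status}]")
    print("selective_scan_cuda importable:", module_available("selective_scan_cuda"))
```

The check only locates packages and compares version strings. It does not verify that the CUDA kernels actually run on your GPU.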

For detection and segmentation, please refer to mmdetection-2.28.2 and mmsegmentation-0.30.0, respectively, for environment setup and data preparation.

Train
```
bash train.sh
```
Test
```
bash eval.sh
```

Speed
```
python speed_gpu.py --model TinyViM_S --resolution 224 --batch 2048
```
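
For readers who want to interpret the im/s figures, here is a generic sketch of how throughput is commonly measured: warm up, time a fixed number of batches, then divide the number of images processed by the elapsed time. This is not the content of `speed_gpu.py`, whose internals are not shown here. The names `measure_throughput`, `run_batch`, and `sync` are hypothetical.

```python
import time


def measure_throughput(run_batch, batch_size, warmup=5, iters=20, sync=None):
    """Return images per second for a callable that processes one batch.

    run_batch: zero-argument callable that runs one forward pass.
    sync:      optional zero-argument callable invoked before each clock
               read. On a GPU, kernels launch asynchronously, so a
               synchronisation call (for example torch.cuda.synchronize)
               is needed for the timing to be meaningful.
    """
    # Warm-up iterations are excluded from timing (caches, autotuning).
    for _ in range(warmup):
        run_batch()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


if __name__ == "__main__":
    # Stand-in workload: sleep 1 ms per "batch" of 8 images.
    rate = measure_throughput(lambda: time.sleep(0.001), batch_size=8)
    print(f"{rate:.0f} im/s")
```

Reported numbers depend on the hardware, batch size, and input resolution, so figures are only comparable when measured under the same settings.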

Detection & Instance Segmentation
```
cd detection
bash train.sh # for train
bash eval.sh # for eval
```

Semantic Segmentation
```
cd segmentation
bash train.sh # for train
bash eval.sh # for eval
```

🌟 Citation

If you are interested in our work, please consider giving a 🌟 and citing our work below.

@misc{tinyvim,
      title={TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba}, 
      author={Xiaowen Ma and Zhenliang Ni and Xinghao Chen},
      year={2024},
      eprint={2411.17473},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17473}, 
}

💡Acknowledgment

Thanks to the following open-source repositories: EfficientFormer, SwiftFormer, RepViT, mmsegmentation, and mmdetection.
