This is the PyTorch implementation of the CVPR 2024 paper "Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation" and of its extended version, "ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation".
pip install torch torchvision
# We use python==3.9, torch==1.11.0, and torchvision==0.12.0
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
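To confirm the environment is set up correctly, a quick sanity check along these lines can help (a minimal sketch; the ViT-B/16 backbone here is an assumption matching the `ViT16` option used below):

```python
import torch
import clip

# Assumption: "ViT16" in the scripts below corresponds to CLIP's ViT-B/16 backbone.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)
print("CLIP loaded; input resolution:", model.visual.input_resolution)
```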
Maskclip
├── data
│ ├── VOCdevkit
│ │ ├── VOC2012
│ │ │ ├── JPEGImages
│ │ │ ├── SegmentationClass
│ │ │ ├── ImageSets
│ │ │ │ ├── Segmentation
│ │ ├── VOC2010
│ │ │ ├── JPEGImages
│ │ │ ├── SegmentationClassContext
│ │ │ ├── ImageSets
│ │ │ │ ├── SegmentationContext
│ │ │ │ │ ├── train.txt
│ │ │ │ │ ├── val.txt
│ │ │ ├── trainval_merged.json
│ ├── ADEChallengeData2016
│ │ ├── annotations
│ │ │ ├── training
│ │ │ ├── validation
│ │ ├── images
│ │ │ ├── training
│ │ │ ├── validation
│ ├── Cityscapes
│ │ ├── gtFine
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── leftImg8bit
│ │ │ ├── train
│ │ │ ├── val
│ ├── coco_stuff164k
│ │ ├── images
│ │ │ ├── train2017
│ │ │ ├── val2017
│ │ ├── annotations
│ │ │ ├── train2017
│ │ │ ├── val2017
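Before training, it can be useful to verify that the datasets are laid out as shown above. The helper below is not part of this repo, just a minimal sketch using paths from the tree:

```python
from pathlib import Path

# Directories taken from the tree above; extend the list for the datasets you actually use.
required = [
    "data/VOCdevkit/VOC2012/JPEGImages",
    "data/VOCdevkit/VOC2012/SegmentationClass",
    "data/ADEChallengeData2016/images/training",
    "data/Cityscapes/leftImg8bit/train",
    "data/coco_stuff164k/images/train2017",
]
missing = [p for p in required if not Path(p).is_dir()]
if missing:
    raise FileNotFoundError(f"missing dataset directories: {missing}")
```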
python utils/prompt_engineering.py --model ViT16 --class-set voc
# The text embeddings will be saved to 'text/voc_ViT16_clip_text.pth'
# Options for dataset: voc, context, ade, cityscapes, coco
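For reference, prompt engineering ensembles the text embeddings of each class name rendered through multiple prompt templates. The sketch below illustrates the idea with the openai/CLIP API; the templates and class names are illustrative, not the exact ones used by utils/prompt_engineering.py:

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

templates = ["a photo of a {}.", "a photo of the {}."]   # illustrative subset
classes = ["aeroplane", "bicycle", "bird"]               # illustrative subset of VOC classes

with torch.no_grad():
    per_class = []
    for name in classes:
        tokens = clip.tokenize([t.format(name) for t in templates]).to(device)
        feats = model.encode_text(tokens)
        feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize each prompt embedding
        mean = feats.mean(dim=0)
        per_class.append(mean / mean.norm())              # re-normalize the ensembled embedding
    text_embeddings = torch.stack(per_class)              # [num_classes, embed_dim]
print(text_embeddings.shape)
```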
python tools/pseudo_class.py --cfg 'config/voc_train_ori_cfg.yaml' --model 'RECLIPPP'
# The image-level multi-label hypothesis will be saved to 'text/voc_pseudo_label_ReCLIPPP.json'
# Options for dataset: voc, context, ade, cityscapes, coco
# Options for model: RECLIPPP (ReCLIP++), ReCLIP (ReCLIP)
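The multi-label hypothesis records which classes are likely present in each image. One common way to form such a hypothesis, sketched below for illustration (not the exact procedure of tools/pseudo_class.py), is to threshold the normalized similarity between the CLIP image embedding and the class text embeddings:

```python
import torch

def multilabel_hypothesis(image_feat, text_feats, threshold=0.5):
    """Pick classes whose similarity to the image passes a cutoff.

    image_feat: [dim] L2-normalized CLIP image embedding.
    text_feats: [num_classes, dim] L2-normalized class text embeddings.
    threshold:  illustrative cutoff on min-max normalized scores.
    """
    sims = text_feats @ image_feat                          # cosine similarity per class
    scores = (sims - sims.min()) / (sims.max() - sims.min() + 1e-8)
    return torch.nonzero(scores > threshold).flatten().tolist()
```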
python tools/train.py --cfg 'config/voc_train_ori_cfg.yaml' --model 'RECLIPPP'
# Options for dataset: voc, context, ade, cityscapes, coco
# Options for model: RECLIPPP (ReCLIP++), ReCLIP (ReCLIP)
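The `--cfg` argument points to a YAML config. If you need to adapt paths or hyperparameters, you can inspect it as ordinary YAML (requires pyyaml; the snippet below only lists the top-level keys and assumes nothing about their names):

```python
import yaml

# Load the training config to inspect or tweak options before launching tools/train.py.
with open("config/voc_train_ori_cfg.yaml") as f:
    cfg = yaml.safe_load(f)
print(sorted(cfg))  # list the top-level option names
```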
python tools/distill.py --cfg 'config/voc_distill_ori_cfg.yaml'
# Options for dataset: voc, context, ade, cityscapes, coco
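Distillation trains a standalone segmentation network on the masks produced by the rectified CLIP. A typical objective for this kind of step is pixel-wise cross-entropy against the pseudo masks, sketched below for illustration (not necessarily the exact loss used in tools/distill.py):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, pseudo_mask, ignore_index=255):
    """Pixel-wise cross-entropy of a student segmenter against pseudo masks.

    student_logits: [B, num_classes, H, W] raw scores from the student network.
    pseudo_mask:    [B, H, W] integer class map produced by the rectified CLIP.
    """
    return F.cross_entropy(student_logits, pseudo_mask, ignore_index=ignore_index)
```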
python tools/test.py --cfg 'config/voc_test_ori_cfg.yaml' --model 'RECLIPPP'
# Options for dataset: voc, context, ade, cityscapes, coco
# Options for model: RECLIPPP (ReCLIP++), ReCLIP (ReCLIP)
python tools/distill_val.py --cfg 'config/voc_distill_ori_cfg.yaml'
# Options for dataset: voc, context, ade, cityscapes, coco
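Both evaluation scripts score the predicted segmentation masks; the standard metric on these benchmarks is mean IoU, which can be computed from a confusion matrix as in this generic sketch (not the repo's evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """pred, gt: flat integer arrays of per-pixel class predictions / labels."""
    mask = gt != ignore_index
    hist = np.bincount(
        num_classes * gt[mask] + pred[mask], minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)                     # rows: gt, cols: pred
    inter = np.diag(hist)
    union = hist.sum(0) + hist.sum(1) - inter
    return float(np.nanmean(inter / np.maximum(union, 1)))
```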
Dataset | Rectification | Distillation
---|---|---
PASCAL VOC | 58.5 | 75.4 |
PASCAL Context | 25.8 | 33.8 |
ADE20K | 11.1 | 14.3 |
Dataset | Rectification
---|---
PASCAL VOC | 85.4 |
PASCAL Context | 36.1 |
ADE20K | 16.4 |
Cityscapes | 26.5 |
COCO Stuff | 23.8 |
Please cite our papers if you use our code in your research:
@inproceedings{wang2024learn,
  title={Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation},
  author={Wang, Jingyun and Kang, Guoliang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4102--4112},
  year={2024}
}
@article{wang2024reclip++,
  title={ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation},
  author={Wang, Jingyun and Kang, Guoliang},
  journal={arXiv preprint arXiv:2408.06747},
  year={2024}
}
For questions about our paper or code, please contact [email protected].