This repository contains the implementation of the CVPR 2024 paper *FocSAM: Delving Deeply into Focused Objects in Segmenting Anything*.
The following GIF animations compare interactive segmentation results from SAM and our FocSAM. Notably, FocSAM delivers markedly more stable performance, with significantly less IoU fluctuation than SAM, across various datasets.
For detailed installation instructions, please refer to INSTALL.
Alternatively, ensure Python 3.11.0 is set up in your environment, then install all dependencies by running the following command in your terminal:

```bash
bash scripts/install.sh
```
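For example, a minimal setup sketch assuming you use conda (the environment name `focsam` below is hypothetical, not mandated by the repo):

```bash
# Hypothetical conda-based setup; the environment name "focsam" is an assumption.
conda create -n focsam python=3.11.0 -y
conda activate focsam
bash scripts/install.sh  # installs all repository dependencies
```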
For detailed dataset preparation instructions, please refer to DATASETS.
To prepare the pretrained weights:

- Download: acquire the pretrained SAM-ViT-H checkpoint and save it to `pretrain/sam_vit_h_4b8939.pth`.
- Conversion: convert the downloaded weights using the command below:

  ```bash
  python tools/model_converters/samvit2mmclickseg.py pretrain/sam_pretrain_vit_huge.pth
  ```

- Download: obtain the pretrained FocSAM-ViT-H weights and unzip them into `work_dirs/focsam/focsam_vit_huge_eval`.
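For illustration, a minimal sketch of the whole weight-preparation step is shown below. The wget URL is assumed to be the official Segment Anything release link for the ViT-H checkpoint, and the FocSAM zip filename is hypothetical; adjust both to match your actual downloads.

```bash
# Minimal sketch of weight preparation. The wget URL is the official
# Segment Anything release link for ViT-H (assumed here), and the
# FocSAM zip filename is hypothetical.
mkdir -p pretrain
wget -O pretrain/sam_vit_h_4b8939.pth \
  https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
python tools/model_converters/samvit2mmclickseg.py pretrain/sam_pretrain_vit_huge.pth

mkdir -p work_dirs/focsam/focsam_vit_huge_eval
unzip focsam_vit_huge_eval.zip -d work_dirs/focsam/focsam_vit_huge_eval
```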
To evaluate FocSAM:

- Single GPU (example for the DAVIS dataset):

  ```bash
  export PYTHONPATH=.
  python tools/test_no_viz.py configs/_base_/eval_davis.py work_dirs/focsam/focsam_vit_huge_eval/iter_160000.pth
  ```

- Multi-GPU (here with 4 GPUs):

  ```bash
  bash tools/dist_test.sh configs/_base_/eval_davis.py work_dirs/focsam/focsam_vit_huge_eval/iter_160000.pth 4
  ```

- CPU (not recommended):

  ```bash
  export PYTHONPATH=.
  CUDA_VISIBLE_DEVICES= python tools/test_no_viz.py configs/_base_/eval_davis.py work_dirs/focsam/focsam_vit_huge_eval/iter_160000.pth
  ```
- Evaluating on other datasets: replace the config file as needed (a looped example follows this list):

  ```bash
  configs/_base_/eval_sbd.py       # SBD
  configs/_base_/eval_grabcut.py   # GrabCut
  configs/_base_/eval_berkeley.py  # Berkeley
  configs/_base_/eval_mvtec.py     # MVTec
  configs/_base_/eval_cod10k.py    # COD10K
  ```
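As referenced above, a minimal sketch that evaluates the same released checkpoint on every listed benchmark; the dataset keys are taken directly from the config filenames:

```bash
# Minimal sketch: evaluate the FocSAM checkpoint on each benchmark in turn.
# Dataset keys mirror the config filenames listed above.
export PYTHONPATH=.
for dataset in davis sbd grabcut berkeley mvtec cod10k; do
  python tools/test_no_viz.py "configs/_base_/eval_${dataset}.py" \
    work_dirs/focsam/focsam_vit_huge_eval/iter_160000.pth
done
```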
To train the SAM decoder:

- Single GPU:

  ```bash
  export PYTHONPATH=.
  python tools/train.py configs/sam/coco_lvis/train_colaug_coco_lvis_1024x1024_320k.py
  ```

- Multi-GPU (here with 4 GPUs):

  ```bash
  bash tools/dist_train.sh configs/sam/coco_lvis/train_colaug_coco_lvis_1024x1024_320k.py 4
  ```

- CPU (not recommended):

  ```bash
  export PYTHONPATH=.
  CUDA_VISIBLE_DEVICES= python tools/train.py configs/sam/coco_lvis/train_colaug_coco_lvis_1024x1024_320k.py
  ```
To train the FocSAM refiner:

- Important prerequisite: begin by training the SAM decoder as above. That step produces the required checkpoint `work_dirs/sam/coco_lvis/train_colaug_coco_lvis_1024x1024_320k/iter_320000.pth`, which is essential for the subsequent training of the FocSAM refiner (a sketch of the full two-stage pipeline follows this list).
- Single GPU:

  ```bash
  export PYTHONPATH=.
  python tools/train.py configs/focsam/coco_lvis/train_colaug_coco_lvis_1024x1024_160k.py
  ```

- Multi-GPU (here with 4 GPUs):

  ```bash
  bash tools/dist_train.sh configs/focsam/coco_lvis/train_colaug_coco_lvis_1024x1024_160k.py 4
  ```

- CPU (not recommended):

  ```bash
  export PYTHONPATH=.
  CUDA_VISIBLE_DEVICES= python tools/train.py configs/focsam/coco_lvis/train_colaug_coco_lvis_1024x1024_160k.py
  ```
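As referenced above, a minimal sketch of the full two-stage training pipeline, assuming 4 GPUs; the checkpoint path is the one named in the prerequisite:

```bash
# Minimal sketch: train the SAM decoder, verify its final checkpoint exists,
# then train the FocSAM refiner. Assumes 4 GPUs are available.
export PYTHONPATH=.
SAM_CKPT=work_dirs/sam/coco_lvis/train_colaug_coco_lvis_1024x1024_320k/iter_320000.pth

bash tools/dist_train.sh configs/sam/coco_lvis/train_colaug_coco_lvis_1024x1024_320k.py 4
[ -f "$SAM_CKPT" ] || { echo "Missing prerequisite checkpoint: $SAM_CKPT" >&2; exit 1; }
bash tools/dist_train.sh configs/focsam/coco_lvis/train_colaug_coco_lvis_1024x1024_160k.py 4
```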
This project is licensed under the MIT License - see the LICENSE file for details.