
PMA-3D

Parameter-level Mask-based Adapter for 3D Continual Learning with Vision-Language Model

(Figure: overview of the PMA framework)

Introduction

As 3D object classification becomes increasingly essential in a variety of applications, the need for models that continuously adapt to new scenarios is more pressing than ever. We propose the Parameter-level Mask-based Adapter (PMA) for 3D continual learning with a vision-language model. Our approach leverages Contrastive Language-Image Pre-Training (CLIP) to extract rich multimodal features by projecting 3D point clouds into multi-angle depth maps, which are then processed by a 2D encoder aligned with rendered images. The core of our method is a parameter-level adapter with learnable masks that integrates 3D point cloud features with 2D features. The masks flexibly select the most relevant subset of weights for each task, allowing subsequent tasks to reuse previously learned weights without updating them, which balances stability and plasticity during continual learning. Experimental results show that our method significantly outperforms existing state-of-the-art techniques.
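
For intuition, here is a minimal, hypothetical sketch of a parameter-level masked layer in PyTorch. The class name, the per-task score tensors, and the mask rule (sigmoid threshold with a straight-through estimator) are our assumptions for illustration, not the repository's actual implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    # Hypothetical sketch: a linear layer whose shared weights are gated
    # by a per-task learnable binary mask, so each task activates only a
    # subset of the weights.
    def __init__(self, in_features, out_features, num_tasks):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        # One real-valued score tensor per task;
        # sigmoid(score) > 0.5 selects a weight for that task.
        self.scores = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(out_features, in_features))
             for _ in range(num_tasks)]
        )

    def forward(self, x, task_id):
        probs = torch.sigmoid(self.scores[task_id])
        hard = (probs > 0.5).float()
        # Straight-through estimator: binary mask in the forward pass,
        # gradients flow back through the soft probabilities.
        mask = hard + probs - probs.detach()
        return F.linear(x, self.weight * mask)

layer = MaskedLinear(512, 256, num_tasks=6)
out = layer(torch.randn(4, 512), task_id=0)   # (4, 256)

In this reading, freezing the weights claimed by earlier tasks and training only the masks (plus any still-free weights) is what provides the stability-plasticity balance described above.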

Installation

PyTorch, PyTorch3D, CLIP, pointnet2_ops, etc. are required. We recommend creating a conda environment and installing the dependencies on Linux as follows:

# create a conda environment
conda create -n PMA python=3.7 -y
conda activate PMA

# install pytorch & pytorch3d
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install -c bottler nvidiacub
conda install pytorch3d -c pytorch3d
# alternatively, build the latest PyTorch3D from source:
# pip install "git+https://github.com/facebookresearch/pytorch3d.git"

# install CLIP
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git

# install pointnet2 & other packages
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
pip install -r requirements.txt
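
After installing, a quick sanity check (a small script of our own, not part of the repo) confirms that the core dependencies import and that CLIP's ViT-B/32 backbone, which the vit32 checkpoint directory suggests, loads:

# check_env.py - quick sanity check for the installed dependencies
import torch
import pytorch3d
import clip
import pointnet2_ops  # import only; verifies the compiled CUDA ops load

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)
model, preprocess = clip.load("ViT-B/32", device="cpu")  # downloads weights on first use
print("CLIP visual backbone:", type(model.visual).__name__)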

Data preparation

The overall directory structure should be:

│PMA-3D/
├──datasets/
├──data/
│   ├──ModelNet40_Align/
│   ├──ModelNet40_Ply/
│   ├──Rendering/
│   ├──ShapeNet55/
│   ......
├──.......

In addition, please download the CO3D dataset into "data/CO3D".

Please refer to CLIP2Point for dataset download instructions.
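
The method projects point clouds into multi-angle depth maps. As an illustration only (the repository's actual rendering pipeline follows CLIP2Point and may differ), one way to rasterize such depth maps with PyTorch3D:

import torch
from pytorch3d.structures import Pointclouds
from pytorch3d.renderer import (
    FoVPerspectiveCameras,
    PointsRasterizationSettings,
    PointsRasterizer,
    look_at_view_transform,
)

# Rasterize one (dummy) point cloud from four azimuth angles.
pts = torch.rand(1024, 3) - 0.5                     # toy normalized cloud
azim = torch.tensor([0.0, 90.0, 180.0, 270.0])      # multi-angle views
R, T = look_at_view_transform(dist=2.0, elev=30.0, azim=azim)
cameras = FoVPerspectiveCameras(R=R, T=T)
settings = PointsRasterizationSettings(image_size=224, radius=0.02, points_per_pixel=1)
rasterizer = PointsRasterizer(cameras=cameras, raster_settings=settings)

clouds = Pointclouds(points=[pts] * len(azim))      # one copy per view
depth = rasterizer(clouds).zbuf[..., 0]             # (4, 224, 224); -1 marks background
print(depth.shape)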

Getting started

Download the pre-trained checkpoints best_eval.pth, best_test.pth, and dgcnn_occo_cls.pth, and place them as follows:

│PMA-3D/
├──pre_builts/
│   ├──vit32/
│   |	├──best_eval.pth
│   |	├──best_test.pth
│   ├──point/
│   |	├──dgcnn_occo_cls.pth

Then run:

python main.py

You can modify session_settings.py and the command-line arguments to run on other datasets.

You can directly load the trained parameters from ./models/state_dict/task_6.pth, or download them from the release attachment.
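
To inspect or restore those parameters, a minimal snippet (assuming task_6.pth stores a plain state dict; adjust if the checkpoint wraps additional metadata):

import torch

# Assumes task_6.pth holds a plain state dict of named tensors.
state_dict = torch.load("./models/state_dict/task_6.pth", map_location="cpu")
print(len(state_dict), "tensors saved")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
# Restore into the model constructed by main.py, e.g.:
#   model.load_state_dict(state_dict)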

Acknowledgement

This work was supported by the National Science and Technology Major Project of China under Grant 2021ZD0112002 and the Project for Self-Developed Innovation Team of Jinan City under Grant 2021GXRC038.

Citation

@inproceedings{qian2024parameter,
  title={Parameter-level Mask-based Adapter for 3D Continual Learning with Vision-Language Model},
  author={Qian, Longyue and Zhang, Mingxin and Zhang, Qian and Zhang, Lin and Li, Teng and Zhang, Wei},
  booktitle={2024 IEEE International Conference on Robotics and Biomimetics (ROBIO)},
  pages={xx--xx},
  year={2024},
  organization={IEEE}
}