VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation

Wentao Zhao*, Jiaming Chen*, Ziyu Meng, Donghui Mao, Ran Song†, Wei Zhang

Shandong University


This is the official repository for our paper VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation, accepted at RSS 2024.

We provide an implementation of VLMPC in the Language-Table environment.

(Framework overview figure)

Installation

```shell
conda create -n vlmpc python=3.10
conda activate vlmpc

pip install -r requirements.txt
```

Note: add your OpenAI API key to vlmpc.py.
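Rather than hardcoding the key in vlmpc.py, you may prefer to read it from the environment. The sketch below is a hypothetical helper, not code from this repo; the variable name `OPENAI_API_KEY` and the function name are assumptions:

```python
import os

def load_openai_key(env_var="OPENAI_API_KEY"):
    """Return the OpenAI API key from the environment, failing loudly if unset.

    Hypothetical helper: vlmpc.py in this repo expects the key to be
    pasted in directly; this is just one way to avoid committing it.
    """
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set {env_var} before running VLMPC.")
    return key
```

Export the variable in your shell (`export OPENAI_API_KEY=...`) before launching, and call `load_openai_key()` wherever vlmpc.py needs the key.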

Quickstart

We provide trained checkpoints for the video prediction model and the detector; download them for a quick start:

```shell
python main.py --checkpoint_file path/to/video_prediction_model/checkpoint --task push_corner --zoom 0.03 --num_samples 50 --plan_freq 3 --det_path path/to/detector/checkpoint
```
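For reference, the flags in the quickstart command can be mirrored with a minimal `argparse` sketch. This is an assumption about how main.py parses its arguments, not the repo's actual parser, and the help strings are our guesses at each flag's meaning:

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the quickstart flags of main.py."""
    p = argparse.ArgumentParser(description="Run VLMPC in the Language-Table environment")
    p.add_argument("--checkpoint_file", required=True,
                   help="path to the video prediction model checkpoint")
    p.add_argument("--task", default="push_corner",
                   help="manipulation task to run")
    p.add_argument("--zoom", type=float, default=0.03,
                   help="scale of the sampled actions (assumed meaning)")
    p.add_argument("--num_samples", type=int, default=50,
                   help="number of candidate action sequences per planning step")
    p.add_argument("--plan_freq", type=int, default=3,
                   help="replanning frequency in environment steps (assumed meaning)")
    p.add_argument("--det_path", required=True,
                   help="path to the detector checkpoint")
    return p
```

Running `build_parser().parse_args([...])` with only the two required paths falls back to the defaults shown in the quickstart command.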

Citation

```bibtex
@inproceedings{zhao2024vlmpc,
    title={VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation},
    author={Zhao, Wentao and Chen, Jiaming and Meng, Ziyu and Mao, Donghui and Song, Ran and Zhang, Wei},
    booktitle={Robotics: Science and Systems},
    year={2024}
}
```

Acknowledgements

  • The environment is based on Language-Table.
  • The DMVFN-Act video prediction model is based on DMVFN.
  • Lightweight visual tracking uses PySOT.
