Wentao Zhao*, Jiaming Chen*, Ziyu Meng, Donghui Mao, Ran Song†, Wei Zhang
Shandong University
This is the official repo for our paper *VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation*, accepted at RSS 2024.
We provide the implementation of VLMPC in the Language-Table environment.
```bash
conda create -n vlmpc python=3.10
conda activate vlmpc
pip install -r requirements.txt
```
Note: add your OpenAI API key to `vlmpc.py`.
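For reference, here is a minimal sketch of how the key is typically wired in. The exact placeholder that `vlmpc.py` defines may differ, and reading the key from an environment variable is a suggestion rather than the repo's actual convention:

```python
# Hypothetical sketch -- check vlmpc.py for the actual placeholder it defines.
import os
import openai

# Prefer an environment variable over hard-coding the key in source control.
openai.api_key = os.environ.get("OPENAI_API_KEY", "sk-your-key-here")
```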
We provide trained checkpoints of the video prediction model and the detector; download them for a quick start, then run:
```bash
python main.py --checkpoint_file path/to/video_prediction_model/checkpoint --task push_corner --zoom 0.03 --num_samples 50 --plan_freq 3 --det_path path/to/detector/checkpoint
```
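For reference, the sketch below reconstructs the argument parser that the command above implies. The flag names come from the command itself, but the defaults and help strings describe our assumptions about their roles; `main.py` is the authoritative definition.

```python
# Hypothetical reconstruction of the CLI in main.py; names match the example
# command, but the help strings and defaults are assumptions.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run VLMPC in the Language-Table environment")
    parser.add_argument("--checkpoint_file", required=True,
                        help="Path to the video prediction model checkpoint")
    parser.add_argument("--task", default="push_corner",
                        help="Name of the manipulation task")
    parser.add_argument("--zoom", type=float, default=0.03,
                        help="Zoom factor (value taken from the example command)")
    parser.add_argument("--num_samples", type=int, default=50,
                        help="Candidate action sequences sampled per planning step")
    parser.add_argument("--plan_freq", type=int, default=3,
                        help="Replan every N environment steps")
    parser.add_argument("--det_path", required=True,
                        help="Path to the detector checkpoint")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```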
```bibtex
@inproceedings{zhao2024vlmpc,
  title={VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation},
  author={Zhao, Wentao and Chen, Jiaming and Meng, Ziyu and Mao, Donghui and Song, Ran and Zhang, Wei},
  booktitle={Robotics: Science and Systems},
  year={2024}
}
```
- The simulation environment is based on Language-Table.
- The DMVFN-Act video prediction model is built on DMVFN.
- PySOT is used for lightweight visual tracking.