Official Repository of Panacea.
[Paper] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving,
Yuqing Wen1*†, Yucheng Zhao2*, Yingfei Liu2*, Fan Jia2, Yanhui Wang1, Chong Luo1, Chi Zhang3, Tiancai Wang2‡, Xiaoyan Sun1‡, Xiangyu Zhang2
1University of Science and Technology of China, 2MEGVII Technology, 3Mach Drive
*Equal Contribution, †This work was done during the internship at MEGVII, ‡Corresponding Author.
[Paper] Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving,
Yuqing Wen1*†, Yucheng Zhao2*, Yingfei Liu2*, Binyuan Huang4*, Fan Jia2, Yanhui Wang1, Chi Zhang3, Tiancai Wang2‡, Xiaoyan Sun1‡, Xiangyu Zhang2
1University of Science and Technology of China, 2MEGVII Technology, 3Mach Drive, 4Wuhan University
*Equal Contribution, †This work was done during the internship at MEGVII, ‡Corresponding Author.
[WebPage] https://panacea-ad.github.io/
- Aug. 15th, 2024: We release an enhanced version of Panacea, named Panacea+, with improved performance and comprehensive validation on multiple datasets and tasks. For more details, please refer to the Panacea+ paper.
- Aug. 15th, 2024: We release the checkpoint and inference scripts for stage 2 of Panacea+; you can use them to generate multi-view video samples based on BEV layout sequences.
- Apr. 18th, 2024: We release our Gen-nuScenes dataset generated by Panacea. Please check the metrics/ folder to use it.
- Apr. 18th, 2024: We release the BEV-perception evaluation codes based on StreamPETR. Please check the metrics/ folder and follow metrics/README.md for detailed evaluation.
Please follow our documentation step by step.
Follow the instructions from: Environment Setup.
Prepare the real dataset following the instructions from Data Preparation.
Remember to put the dataset under the path data/nuscenes
Download the weights of the second stage from panaceaplus_40k_deepspeed.ckpt
Put them under the folder checkpoints/
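For reference, below is a minimal sketch of the expected layout after these steps; the nuScenes subfolders follow the standard dataset release, and any additional info files produced during Data Preparation are omitted here.

checkpoints/panaceaplus_40k_deepspeed.ckpt    # stage-2 weights downloaded above
configs/inference_nuscenes.yaml               # inference config used below
data/nuscenes/samples/                        # standard nuScenes layout
data/nuscenes/sweeps/
data/nuscenes/maps/
data/nuscenes/v1.0-trainval/
inference.py                                  # stage-2 inference entry point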
--split: specifies the train or val split.
--use_last_frame true: uses the last frame as the conditional image.
Run the following command to perform stage-2 inference on the whole training/val set of nuScenes.
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 inference.py --base configs/inference_nuscenes.yaml --ckpt checkpoints/panaceaplus_40k_deepspeed.ckpt --split train --use_last_frame true --name EXP_NAME --bs 1
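As a usage sketch, the same command can target the validation split by switching --split to val (lower --nproc_per_node accordingly if fewer GPUs are available); EXP_NAME_VAL below is a placeholder experiment name:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 inference.py --base configs/inference_nuscenes.yaml --ckpt checkpoints/panaceaplus_40k_deepspeed.ckpt --split val --use_last_frame true --name EXP_NAME_VAL --bs 1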
Overview of Panacea. (a). The diffusion training process of Panacea, enabled by a diffusion encoder and decoder with the decomposed 4D attention module. (b). The decomposed 4D attention module comprises three components: intra-view attention for spatial processing within individual views, cross-view attention to engage with adjacent views, and cross-frame attention for temporal processing. (c). Controllable module for the integration of diverse signals. The image conditions are derived from a frozen VAE encoder and combined with diffused noises. The text prompts are processed through a frozen CLIP encoder, while BEV sequences are handled via ControlNet. (d). The details of BEV layout sequences, including projected bounding boxes, object depths, road maps and camera pose.
The two-stage inference pipeline of Panacea. The process begins by generating multi-view images from BEV layouts, then uses these images, together with subsequent BEV layouts, to generate the following frames.
Controllable multi-view video generation. Panacea is able to generate realistic, controllable videos with good temporal and view consistency.
Video generation with variable attribute controls, such as weather, time, and scene, allows Panacea to simulate a variety of rare driving scenarios, including extreme weather conditions such as rain and snow, thereby greatly enhancing the diversity of the data.
(a). Panoramic video generation based on BEV (Bird’s-Eye-View) layout sequence facilitates the establishment of a synthetic video dataset, which enhances perceptual tasks. (b). Producing panoramic videos with conditional images and BEV layouts can effectively elevate image-only datasets to video datasets, thus enabling the advancement of video-based perception techniques.
@inproceedings{wen2024panacea,
title={Panacea: Panoramic and controllable video generation for autonomous driving},
author={Wen, Yuqing and Zhao, Yucheng and Liu, Yingfei and Jia, Fan and Wang, Yanhui and Luo, Chong and Zhang, Chi and Wang, Tiancai and Sun, Xiaoyan and Zhang, Xiangyu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={6902--6912},
year={2024}
}
@misc{wen2024panaceapanoramiccontrollablevideo,
title={Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving},
author={Yuqing Wen and Yucheng Zhao and Yingfei Liu and Binyuan Huang and Fan Jia and Yanhui Wang and Chi Zhang and Tiancai Wang and Xiaoyan Sun and Xiangyu Zhang},
year={2024},
eprint={2408.07605},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.07605},
}
This code builds on Stability-AI, ControlNet, and StreamPETR. Thanks for open-sourcing!