Revive driving scene understanding by delving into the embodiment philosophy
Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, and Hongyang Li
- Presented by OpenDriveLab and Shanghai AI Lab
- 📬 Primary contact: Yunsong Zhou ( [email protected] )
- arXiv paper | Blog TODO | Slides
- CVPR 2024 Autonomous Driving Challenge - Driving with Language
🔥 The first embodied language model for understanding long-horizon driving scenarios in space and time.
🌟 ELM expands a wide spectrum of new tasks to fully leverage the capability of large language models in an embodiment setting and achieves significant improvements in various applications.
🏆 Interpretable driving model, on the basis of language prompting, will be a main track in the CVPR 2024 Autonomous Driving Challenge. Please stay tuned for further details!
- 🔥 Interpretable driving model is launched. Please refer to the link for more details.
- [2024/03] ELM paper released.
- [2024/03] ELM code and data initially released.
- Highlights
- News
- TODO List
- Installation
- Dataset
- Training and Inference
- License and Citation
- Related Resources
TODO List
- Release fine-tuning code and data
- Release reference checkpoints
- Toolkit for label generation
- (Optional) Create a conda environment
conda create -n elm python=3.8
conda activate elm
- Install from PyPI
pip install salesforce-lavis
- Or, for development, you may build from source
git clone https://github.com/OpenDriveLab/ELM.git
cd ELM
pip install -e .
Pre-training data. We collect driving videos from YouTube, nuScenes, Waymo, and Ego4D. Here we provide a sample of the 🔗 YouTube video list we used. For privacy reasons, we are temporarily keeping the full-set data labels private. Part of the pre-training data and reference checkpoints can be found on 💾 Google Drive.
Fine-tuning data. The full set of question-and-answer pairs for the benchmark can be obtained through this 🔗 data link. You may need to download the corresponding image data from the official nuScenes and Ego4D channels. For a quick verification of the pipeline, we recommend downloading the DriveLM subset and organizing the data in line with the format.
Please make sure to soft-link the nuScenes and ego4d datasets under the data/xx folder.
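The soft-link step can be scripted. The following is a minimal sketch, not part of the repository: `link_dataset` is a hypothetical helper, and both the dataset location and the link path are placeholders you supply yourself.

```python
import os
from pathlib import Path


def link_dataset(source: str, link_path: str) -> Path:
    """Create link_path as a symbolic link pointing at the dataset in source.

    Hypothetical helper for illustration; it is not shipped with ELM.
    """
    src = Path(source).expanduser().resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {src}")

    target = Path(link_path)
    # Make sure the parent folder (e.g. the data folder) exists.
    target.parent.mkdir(parents=True, exist_ok=True)

    if target.is_symlink():
        # Replace an existing link so the call is safe to repeat.
        target.unlink()
    elif target.exists():
        # Refuse to overwrite real files or directories.
        raise FileExistsError(f"{target} exists and is not a symlink")

    os.symlink(src, target, target_is_directory=True)
    return target
```

Call it once per dataset, passing the directory where you downloaded nuScenes or Ego4D and the link location under the data folder. Linking rather than copying keeps the large image data in one place on disk.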
You may need to run tools/video_clip_processor.py to pre-process the data first.
We also provide some scripts used during auto-labeling; you may use them as a reference if you want to customize the data.
# you can modify the lavis/projects/blip2/train/advqa_t5_elm.yaml
bash scripts/train.sh
For inference, set evaluate to True in advqa_t5_elm.yaml, then run the same script:
bash scripts/train.sh
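If you switch between training and inference often, the flag can be flipped programmatically. This is a sketch under an assumption: that the config stores the flag on a single line of the form `evaluate: <value>`. The helper `set_evaluate` is hypothetical, and it edits the text directly so that comments and ordering in the YAML file are preserved.

```python
import re


def set_evaluate(config_text: str, enabled: bool) -> str:
    """Return config_text with the value of its `evaluate:` key replaced.

    Assumes exactly one line of the form `evaluate: <value>` exists.
    """
    pattern = re.compile(r"^([ \t]*evaluate:[ \t]*)\S+", flags=re.MULTILINE)
    # Keep the original indentation and key, swap only the value.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(enabled), config_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one 'evaluate:' key, found {count}")
    return new_text
```

To apply it, read lavis/projects/blip2/train/advqa_t5_elm.yaml as text, pass the contents through `set_evaluate(text, True)`, and write the result back before running the script.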
To evaluate the generated answers, please use the script scripts/qa_eval.py:
python scripts/qa_eval.py <data_root> <log_name>
All assets and code in this repository are under the Apache 2.0 license unless specified otherwise. The language data is under CC BY-NC-SA 4.0. Other datasets (including nuScenes and Ego4D) inherit their own distribution licenses. Please consider citing our paper and project if they help your research.
@article{zhou2024embodied,
title={Embodied Understanding of Driving Scenarios},
author={Zhou, Yunsong and Huang, Linyan and Bu, Qingwen and Zeng, Jia and Li, Tianyu and Qiu, Hang and Zhu, Hongzi and Guo, Minyi and Qiao, Yu and Li, Hongyang},
journal={arXiv preprint arXiv:2403.04593},
year={2024}
}
We thank all the open-source contributors of the following projects for making this work possible: