Updates:
- January 2025: Bugfixes on logging during rollouts and simulated event dataset generation procedure.
- January 2025: Simulation installation and running instructions are corrected and verified for an Ubuntu 20 and ROS Noetic machine. Happy flying!
- December 2024: Open-source code released for simulation rollouts and data collection, real data collection, events+depth calibration, training models, and real deployment. Note that file paths and other parameters in various scripts may have to be manually adjusted by the user, and code is provided as-is.
Project page  Paper  Video
We achieve sim-to-real transfer of event-vision robot control policies by leveraging depth prediction as a pretext task that be tuned in a post-training phase. We demonstrate flight indoor & outdoor, in dark and lighted conditions, at speeds of 1-5m/s, with two different event cameras.
- Introduction
- Simulator installation
- Dataset generation (simulation)
- Dataset generation (real-world)
- Train
- Real-world deployment
- Citation
- Acknowledgements
This is the official repository for the paper "Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor" by Bhattacharya, et al. (2024) from GRASP, Penn and RPG, UZH. Please note that this repository contains research-grade code and a typical user may come across some legacy and messy code.
We present the first static-obstacle avoidance method for quadrotors using just an onboard, monocular event camera. Quadrotors are capable of fast and agile flight in cluttered environments when piloted manually, but vision-based autonomous flight in unknown environments is difficult in part due to the sensor limitations of traditional onboard cameras. Event cameras, however, promise nearly zero motion blur and high dynamic range, but produce a large volume of events under significant ego-motion and further lack a continuous-time sensor model in simulation, making direct sim-to-real transfer not possible.
By leveraging depth prediction as a pretext task in our learning framework, we can pre-train a reactive obstacle avoidance events-to-control policy with approximated, simulated events and then fine-tune the perception component with limited events-and-depth real-world data to achieve obstacle avoidance in indoor and outdoor settings.
We demonstrate this across two quadrotor-event camera platforms in multiple settings and find, contrary to traditional vision-based works, that low speeds (1m/s) make the task harder and more prone to collisions, while high speeds (5m/s) result in better event-based depth estimation and avoidance. We also find that success rates in outdoor scenes can be significantly higher than in certain indoor scenes.
Note that if you'd only like to train models, and not run the Flightmare simulation to test models in simulation, you can simply clone this repository and skip straight to section Train.
If you'd like to start a new catkin workspace, then a typical workflow is (note that this code has only been tested with ROS Noetic and Ubuntu 20.04):
cd
mkdir -p evfly_ws/src
cd evfly_ws
catkin init
catkin config --extend /opt/ros/$ROS_DISTRO
catkin config --merge-devel
catkin config --cmake-args -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-fdiagnostics-color
Once inside your desired workspace, clone this repository:
cd ~/evfly_ws/src
git clone [email protected]:anish-bhattacharya/evfly
cd evfly
In order to replicate the Unity obstacle environment used for training and (sim) testing, you will need to download simulation/custom_sim_forest_environment.tar
(5.5MB) from Datashare and extract it to the right location (below). This contains 100 tree obstacle configurations.
tar -xvf <path/to/custom_sim_forest_environment.tar> -C flightmare/flightpy/configs/vision
You will also need to download our Unity resources and binaries. Download simulation/flightrender.tar
(443MB) from Datashare and then:
tar -xvf <path/to/flightrender.tar> -C flightmare/flightrender --strip-components=1
Then, build the workspace:
bash setup_ros.bash
cd ../..
catkin build
source devel/setup.bash
cd src/evfly
Note that you may have to install additional necessary packages, such as rqt_gui, opencv, libzmqpp-dev, and others. Also note you may include a CATKIN_IGNORE
in the ROS packages evfly_ros
and evfly_dv_ros
so that you don't need to install event camera dependencies while just doing simulation or training work.
Please add the following lines to your .bashrc so you may run the simulation from any open shell:
export FLIGHTMARE_PATH=/path/to/evfly/flightmare
source ~/evfly_ws/devel/setup.bash
As we describe in the paper, testing the model in real-time simulation necessitates estimating eventstreams via difference-of-log-images, which is out-of-distribution for the Vid2E-trained models. Therefore, do not expect high success rates in simulation test runs.
You'll likely need to install the following python packages if you haven't already:
tensorboard #if you don't set DEPLOYMENT = True in learner/learner.py
h5py
opencv
torch
torchvision
tqdm
imageio
configargparse
matplotlib
Download pretrained_models.tar
(50MB) from Datashare. This tarball includes a number of simulation-pre-trained and real-fine-tuned models. For simulation testing in the provided forest environment, we will use the jointly-trained sim_forest_DthetaVphi.pth
.
tar -xvf <path/to/pretrained_models.tar>
If you are just interested in the same forest environment we use in the paper, you should not need to edit the config file.
Relevant notes for the flightmare/flightpy/configs/vision/config.yaml
file:
level
specifies the environment type (we only providesim_forest_environment
) andenv_folder
specifies the environment index (environment_<0-99>
).datagen: 0
androllout: 1
runs trials in sequential environment indices starting from the givenenv_folder
, and uses thecustom_
environment formulation we provide be pre-pendingcustom_
to the specifiedlevel
. This custom environment formulation describes static obstacles as dynamic ones so they can have updated positions every time the Unity environment is reset.unity: scene: 2
specifies the warehouse environment with grassy ground texture and improved lighting on the trees. Other available scenes and their drone starting positions are found inflightmare/flightpy/configs/scene.yaml
.
When running the simulation (the following section), you can set any number N
of trials to run. To run trials on the same, specified environment index, set datagen: 1
and rollout: 0
.
The launch_evaluation.bash
script launches Flightmare and the trained model for event-based flight when using vision
mode. To run trials:
bash launch_evaluation.bash <N> vision
Some details: If you look at the bash script, you'll see multiple python scripts being run. envtest/ros/run_competition.py
subscribes to input images, computes approximated event batches, and passes them to the model to get desired velocity commands. It also selects the config file from which to load model parameters, currently set to learner/configs/eval_config_sim_joint.txt
which utilizes the jointly-trained envtest/ros/evaluation_node.py
counts crashes, starts and aborts trials, and prints other statistics to the console. The topic /debug_img1
streams images from the quadrotor grayscale camera and overlays approximate events on it in red-blue color. /debug_img2
shows the model predicted depth reconstruction with an overlaid arrow indicating the model's output velocity command.
To create a dataset of events, depth images, and desired velocity commands from execution of the privileged expert in simulation, we must first collect data from expert rollouts, then convert collected images into approximated events, and finally package the data into h5 file formats that our training script can read.
Ensure that the flightmare/flightpy/configs/vision/config.yaml
parameters are unchanged. Default evaluation parameters specified in envtest/ros/evaluation_config.yaml
resets the environment once the drone reaches the target x=60
, the drone exits a specified bounding box, or the trial times out.
Run Flightmare in state
mode to activate the privileged expert policy. Data will be stored upon each rollout in envtest/ros/rollouts
(a folder created in the setup script):
bash launch_evaluation.bash <N> state
After populating the rollouts
directory, you can move the relevant trajectory folders to your desired dataset directory. If copying all folders, make sure no previous, unwanted rollouts are present in the directory.
mv envtest/ros/rollouts/* data/datasets/new_dataset/
Install Vid2E then run the following script to approximate events from the collected images. Please note that the --cpu
flag is legacy and not supported as of now. A numpy object array evs_frames.npy
is created that contains a sequence of event frames for each trajectory in the dataset. Each event frame is a float array for all events within the time window between consecutive image frames, and contains the sum of positive and negative events at a given pixel location multiplied by the corresponding contrast thresholds (defaults values are +/-0.2).
python utils/to_events.py --dataset data/datasets/new_dataset --output_path data/datasets/new_dataset --acc_scheme time --keep_collisions
NOTE: If you struggle to install dependencies and run the above, I have included a sample conda vid2e_environment.yml
that should include the packages necessary to run the above script. It is not minimal, and has many unnecessary packages, but you may try it: conda env create -f vid2e_environment.yml
.
Our dataloader can read data from the format of trajectory folders+events npy file, but dataloading is faster when data is packaged into h5 files. To do this, run the following script to create a new_dataset.h5
file that contains the events, depth images, and telemetry data for each trajectory in the dataset.
python utils/to_h5.py /absolute/path/to/new_dataset convert False
You can confirm the contents of the h5 file by running:
python utils/to_h5.py /absolute/path/to/new_dataset view False
The post-training phase that fine-tunes the
In our work, we use a Intel Realsense D435 depth camera with either a Prophesee Gen3 or DAVIS 346 event camera. The D435 depth images are pre-aligned with the infrared1 camera feed, which we can therefore use for calibration. In order to calibrate to the event camera, we use the e2calib procedure, including conversion of events to approximated grayscale images. Note that we utilize ROS1, e2vid, and Kalibr for the calibration process.
Record a chessboard sequence via rosbag record
with the D435 and the event camera, recording the infrared1 feed and events. We recommend holding the dual-camera setup still and moving the chessboard in front of it. Due to the events->images step later, you must start recording data with the board fully covering the event camera field of view then slowly back up and move the board around for collection.
- D435: Ensure that you are recording the D435 infrared images by setting the argument
enable_infra
to true in its launch file. When you use librealsense and ROS1 to run the camera you should see that/camera/infra1/image_rect_raw
is aligned to/camera/depth/image_rect_raw
. - Event camera: Make sure that your event camera is focused properly and also publishing an eventstream. In our experience, the DAVIS ROS1 packages dv-ros or rpg_dvs_ros or Prophesee Metavision ROS1 package by Bernd Pfrommer were fast enough to perform approximate time-synchronization with the D435. With the Prophesee-provided package we observed a small lag making it unusable for calibration. Note that if using a non-DVS rosmsg type, you might have to convert your event camera data in the rosbag to use the DVS message format first using this script:
data_gather/rosbag_metavision-ros-driver_to_DVSmsgs.py
.
Next, we need to extract the infrared image timestamps, convert the events data to h5 format, run image reconstruction from the events, and finally calibrate the two image feeds. We provide the script data_gather/process_calib.sh
to run these steps. Adjust the paths in the script as necessary. You will need to install these notable dependencies:
- e2calib, install via the steps under e2calib: Image Reconstruction;
- Kalibr multi-camera calibration.
Then you can run our script:
bash data_gather/process_calib.sh /path/to/calibration/rosbag <optional_GPU_ID>
At the end of the script, the kalibr docker container will be launched, and you should run the following commands. This may take some time. If all goes well, calibration will be successful and txt, yaml, and pdf files will be generated with both cameras' parameters.
source devel/setup.bash
rosrun kalibr kalibr_calibrate_cameras \
--bag /data/rosbag.bag --target /data/chessboard.yaml \
--models pinhole-radtan pinhole-radtan \
--topics /cam0/image_raw /cam1/image_raw
Now we may gather depth and events to fine-tune the data_gather
is a ROS1 package useful for data collection, which must be built:
cd /path/to/evfly_ws
catkin build data_gather
source devel/setup.bash
Launch the following to run nodes for an event camera and a D435 camera:
roslaunch data_gather events_and_d435.launch
Once both cameras are running, start the bagging script to collect the necessary data:
bash data_gather/bag_evs_d435.sh
We recommend gathering around a minute of handheld data with a variety of motions (rolling/pitching/forward/backward/left/right/fast/slow).
Depending on the size of your rosbag file, you may want to splice the rosbag into smaller parts, which can then be treated as trajectories during the training process. We provide the following script to create smaller rosbags of length 10s each:
python data_gather/splice_rosbag.py /path/to/large/rosbag.bag /path/to/spliced/rosbags <rosbag length in seconds>
Next, run this bash script to extract data from each rosbag in a directory:
bash data_gather/data_from_rosbags.sh /path/to/spliced/rosbags data/datasets/new_dataset
Finally, run this script to run the necessary post-processing steps to align the events and depth images, and package the data into an h5 file. As in some previous scripts, you will need to adjust paths as necessary.
bash data_gather/prep-dataset.sh new_dataset
Now you should be able to use this h5 dataset file for training.
We provide datasets for the simulated forest environment and the real forest environment presented in the paper. The corresponding h5 dataset files can be downloaded from the Datashare in the folder datasets
. Please read the README for details on the provided datasets and the data format.
We provide a script learner.py
that instantiates a Learner class, which can be used for loading models and datasets for purposes like model evaluation and data analysis, as well as training a new or pre-trained model. Training configuration is specified in the arguments in learner/configs/config.txt
. Relevant files for debugging, as well as model checkpoints and tensorboard files are saved in the learner/logs
directory. Note that we have only confirmed training functionality with a Nvidia GPU. To train:
python learner/learner.py --config learner/configs/config.txt
You can monitor training and validation statistics with Tensorboard:
tensorboard --logdir learner/logs
Please refer to the code to understand what each argument in the config file does. Some argument descriptions are in the later section "Useful config details for training or testing".
You can evaluate a model's performance for both depth prediction and velocity command prediction using the provided script learner/evaluation_tools.py
. This script generates plots to compare ground truth vs. predicted velocity commands (valid for simulation only, where we have expert velocity commands) and a gif visualization of input events, predicted depth and velocity commands, and ground truth. These plots or visualizations are saved in the evaluation's workspace, saved in the learner/logs
directory.
Sample snapshot from the gif visualization:
We provide sample commands with config files that assume pretrained models have been placed in the evfly/data
directory.
For testing the joint
python learner/evaluation_tools.py --config learner/configs/eval_config_joint.txt
For testing the real forest deployment model that uses a ViT-LSTM model from another work for
python learner/evaluation_tools.py --config learner/configs/eval_config_origunet_w_vitlstm.txt
- You can train on multiple datasets by listing them in the dataset argument:
dataset = [real_forest-a, real_forest-b, real_forest-c, real_forest-d, real_forest-e]
. evs_min_cutoff
specifies the bottom percentage threshold of events to cut out from event frames prior to feeding to the model. This helps cut out some of the high-texture events in real world trials, such as grass texture. For simulation datasets we keep this at 0 but for real datasets we typically set it to 0.15 (15%). You should not have to tune this parameter.short=<N>
can be used to shorten the dataset size, typically for debugging purposes. Note that if you use small values likeshort=5
then theval_split=0.15
might lead to fewer than 2 validation trajectories, leading to errors in evaluation.- To train or test a model from a given checkpoint, specify the path in the
checkpoint_path
argument in the config file. - If you are training a model from a given checkpoint on the same dataset, and want to preserve the same train/val split, set
load_trainval = True
. - During training, the argument
loss_weights = [10.0, 1.0]
specifies that the first loss parameter (velocity supervision MSE loss) is up-weighted by a factor of 10.0 and the second loss parameter (depth supervision MSE loss) is weighted by 1.0. - During training, the argument
optional_loss_param = [5.0, -1.0]
specifically up-weights velocity prediction loss further by 5.0 where ground truth is executing a dodging maneuver (y- or z- command nonzero) and scales the depth prediction loss by the inverse depth values to up-weight the loss for closer objects. - Note that the evaluation gif for a training example (with default parameters, one training and two validation gifs are generated) might exhibit data augmentation artifacts, like being left-right flipped.
We provide a ROS1 package evfly_ros
for running the trained models on an incoming event camera stream. This package does not provide an autopilot or flight controller, and is meant to be used in conjunction with a broader autonomy architecture (see KumarRobotics Autonomous Flight, RPG Agilicious, PX4, Betaflight). To build the package, catkin build evfly_ros
from the evfly_ws location (ensure you've removed any CATKIN_IGNORE
file in the package). In the run.py
script, you should modify the paths EVFLY_PATH
and CALIB_FILE
. Additionally, you need to modify the ROS topic names in the subscribers and publishers as appropriate. On a 12-core, 16GB RAM, cpu-only machine (similar to that used for hardware experiments presented in the paper) the model OrigUNet_w_VITFLY_ViTLSTM
runs a forward pass in approximately 73ms.
Run your event camera driver (Prophesee), our C++ events processing node, and the model inference node with:
roslaunch evfly_ros evs_proc.launch
For DVS/DAVIS event cameras you may instead use:
roslaunch evfly_dv_ros dvs.launch
Model predicted depth and velocities, as well as the input events packaged into a frame, are published on the /output
topics.
We include a /trigger
signal that, when continuously published to, will route the predicted velocity commands to a given topic /robot/cmd_vel
. We do this with the following terminal command typically sent from a basestation computer. If you Ctl+C this command, the rosnode will send velocity 0.0 commands to stop in place. This is a safety feature.
rostopic pub -r 50 /trigger std_msgs/Empty "{}"
Some details:
- Keep in mind the model is trained to continuously fly forward at the desired velocity and requires either manual pilot takeover or reaching the positional guard limit to stop.
- z-velocity commands are currently set to maintain a given desired height set by
self.des_z
, and the p-gain used to do this may need slight tuning (line 306). - We use a ramp-up in the
run()
function to smoothly accelerate the drone to the desired velocity overself.ramp_duration
seconds. - If you are using an event camera with resolution different than 260x346, then depending on the lens used for the event camera in relation to the depth camera intrinsics, you may want to either center-crop or just resize the batch of events to 260x346. This option is on lines 345-351.
- Velocity commands are published with respect to x-direction forward, y-direction left, and z-direction up.
Please fly safely!
@inproceedings{bhattacharya2024monocular,
title={Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor},
author={Bhattacharya, Anish and Cannici, Marco and Rao, Nishanth and Tao, Yuezhan and Kumar, Vijay and Matni, Nikolai and Scaramuzza, Davide},
booktitle={8th Annual Conference on Robot Learning}
}
If you use the privileged expert, pretrained sim_vitfly-vitlstm_Vphi.pth
, or code in learner/vitfly_models.py
for your work, please additionally cite the original paper:
@article{bhattacharya2024vision,
title={Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance},
author={Bhattacharya, Anish and Rao, Nishanth and Parikh, Dhruv and Kunapuli, Pratik and Wu, Yuwei and Tao, Yuezhan and Matni, Nikolai and Kumar, Vijay},
journal={arXiv preprint arXiv:2405.10391},
year={2024}
}
Some parts of the training or testing scripts, the privileged expert policy, the ViT-LSTM model for velocity command prediction, and other code snippets may be adapted from the project Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle Avoidance.
Simulation launching code and the versions of flightmare
and dodgedrone_simulation
are from the ICRA 2022 DodgeDrone Competition code.