Despite considerable progress in stereo depth estimation, omnidirectional imaging remains underexplored, mainly due to the lack of appropriate data. We introduce Helvipad, a real-world dataset for omnidirectional stereo depth estimation, consisting of 40K frames from video sequences across diverse environments, including crowded indoor and outdoor scenes under varying lighting conditions. Collected using two 360° cameras in a top-bottom setup and a LiDAR sensor, the dataset includes accurate depth and disparity labels obtained by projecting 3D point clouds onto equirectangular images. Additionally, we provide an augmented training set with significantly increased label density obtained through depth completion. We benchmark leading stereo depth estimation models for both standard and omnidirectional images. The results show that while recent stereo methods perform reasonably well, accurately estimating depth in omnidirectional imaging remains a significant challenge. To address this, we introduce the necessary adaptations to stereo models, achieving improved performance.
- [14 Apr 2025] We have released the code of our 360-IGEV-Stereo model, which adapts a standard stereo matching architecture to omnidirectional imagery.
- [08 Apr 2025] Our new paper DFI-OmniStereo achieves state-of-the-art results on Helvipad. Check out the project page for details, paper and code.
- [16 Mar 2025 - CVPR Update] A small but important update has been applied to the dataset. If you have already downloaded it, please check the details on the HuggingFace Hub.
- [16 Feb 2025] Helvipad has been accepted to CVPR 2025!
The dataset is organized into training, validation and testing subsets with the following structure:
helvipad/
├── train/
│   ├── depth_maps                # Depth maps generated from LiDAR data
│   ├── depth_maps_augmented      # Augmented depth maps using depth completion
│   ├── disparity_maps            # Disparity maps computed from depth maps
│   ├── disparity_maps_augmented  # Augmented disparity maps using depth completion
│   ├── images_top                # Top-camera RGB images
│   ├── images_bottom             # Bottom-camera RGB images
│   └── LiDAR_pcd                 # Original LiDAR point cloud data
├── val/
│   ├── depth_maps                # Depth maps generated from LiDAR data
│   ├── depth_maps_augmented      # Augmented depth maps using depth completion
│   ├── disparity_maps            # Disparity maps computed from depth maps
│   ├── disparity_maps_augmented  # Augmented disparity maps using depth completion
│   ├── images_top                # Top-camera RGB images
│   ├── images_bottom             # Bottom-camera RGB images
│   └── LiDAR_pcd                 # Original LiDAR point cloud data
└── test/
    ├── depth_maps                # Depth maps generated from LiDAR data
    ├── depth_maps_augmented      # Augmented depth maps using depth completion (only for computing LRCE)
    ├── disparity_maps            # Disparity maps computed from depth maps
    ├── disparity_maps_augmented  # Augmented disparity maps using depth completion (only for computing LRCE)
    ├── images_top                # Top-camera RGB images
    ├── images_bottom             # Bottom-camera RGB images
    └── LiDAR_pcd                 # Original LiDAR point cloud data
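The depth maps are obtained by projecting the LiDAR point clouds onto the equirectangular image grid, and the disparity maps are then derived from depth using the top-bottom stereo geometry described in the paper. As a rough illustration of the projection step only, here is a minimal sketch under a generic spherical-coordinate convention (z-axis up, azimuth mapped to columns, polar angle to rows); the actual extrinsic calibration, angle conventions, and file formats of the released maps are defined by the paper and the dataset card, not by this snippet, and the function name is ours.

```python
import numpy as np

def project_points_to_equirectangular(points_cam, height, width):
    """Illustrative projection of 3D points (N, 3), already expressed in the
    camera frame, onto a sparse equirectangular depth map.

    Assumed convention (for illustration only): z-axis up, azimuth phi in
    [-pi, pi) mapped to image columns, polar angle theta in [0, pi] mapped
    to image rows.
    """
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    r = np.linalg.norm(points_cam, axis=1)  # radial depth to each point
    theta = np.arccos(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # polar angle
    phi = np.arctan2(y, x)                                          # azimuth

    cols = ((phi + np.pi) / (2 * np.pi) * width).astype(int) % width
    rows = np.clip((theta / np.pi * height).astype(int), 0, height - 1)

    depth_map = np.zeros((height, width), dtype=np.float32)  # 0 = no measurement
    # If several points fall into the same pixel, keep the closest one:
    # write far points first so that nearer points overwrite them.
    order = np.argsort(-r)
    depth_map[rows[order], cols[order]] = r[order]
    return depth_map
```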
We evaluate the performance of multiple state-of-the-art and popular stereo matching methods, both for standard and 360° images. All models are trained on a single NVIDIA A100 GPU with the largest possible batch size to ensure comparable use of computational resources.
Method | Stereo Setting | Disp-MAE (°) | Disp-RMSE (°) | Disp-MARE | Depth-MAE (m) | Depth-RMSE (m) | Depth-MARE | Depth-LRCE (m) |
---|---|---|---|---|---|---|---|---|
PSMNet | conventional | 0.286 | 0.496 | 0.248 | 2.509 | 5.673 | 0.176 | 1.809 |
360SD-Net | omnidirectional | 0.224 | 0.419 | 0.191 | 2.122 | 5.077 | 0.152 | 0.904 |
IGEV-Stereo | conventional | 0.225 | 0.423 | 0.172 | 1.860 | 4.447 | 0.146 | 1.203 |
360-IGEV-Stereo | omnidirectional | 0.188 | 0.404 | 0.146 | 1.720 | 4.297 | 0.130 | 0.388 |
DFI-OmniStereo | omnidirectional | 0.158 | 0.338 | 0.120 | 1.463 | 3.767 | 0.108 | 0.397 |
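For reference, MAE, RMSE, and MARE (mean absolute relative error) are standard per-pixel error metrics computed only over labeled pixels, since the LiDAR-based ground truth is sparse; LRCE is discussed with the evaluation instructions below. The sketch shows one straightforward way to compute the three standard metrics; the function name and masking convention are ours, not the repository's evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt, valid_min=0.0):
    """MAE, RMSE and MARE over pixels with a valid ground-truth value.

    pred, gt: arrays of the same shape (depth in metres or disparity in
    degrees); pixels with gt <= valid_min are treated as unlabeled.
    """
    mask = gt > valid_min
    err = pred[mask] - gt[mask]
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mare = np.mean(np.abs(err) / gt[mask])  # mean absolute relative error
    return {"MAE": mae, "RMSE": rmse, "MARE": mare}
```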
The dataset is available on HuggingFace Hub.
The code of the 360-IGEV-Stereo method can be found in the `360_igev_stereo` directory. We use Hydra for configuration management and Weights & Biases for comprehensive experiment tracking and visualization.
We assume that the Helvipad dataset has been downloaded and is stored at `./data/helvipad`.
conda create -n 360-igev-stereo python=3.11
conda activate 360-igev-stereo
git clone [email protected]:vita-epfl/Helvipad.git
cd Helvipad/360_igev_stereo
pip install -r requirements.txt
IGEV-Stereo (SceneFlow weights): Create the directory below and download the pretrained SceneFlow weights from the Google Drive provided by the IGEV-Stereo authors:
mkdir -p ./models/_360_igev_stereo/pretrained_models/igev_stereo
Place the downloaded file into the directory created above.
360-IGEV-Stereo main checkpoint: Download our pretrained model checkpoint:
mkdir -p ./models/_360_igev_stereo/pretrained_models/360_igev_stereo && \
wget -O ./models/_360_igev_stereo/pretrained_models/360_igev_stereo/360_igev_stereo_helvipad.pth "https://github.com/vita-epfl/Helvipad/releases/download/v1.0.0/360_igev_stereo_helvipad.pth"
To train the model from the IGEV-Stereo weights, run the following command:
cd 360_igev_stereo
python train.py \
--debug=false \
--exp_name=Main \
--dataset_root=./data/helvipad/
All other parameters are set to their default values for training the main model.
To evaluate our model using the main checkpoint and compute all metrics including Left-Right Consistency Error (LRCE), use:
cd src
python evaluate.py \
--debug=false \
--exp_name=Evaluation \
--dataset_root=./data/helvipad/ \
--restore_ckpt=./models/_360_igev_stereo/pretrained_models/360_igev_stereo/360_igev_stereo_helvipad.pth \
--calc_lrce=true
Note: Setting `--calc_lrce=true` enables LRCE evaluation, which increases computation time.
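For context, LRCE (Left-Right Consistency Error) measures how consistent a prediction is across the left and right borders of the equirectangular image, which correspond to the same physical direction because of the 360° wrap-around; this is why the augmented test-set labels are needed to compute it. The sketch below is one plausible formulation under that reading; the exact definition implemented in evaluate.py is the one given in the paper and may differ in its details.

```python
import numpy as np

def lrce(pred, gt):
    """Illustrative left-right consistency error (not the official code).

    Compares the prediction error in the first and last image columns,
    averaged over rows where ground truth is available at both borders.
    """
    gt_l, gt_r = gt[:, 0], gt[:, -1]
    pred_l, pred_r = pred[:, 0], pred[:, -1]
    valid = (gt_l > 0) & (gt_r > 0)
    return np.mean(np.abs((gt_l - pred_l)[valid] - (gt_r - pred_r)[valid]))
```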
To generate inference results on selected samples from the Helvipad dataset, run the following command:
cd src
python infer.py \
--infer_name=helvipad_results \
--dataset_root=./data/helvipad/ \
--restore_ckpt=./models/_360_igev_stereo/pretrained_models/360_igev_stereo/360_igev_stereo_helvipad.pth \
--images test-20240120_REC_06_IN-0042 test-20240124_REC_03_OUT-0676 test-20240124_REC_08_NOUT-0717
This command will process the following frames, all of which are part of the test set:
- frame 0042 from scene 20240120_REC_06_IN
- frame 0676 from scene 20240124_REC_03_OUT
- frame 0717 from scene 20240124_REC_08_NOUT

The results, as well as the corresponding top and bottom images, will be saved to `./models/_360_igev_stereo/inference_results/helvipad_results`.
To evaluate our model on real-world examples from the 360SD-Net dataset:
- Download the real-world top and bottom images from the official repo.
- Place the data in a directory of your choice, e.g., `./data/360sd`.
- Run the following command to perform inference:
cd src
python infer.py \
--infer_name=360SD_results \
--dataset_root=./data/360sd/ \
--restore_ckpt=./models/_360_igev_stereo/pretrained_models/360_igev_stereo/360_igev_stereo_helvipad.pth \
--dataset=360SD \
--min_disp_deg=0.0048 \
--max_disp_deg=178 \
--max_disp=512 \
--images hall room stairs
This will run inference on the following scenes:
- hall
- room
- stairs

The results will be saved to `./models/_360_igev_stereo/inference_results/360SD_results`.
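As a side note, the `--min_disp_deg` and `--max_disp_deg` flags express the disparity search range in degrees of polar angle, which is the natural unit for a top-bottom equirectangular pair, while `--max_disp` is in pixels. Assuming the common convention that the image height spans 180° of polar angle (an assumption on our part, not something stated above), the two are related as in this small sketch:

```python
def disparity_deg_to_px(disp_deg: float, image_height: int) -> float:
    """Convert a disparity in degrees of polar angle to pixels, assuming
    the equirectangular image height covers 180 degrees."""
    return disp_deg * image_height / 180.0

# For a 512-pixel-high panorama, 178 degrees corresponds to roughly 506 px,
# which would be consistent with --max_disp=512 above.
print(disparity_deg_to_px(178, 512))  # ~506.3
```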
For more information, visualizations, and updates, visit the project page.
This dataset is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
This work was supported by the EPFL Center for Imaging through a Collaborative Imaging Grant. We thank the VITA lab members for their valuable feedback, which helped to enhance the quality of this manuscript. We also express our gratitude to Dr. Simone Schaub-Meyer and Oliver Hahn for their insightful advice during the project's final stages.
If you use the Helvipad dataset in your research, please cite our paper:
@inproceedings{zayene2025helvipad,
author = {Zayene, Mehdi and Endres, Jannik and Havolli, Albias and Corbière, Charles and Cherkaoui, Salim and Ben Ahmed Kontouli, Alexandre and Alahi, Alexandre},
title = {Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025}
}