BodySLAM is a cutting-edge, deep learning-based Simultaneous Localization and Mapping (SLAM) framework designed specifically for endoscopic surgical applications. By leveraging advanced AI techniques, BodySLAM brings enhanced depth perception and 3D reconstruction capabilities to various surgical settings, including laparoscopy, gastroscopy, and colonoscopy.
Our comprehensive paper detailing the BodySLAM framework is now available on arXiv:
BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications
G. Manni, C. Lauretti, F. Prata, R. Papalia, L. Zollo, P. Soda
If you find our work useful in your research, please consider citing:
```bibtex
@misc{manni2024bodyslamgeneralizedmonocularvisual,
  title         = {BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications},
  author        = {G. Manni and C. Lauretti and F. Prata and R. Papalia and L. Zollo and P. Soda},
  year          = {2024},
  eprint        = {2408.03078},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2408.03078},
}
```
If you use the depth estimation module in your research, please also cite:
```bibtex
@misc{https://doi.org/10.48550/arxiv.2302.12288,
  doi       = {10.48550/ARXIV.2302.12288},
  url       = {https://arxiv.org/abs/2302.12288},
  author    = {Bhat, Shariq Farooq and Birkl, Reiner and Wofk, Diana and Wonka, Peter and Müller, Matthias},
  keywords  = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title     = {ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth},
  publisher = {arXiv},
  year      = {2023},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```
In the challenging world of endoscopic surgeries, where hardware limitations and environmental variations pose significant obstacles, BodySLAM stands out by integrating deep learning models with strong generalization capabilities. Our framework consists of three key modules:
- Monocular Pose Estimation Module (MPEM): Estimates relative camera poses between consecutive frames using our novel CycleVO architecture
- Monocular Depth Estimation Module (MDEM): Predicts depth maps from single images using the Zoe model
- 3D Reconstruction Module (3DM): Combines pose and depth information for 3D scene reconstruction
- State-of-the-Art Depth Estimation: Utilizes the Zoe model for accurate monocular depth estimation
- Novel Pose Estimation: Implements CycleVO, a novel unsupervised method for pose estimation
- Cross-Setting Performance: Robust functionality across various endoscopic surgical environments
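The 3DM's core operation, combining a predicted depth map with a camera pose to produce 3D points, can be sketched as a standard back-projection through the camera intrinsics. This is only an illustrative outline, not the repository's actual implementation; the `backproject` function and the toy intrinsics below are assumptions for the example:

```python
import numpy as np

def backproject(depth, K, pose):
    """Back-project a depth map into world-space 3D points.

    depth: (H, W) metric depth map (e.g. from the MDEM)
    K:     (3, 3) camera intrinsics
    pose:  (4, 4) camera-to-world transform (e.g. from the MPEM)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Rays in camera coordinates, scaled by per-pixel depth
    cam = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    world = (pose @ cam_h)[:3].T  # (H*W, 3) point cloud
    return world

# Toy example: 2x2 depth map, identity pose
K = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
pts = backproject(np.ones((2, 2)), K, np.eye(4))
print(pts.shape)  # (4, 3)
```

Fusing the per-frame clouds across a sequence (deduplication, filtering, meshing) is additional work handled by the 3DM itself.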
We're actively refactoring our codebase to enhance usability and performance. Here's our current progress:
- Monocular Depth Estimation Module (MDEM)
- Monocular Pose Estimation Module (MPEM)
- 3D Reconstruction Module (3DM)
- Integration and Testing
We've included several examples to help you get started with BodySLAM:
- Basic Depth Estimation: Demonstrates the fundamental pipeline for estimating depth from a single image.

  ```shell
  python examples/depth_estimation/basic_depth_estimation.py
  ```
- Depth Map Scaling and Colorization: Shows how to scale and colorize depth maps for better visualization.

  ```shell
  python examples/depth_estimation/depth_map_scaling.py
  ```
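Scaling and colorization boils down to normalizing the depth values and mapping them to a color ramp. A minimal sketch of that idea (the `colorize_depth` helper is illustrative, not the script's actual code):

```python
import numpy as np

def colorize_depth(depth, cmap=None):
    """Normalize a depth map to [0, 255] and map it to an RGB image."""
    d = depth.astype(np.float64)
    d = (d - d.min()) / max(d.max() - d.min(), 1e-8)  # scale to [0, 1]
    gray = (d * 255).astype(np.uint8)
    if cmap is None:
        # Fallback: replicate the gray channel into RGB
        return np.stack([gray] * 3, axis=-1)
    # e.g. pass a matplotlib colormap such as plt.get_cmap("magma")
    return (cmap(d)[..., :3] * 255).astype(np.uint8)

depth = np.array([[0.5, 1.0], [1.5, 2.0]])  # toy metric depths
rgb = colorize_depth(depth)
print(rgb.shape, rgb.dtype)  # (2, 2, 3) uint8
```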
- Batch Processing: Illustrates how to process multiple images for depth estimation and colorization.

  ```shell
  python examples/depth_estimation/batch_processing.py
  ```
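Batch processing is essentially iterating an input directory and pairing each image with an output path. A hedged sketch of that bookkeeping, assuming a hypothetical `batch_paths` helper rather than the script's real interface:

```python
import tempfile
from pathlib import Path

def batch_paths(input_dir, output_dir, exts=(".png", ".jpg")):
    """Yield (input image, output depth-map path) pairs, creating output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    images = sorted(p for p in Path(input_dir).iterdir()
                    if p.suffix.lower() in exts)
    for img in images:
        yield img, out / f"{img.stem}_depth.png"

# Demo on a throwaway directory
src = Path(tempfile.mkdtemp())
(src / "frame_001.jpg").touch()
(src / "notes.txt").touch()  # non-image files are skipped
pairs = list(batch_paths(src, src / "depth"))
print([(i.name, o.name) for i, o in pairs])  # [('frame_001.jpg', 'frame_001_depth.png')]
```

The depth model would then be run once per pair, writing the colorized result to the output path.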
- Single Pair Processing: Estimate relative pose between two consecutive frames.

  ```shell
  python examples/pose_estimation/run_cycle_pose.py --mode pair \
      --model_path path/to/model.pth \
      --input frame1.jpg \
      --input2 frame2.jpg \
      --output pose.txt
  ```
- Sequence Processing: Process an entire sequence of frames.

  ```shell
  python examples/pose_estimation/run_cycle_pose.py --mode sequence \
      --model_path path/to/model.pth \
      --input path/to/sequence \
      --output sequence_poses.txt
  ```
- Dataset Processing: Process multiple sequences in a dataset.

  ```shell
  python examples/pose_estimation/run_cycle_pose.py --mode dataset \
      --model_path path/to/model.pth \
      --input path/to/dataset \
      --output path/to/results
  ```
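Since the MPEM outputs relative poses between consecutive frames, recovering a camera trajectory means chaining those 4x4 transforms. This is the standard SE(3) composition, sketched here for illustration (the `accumulate_poses` name is an assumption, not the repository's API):

```python
import numpy as np

def accumulate_poses(relative_poses):
    """Chain 4x4 relative transforms (frame t -> t+1) into absolute poses."""
    traj = [np.eye(4)]  # first frame defines the world origin
    for rel in relative_poses:
        traj.append(traj[-1] @ rel)  # compose with the previous absolute pose
    return traj

# Two unit translations along x should end at x = 2
step = np.eye(4)
step[0, 3] = 1.0
traj = accumulate_poses([step, step])
print(traj[-1][0, 3])  # 2.0
```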
- Clone the repository:

  ```shell
  git clone https://github.com/yourusername/BodySLAM.git
  cd BodySLAM
  ```
- Create and activate a virtual environment:

  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install the required packages:

  ```shell
  pip install -r requirements.txt
  ```
```
BodySLAM/
├── src/
│   ├── depth_estimation/
│   │   └── interface.py
│   └── pose_estimation/
│       └── interface.py
├── examples/
│   ├── depth_estimation/
│   │   └── basic_depth_estimation.py
│   └── pose_estimation/
│       └── run_cycle_pose.py
└── tests/
```
- 3D Reconstruction Module: Integration of pose and depth for complete 3D reconstruction (28/01/2025)
- Pre-trained Models: Ready-to-use models for different surgical settings (29/01/2025)
- Enhanced Documentation: More detailed tutorials and API documentation
Q: Will the training dataset for CycleVO be released?
A: No, the training dataset for CycleVO will not be released to the public. However, we will release the pre-trained model weights.

Q: Where can I find the Hamlyn Dataset?
A: The Hamlyn Dataset can be accessed here.

Q: Where can I find the EndoSLAM Dataset?
A: The EndoSLAM Dataset can be accessed here.
We welcome contributions! If you're interested in improving BodySLAM, please check our Contributing Guidelines (coming soon).
BodySLAM is released under the MIT License.
For questions or support, please open an issue on our GitHub repository.