This is the official implementation of our ICCV 2023 paper "MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos".
MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos
Fengrui Tian, Shaoyi Du, Yueqi Duan
in ICCV 2023
In this paper, we target the problem of learning a generalizable dynamic radiance field from monocular videos. Unlike most existing NeRF methods, which rely on multiple views, monocular videos contain only one view at each timestamp and therefore suffer from ambiguity along the view direction when estimating point features and scene flows. Previous studies such as DynNeRF disambiguate point features by positional encoding, which is not transferable and severely limits the generalization ability. As a result, these methods have to train one independent model for each scene and incur heavy computational costs when applied to the growing number of monocular videos in real-world applications. To address this, we propose MonoNeRF to simultaneously learn point features and scene flows with point trajectory and feature correspondence constraints across frames. More specifically, we learn an implicit velocity field to estimate point trajectories from temporal features with a Neural ODE, followed by a flow-based feature aggregation module that obtains spatial features along the point trajectory. We jointly optimize temporal and spatial features in an end-to-end manner. Experiments show that MonoNeRF is able to learn from multiple scenes and supports new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation.
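To make the trajectory idea concrete, here is a minimal sketch of how a point trajectory can be obtained by integrating a velocity field over time. It is an illustration only, not the code in this repository: the `velocity` function is a hypothetical analytic field (a rigid rotation) standing in for the learned implicit velocity field, and a fixed-step Runge-Kutta integrator stands in for the Neural ODE solver. All function names are assumptions made for this example.

```python
import math


def velocity(point, t):
    """Hypothetical velocity field: rigid rotation about the z-axis.

    Stands in for a learned network that maps (point, time) to a 3D velocity.
    """
    x, y, z = point
    return (-y, x, 0.0)


def rk4_step(f, point, t, dt):
    """Advance `point` by one classical fourth-order Runge-Kutta step."""
    def shifted(p, k, scale):
        return tuple(pi + scale * ki for pi, ki in zip(p, k))

    k1 = f(point, t)
    k2 = f(shifted(point, k1, dt / 2), t + dt / 2)
    k3 = f(shifted(point, k2, dt / 2), t + dt / 2)
    k4 = f(shifted(point, k3, dt), t + dt)
    return tuple(
        p + dt / 6 * (a + 2 * b + 2 * c + d)
        for p, a, b, c, d in zip(point, k1, k2, k3, k4)
    )


def integrate_trajectory(f, start, t0, t1, steps=100):
    """Return the list of positions visited when moving `start` from t0 to t1."""
    dt = (t1 - t0) / steps
    trajectory = [start]
    point, t = start, t0
    for _ in range(steps):
        point = rk4_step(f, point, t, dt)
        t += dt
        trajectory.append(point)
    return trajectory


# Follow one point for a quarter turn of the toy rotation field.
trajectory = integrate_trajectory(velocity, (1.0, 0.0, 0.5), 0.0, math.pi / 2)
end = trajectory[-1]
```

In this toy setting the end point lies close to a quarter turn around the z-axis from the start. With a learned field, the positions along such a trajectory are where features from neighboring frames would be gathered.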
The code is tested with
- Ubuntu 16.04
- Anaconda 3
- Python 3.8.12
- CUDA 11.1
- A100 or 3090 GPUs
To get started, please create the conda environment mononerf by running:
conda create --name mononerf python=3.8
conda activate mononerf
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f
pip install imageio==2.19.2 pyhocon==0.3.60 pyparsing==2.4.7 configargparse==1.5.3 tensorboard==2.13.0 ipdb==0.13.13 imgviz==1.7.2 imageio-ffmpeg==0.4.8
pip install mmcv-full==1.7.1
Then install MMAction2 v0.24.1 manually.
git clone
cd mmaction2
git checkout v0.24.1
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in editable mode,
# thus any local modifications made to the code will take effect without re-installation.
Install torchdiffeq if you want to use a Neural ODE to compute trajectories.
pip install torchdiffeq==0.0.1
Install other dependencies.
pip install tqdm Pillow==9.1.1
Finally, clone the MonoNeRF project:
git clone
cd MonoNeRF
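After installation, it can help to confirm that the pinned versions are the ones actually present in the environment. The sketch below compares installed package versions against the pins listed in the commands above using the standard library's `importlib.metadata`. The helper names `installed_version` and `find_mismatches` are made up for this example and are not part of the repository.

```python
from importlib import metadata

# Version pins copied from the install commands in this README.
PINNED = {
    "torch": "1.8.0+cu111",
    "torchvision": "0.9.0+cu111",
    "imageio": "2.19.2",
    "pyhocon": "0.3.60",
    "pyparsing": "2.4.7",
    "configargparse": "1.5.3",
    "tensorboard": "2.13.0",
    "ipdb": "0.13.13",
    "imgviz": "1.7.2",
    "imageio-ffmpeg": "0.4.8",
    "mmcv-full": "1.7.1",
    "torchdiffeq": "0.0.1",
    "Pillow": "9.1.1",
}


def installed_version(name):
    """Return the installed version string of `name`, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(PINNED).items()):
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Run the script inside the activated mononerf environment. An empty output means every pinned package matches.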
The Dynamic Scene Dataset is used to quantitatively evaluate our method. Please refer to the official dataset page to download the data. The training dataset can also be downloaded from the link provided by DynamicNeRF:
wget --no-check-certificate
We also provide the dataset link on Google Drive that contains both training and evaluation data and evaluation code on train/
Download the SlowOnly pretrained model from the MMAction2 website.
mkdir checkpoints
wget -P checkpoints/
All training procedures run on GPU 0 by default.
You can train a model from scratch by running:
chmod +x
./ 0
Train a model for rendering novel views on unseen frames:
chmod +x
./ 0
Test the generalization ability on unseen scenes:
chmod +x
./ 0 2000
Train a model on your sequence (from DynNeRF)
- Set some paths
and install COLMAP manually. Then download MiDaS and RAFT weights
wget --no-check-certificate
- Prepare training images and background masks from a video.
cd $ROOT_PATH/train/utils
python --videopath /path/to/the/video
- Use COLMAP to obtain camera poses.
colmap feature_extractor \
--database_path $DATASET_PATH/database.db \
--image_path $DATASET_PATH/images_colmap \
--ImageReader.mask_path $DATASET_PATH/background_mask \
--ImageReader.single_camera 1
colmap exhaustive_matcher \
--database_path $DATASET_PATH/database.db
mkdir $DATASET_PATH/sparse
colmap mapper \
--database_path $DATASET_PATH/database.db \
--image_path $DATASET_PATH/images_colmap \
--output_path $DATASET_PATH/sparse \
--Mapper.num_threads 16 \
--Mapper.init_min_tri_angle 4 \
--Mapper.multiple_models 0 \
--Mapper.extract_colors 0
- Save camera poses into the format that NeRF reads.
cd $ROOT_PATH/train/utils
python --dataset_path $DATASET_PATH
- Estimate monocular depth.
cd $ROOT_PATH/train/utils
python --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/
- Predict optical flows.
cd $ROOT_PATH/train/utils
python --dataset_path $DATASET_PATH --model $ROOT_PATH/weights/raft-things.pth
- Obtain motion mask (code adapted from NSFF).
cd $ROOT_PATH/train/utils
python --dataset_path $DATASET_PATH
- Train a model. Please change
chmod +x
./ 0
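The three COLMAP commands in the camera-pose step above can also be driven from Python, which makes it easier to process several sequences in a loop. The following is a sketch under stated assumptions: the flags and directory names are copied from the commands in this README, while the helper names `colmap_commands` and `run_colmap` are hypothetical and not part of the repository. It also assumes that `colmap` is on the PATH.

```python
import os
import subprocess


def colmap_commands(dataset_path):
    """Build the feature_extractor, exhaustive_matcher and mapper command lines."""
    database = os.path.join(dataset_path, "database.db")
    images = os.path.join(dataset_path, "images_colmap")
    masks = os.path.join(dataset_path, "background_mask")
    sparse = os.path.join(dataset_path, "sparse")
    return [
        [
            "colmap", "feature_extractor",
            "--database_path", database,
            "--image_path", images,
            "--ImageReader.mask_path", masks,
            "--ImageReader.single_camera", "1",
        ],
        [
            "colmap", "exhaustive_matcher",
            "--database_path", database,
        ],
        [
            "colmap", "mapper",
            "--database_path", database,
            "--image_path", images,
            "--output_path", sparse,
            "--Mapper.num_threads", "16",
            "--Mapper.init_min_tri_angle", "4",
            "--Mapper.multiple_models", "0",
            "--Mapper.extract_colors", "0",
        ],
    ]


def run_colmap(dataset_path):
    """Run the three stages in order, stopping at the first failure."""
    # Unlike a plain `mkdir`, this does not fail if the folder already exists.
    os.makedirs(os.path.join(dataset_path, "sparse"), exist_ok=True)
    for command in colmap_commands(dataset_path):
        subprocess.run(command, check=True)
```

Calling `run_colmap("/path/to/dataset")` would then mirror the shell commands, with `check=True` raising an error as soon as one stage fails.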
This work is licensed under the MIT License. See LICENSE for details.
If you find this code useful for your research, please consider citing the following paper:
author = {Tian, Fengrui and Du, Shaoyi and Duan, Yueqi},
title = {{MonoNeRF}: Learning a Generalizable Dynamic Radiance Field from Monocular Videos},
booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023}
Our code is built upon NeRF, NeRF-pytorch, NSFF, DynamicNeRF, pixelNeRF, and Occupancy Flow. Our flow prediction code is modified from RAFT. Our depth prediction code is modified from MiDaS.
If you have any questions, please feel free to contact Fengrui Tian.