The goal of this project is to facilitate research in autolabeling, scene understanding and neural implicit feature fields.
The installation instructions were tested with Python 3.8 and 3.9. We recommend installing some dependencies through Anaconda, and these instructions assume you are working in an Anaconda environment.
The software uses CUDA, and compiling tiny-cuda-nn requires nvcc. If you don't have CUDA version 11.3 or newer, including nvcc, installed on your system, you can install it in your Anaconda environment with:
conda install -c conda-forge cudatoolkit-dev=11.4
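Before compiling tiny-cuda-nn, it can help to confirm that the nvcc found on your PATH meets the 11.3 requirement. The following is a minimal sketch and not part of the repository: the helper names parse_nvcc_release and nvcc_is_recent_enough are hypothetical, and it assumes that nvcc --version prints a line containing "release <major>.<minor>".

```python
import re
import shutil
import subprocess

MIN_CUDA = (11, 3)  # minimum CUDA version stated in the instructions above


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the text printed by `nvcc --version`.

    Returns None if no `release X.Y` token is found.
    """
    match = re.search(r"release\s+(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_is_recent_enough(minimum=MIN_CUDA):
    """Return True if nvcc is on PATH and reports at least `minimum`."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return False
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    release = parse_nvcc_release(result.stdout)
    return release is not None and release >= minimum
```

Calling nvcc_is_recent_enough() inside the activated environment returns False both when nvcc is missing and when it is older than 11.3, which are the two cases in which the conda install above is needed.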
To install PyTorch and ffmpeg, run:
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
conda install ffmpeg
Install into your desired python environment with the following commands:
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
git clone --recursive git@github.com:cvg/Hierarchical-Localization.git
pushd Hierarchical-Localization/
python -m pip install -e .
popd
git submodule update --init --recursive
pushd torch_ngp
git submodule update --init --recursive
pip install -e .
bash scripts/install_ext.sh
popd
# To use LSeg features for vision-language feature fields
git clone https://github.com/kekeblom/lang-seg
pushd lang-seg
pip install -e .
popd
# Finally install autolabel
pip install -e .
After installing the project using the instructions above, you can follow these steps to run autolabel on an example scene.
# Download example scene
wget http://robotics.ethz.ch/~asl-datasets/2022_autolabel/bench.tar.gz
# Uncompress
tar -xvf bench.tar.gz
# Compute camera poses, scene bounds and undistort images using raw input images
python scripts/mapping.py bench
# Compute DINO features from color images.
python scripts/compute_feature_maps.py bench --features dino --autoencode
# Pretrain neural representation on color, depth and extracted features
python scripts/train.py bench --features dino
# Open the scene in the graphical user interface for annotation
python scripts/gui.py bench --features dino
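To process several scenes, the non-interactive steps above can be wrapped in a small driver. This is a sketch rather than a script shipped with the repository: pipeline_commands and run_pipeline are hypothetical names, and it assumes the scripts are invoked from the repository root with exactly the flags shown above. The GUI step is left out because it is interactive.

```python
import subprocess
import sys


def pipeline_commands(scene, features="dino"):
    """Return the mapping, feature extraction and training commands for one scene."""
    python = sys.executable
    return [
        [python, "scripts/mapping.py", scene],
        [python, "scripts/compute_feature_maps.py", scene,
         "--features", features, "--autoencode"],
        [python, "scripts/train.py", scene, "--features", features],
    ]


def run_pipeline(scene, features="dino"):
    """Run each step in order; check=True stops at the first failing step."""
    for command in pipeline_commands(scene, features):
        subprocess.run(command, check=True)
```

For example, run_pipeline("bench") would run the three commands on the example scene, after which the scene can be opened with scripts/gui.py.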
Once you have annotated a scene, you can train some more on the annotations and render a video of the annotations:
# Train some more on the given annotations
python scripts/train.py bench --features dino
# Export labels for learning on some downstream task.
# The objects flag is optional, but tells it how many objects are in the scene per class.
# It is used to remove noise from the produced segmentation maps.
# Labels are saved at bench/output/semantic.
python scripts/export.py bench --objects 1
# Render a video of annotations and features
python scripts/render.py bench --model-dir bench/nerf/g15_hg+freq_dino_rgb1.0_d0.1_s1.0_f0.5_do0.1/ --out bench.mp4
The repository contains an implementation of vision-language feature fields. See docs/vision-language.md for instructions on how to run and use vision-language examples and the ROS node.
The GUI can be controlled with the following keybindings:
Key | Action |
---|---|
0 | select background paint brush |
1 | select foreground paint brush |
esc or Q | shutdown application |
ctrl+S | save model |
C | clear image |
The scene directory structure is as follows:
raw_rgb/ # Raw distorted color frames.
rgb/ # Undistorted color frames either as png or jpg.
  00000.jpg
  00001.jpg
  ...
raw_depth/ # Raw original distorted depth frames.
  00000.png # 16 bit grayscale png images where values are in millimeters.
  00001.png # Depth frames might be smaller in size than the rgb frames.
  ...
depth/ # Undistorted frames to match a perfect pinhole camera model.
  00000.png
  00001.png
  ...
pose/
  00000.txt # 4 x 4 world to camera transform.
  00001.txt
  ...
semantic/ # Ground truth semantic annotations provided by user.
  00010.png # These might not exist.
  00150.png
gt_masks/ # Optional
  00010.json # Dense ground truth masks used for evaluation.
  00150.json # Used e.g. by scripts/evaluate.py
intrinsics.txt # 4 x 4 camera matrix.
bbox.txt # 6 values denoting the bounds of the scene (min_x, min_y, min_z, max_x, max_y, max_z).
nerf/ # Contains NeRF checkpoints and training metadata.
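As a rough illustration of how these files fit together, the sketch below reads one pose, converts depth values to meters and parses the scene bounds. It assumes numpy is available and that the pose files and bbox.txt are whitespace-separated text readable by numpy.loadtxt, which the layout above does not state explicitly. The function names are hypothetical and not part of autolabel.

```python
import numpy as np


def load_world_to_camera(pose_path):
    """Load a 4 x 4 world-to-camera transform from a text file."""
    T_CW = np.loadtxt(pose_path)
    assert T_CW.shape == (4, 4), f"expected a 4 x 4 matrix, got {T_CW.shape}"
    return T_CW


def camera_position(T_CW):
    """Camera center in world coordinates for a rigid world-to-camera transform.

    With x_cam = R @ x_world + t, the camera center is -R^T @ t.
    """
    R, t = T_CW[:3, :3], T_CW[:3, 3]
    return -R.T @ t


def depth_to_meters(depth_mm):
    """Convert 16 bit depth values in millimeters to float meters."""
    return np.asarray(depth_mm).astype(np.float32) / 1000.0


def load_bounds(bbox_path):
    """Return (min_xyz, max_xyz) from the six values stored in bbox.txt."""
    values = np.loadtxt(bbox_path).reshape(-1)
    assert values.size == 6, f"expected 6 values, got {values.size}"
    return values[:3], values[3:]
```

Because the stored pose maps world coordinates into the camera frame, the camera position has to be recovered by inverting the transform, which is what camera_position does.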
The script scripts/mapping.py defines a mapping pipeline which will compute camera poses for your scene. The required input files are:
- raw_rgb/ images
- raw_depth/ frames
- intrinsics.txt camera intrinsic parameters
The computed outputs are:
- rgb/ undistorted camera images
- depth/ undistorted depth images
- pose/ camera poses for each frame
- intrinsics.txt inferred camera intrinsic parameters
- bbox.txt scene bounds
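A quick check that a scene directory contains the required inputs can save a failed mapping run. The sketch below is an illustration only: missing_mapping_inputs is a hypothetical helper, and it only tests that the three entries listed above exist, not that their contents are valid.

```python
from pathlib import Path

# Inputs that scripts/mapping.py requires, as listed above.
REQUIRED_INPUTS = ("raw_rgb", "raw_depth", "intrinsics.txt")


def missing_mapping_inputs(scene_dir):
    """Return the required inputs that are absent from the scene directory."""
    scene = Path(scene_dir)
    return [name for name in REQUIRED_INPUTS if not (scene / name).exists()]
```

An empty list from missing_mapping_inputs("bench") means the scene has everything the mapping pipeline expects as input.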
Data can be imported from various sources. See the data documentation for instructions on how to import from each of them.
For debugging, visualization and for comparing results, the project includes a script to convert scenes for running in instant-ngp.
To do so, assuming you have instant-ngp installed, you can:
- Convert the dataset generated through autolabel to a format readable by instant-ngp using the script scripts/convert_to_instant_ngp.py. Example usage:
  python scripts/convert_to_instant_ngp.py --dataset_folder <scene>
- Run instant-ngp on the converted dataset:
  cd <path/to/instant_ngp/installation>
  ./build/testbed --scene <scene>/transforms.json
To fit the representation to the scene without the user interface, run scripts/train.py. Checkpoints and metadata are stored in the scene folder under the nerf directory.
To use pretrained features as additional training supervision, compute the feature maps, pretrain on them and then open the scene in the GUI:
python scripts/compute_feature_maps.py --features dino --autoencode <scene>
python scripts/train.py --features dino <scene>
python scripts/gui.py --features dino <scene>
The models are saved in the scene folder under the nerf directory, organized according to the given parameters. The GUI therefore loads the model that matches the parameters it is given. If no matching model is found, it initializes the network randomly.
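To see which models exist for a scene, for instance to pick a value for --model-dir when rendering, the nerf directory can be listed. This sketch assumes each parameter combination is stored as a direct sub-directory of <scene>/nerf, as the example path bench/nerf/g15_hg+freq_dino_rgb1.0_d0.1_s1.0_f0.5_do0.1/ suggests. list_model_dirs is a hypothetical helper, not part of autolabel.

```python
from pathlib import Path


def list_model_dirs(scene_dir):
    """Return the sorted sub-directories of <scene>/nerf, one per parameter set."""
    nerf_dir = Path(scene_dir) / "nerf"
    if not nerf_dir.is_dir():
        return []
    return sorted(p for p in nerf_dir.iterdir() if p.is_dir())
```

An empty result means no model has been trained for the scene yet, which is the case in which the GUI would start from a randomly initialized network.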
We use labelme to annotate ground truth frames. Follow its installation instructions, using for instance a conda environment, and make sure that your Python version is <3.10 to avoid type errors. To annotate frames, run:
labelme rgb --nodata --autosave --output gt_masks
Run this inside a scene directory to annotate the frames in the rgb folder. The corresponding annotations are saved into the gt_masks folder. You don't need to annotate every frame; a small sample is enough.
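After annotating, a short summary of the labels used can catch typos in class names. The sketch below assumes the JSON layout that labelme writes, with a top-level "shapes" list whose entries carry a "label" field. count_labels is a hypothetical helper and is not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def count_labels(gt_masks_dir):
    """Count annotated shapes per label across all JSON files in a directory."""
    counts = Counter()
    for json_path in sorted(Path(gt_masks_dir).glob("*.json")):
        with open(json_path) as f:
            annotation = json.load(f)
        # Files without a "shapes" list contribute nothing to the counts.
        for shape in annotation.get("shapes", []):
            counts[shape["label"]] += 1
    return counts
```

Running count_labels("gt_masks") inside a scene directory returns a Counter mapping each label to the number of shapes drawn with it.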
To compute the intersection-over-union agreement against the manually annotated frames, run:
python scripts/evaluate.py <scene1> <scene2> # ...
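For reference, intersection-over-union between a predicted and a ground truth mask is the number of pixels where both are set divided by the number where at least one is set. The sketch below shows this for a single pair of binary masks, assuming numpy is available. It is an illustration of the metric and not the code in scripts/evaluate.py, which may aggregate over classes and frames differently.

```python
import numpy as np


def intersection_over_union(predicted, ground_truth):
    """IoU between two binary masks of the same shape.

    Returns NaN when both masks are empty, since the ratio is undefined.
    """
    predicted = np.asarray(predicted, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    union = np.logical_or(predicted, ground_truth).sum()
    if union == 0:
        return float("nan")
    intersection = np.logical_and(predicted, ground_truth).sum()
    return float(intersection) / float(union)
```

For two masks that overlap in one pixel and together cover three pixels, the function returns 1/3.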
This repository enforces code formatting rules using yapf. After installing it, you can format the code before committing by running:
yapf --recursive autolabel scripts -i
To run formatting automatically in Vim on save, follow these steps.
First, install google/yapf as a Vim plugin. If you use Vundle, add Plugin 'google/yapf' to your .vimrc and run :PluginInstall.
Copy the file .yapf.vim to $HOME/.vim/autoload/yapf.vim, creating the autoload directory if it doesn't exist.
To run yapf on save for Python files, add autocmd FileType python autocmd BufWritePre <buffer> call yapf#YAPF() to your .vimrc, then restart Vim.
Baking in the Feature: Accelerating Volumetric Segmentation by Rendering Feature Maps - Link
Neural Implicit Vision-Language Feature Fields - Link