Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers

Efstathios Karypidis (1,3), Ioannis Kakogeorgiou (1), Spyros Gidaris (2), Nikos Komodakis (1,4,5)

(1) Archimedes/Athena RC  (2) valeo.ai  (3) National Technical University of Athens  (4) University of Crete  (5) IACM-Forth



This repository contains the official implementation of the paper: Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers

Contents

  1. News-ToDos
  2. Installation
  3. Dataset Preparation
  4. FUTURIST Training
  5. Evaluation
  6. Demo
  7. Citation
  8. Acknowledgements

News-ToDos

2025-1-14: arXiv preprint and GitHub repository are released!

  • Add new branches with code for training with a VQ-VAE and with separate tokens for each modality

Installation

The code was tested with Python 3.11 and PyTorch 2.2.0+cu121 on Ubuntu 22.04.5 LTS. Create a new conda environment:

conda create -n futurist python=3.11
conda activate futurist

Clone the repository and install the required packages:

git clone https://github.com/Sta8is/FUTURIST
cd FUTURIST
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
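
As a quick sanity check (a minimal sketch, not part of the repository), you can confirm that the interpreter and the CUDA-enabled PyTorch build match the tested setup:

python -c "import sys, torch; print('Python', sys.version.split()[0]); print('PyTorch', torch.__version__); print('CUDA available:', torch.cuda.is_available())"

On a correctly configured GPU machine this should report Python 3.11.x, PyTorch 2.2.0+cu121, and CUDA available: True.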

Dataset Preparation

We use the Cityscapes dataset for our experiments; specifically, the leftImg8bit_sequence_trainvaltest sequences. To extract segmentation maps we use Segmenter, and to extract depth maps we use DepthAnythingV2. You can skip downloading leftImg8bit_sequence_trainvaltest and the preprocessing, and simply download the precomputed segmentation maps from here and the depth maps from here. In addition, to evaluate FUTURIST, gtFine needs to be processed using cityscapesScripts; alternatively, you can download the processed dataset from here. The final structure of the dataset should be as follows.

cityscapes
│
├───leftImg8bit_sequence_depthv2
│   ├───train
│   ├───val
├───leftImg8bit_sequence_segmaps_ids
│   ├───train
│   ├───val
├───gtFine
│   ├───train
│   ├───val
│   ├───test
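
Before training, it can be useful to verify this layout programmatically. Below is a minimal Python sketch (not part of the repository; CITYSCAPES_ROOT is a placeholder you should adapt):

from pathlib import Path

CITYSCAPES_ROOT = Path("/path/to/cityscapes")  # placeholder: set to your dataset root
EXPECTED = {
    "leftImg8bit_sequence_depthv2": ("train", "val"),
    "leftImg8bit_sequence_segmaps_ids": ("train", "val"),
    "gtFine": ("train", "val", "test"),
}

for folder, splits in EXPECTED.items():
    for split in splits:
        path = CITYSCAPES_ROOT / folder / split
        print(f"{path}: {'ok' if path.is_dir() else 'MISSING'}")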

FUTURIST Training

To train FUTURIST with the default parameters, use the following command:

python train_futurist.py --num_gpus=8 --precision 16-mixed --eval_freq 10 --batch_size 2  --max_epochs 3200 --lr_base 4e-5 --patch_size 16 \
    --eval_mode_during_training --evaluate --single_step_sample_train  --masking "simple_replace" --seperable_attention --random_horizontal_flip \
    --random_crop --use_fc_bias --data_path="/path/to/cityscapes/leftImg8bit_sequence_segmaps_ids" --modality segmaps_depth \
    --sequence_length 5 --num_classes 19 --emb_dim 10,10 --accum_iter 4 --w_s 0.85 \
    --dst_path "/logdir/futurist" --masking_strategy "par_shared_excl" --modal_fusion "concat" 
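
With these defaults, and assuming the usual data-parallel semantics where each GPU processes batch_size samples per micro-step, the effective batch size is num_gpus × batch_size × accum_iter = 8 × 2 × 4 = 64. If you train on fewer GPUs, consider increasing batch_size or accum_iter to keep the effective batch size comparable.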

Evaluation

You can also download the pre-trained model from here. To evaluate a trained FUTURIST model, use the following command:

python train_futurist.py --num_gpus=4 --precision 16-mixed --eval_freq 10 --batch_size 2  --max_epochs 3200 --lr_base 4e-5 --patch_size 16 \
    --eval_mode_during_training --evaluate --single_step_sample_train  --masking "simple_replace" --seperable_attention --random_horizontal_flip \
    --random_crop --use_fc_bias --data_path="/path/to/cityscapes/leftImg8bit_sequence_segmaps_ids" --modality segmaps_depth \
    --sequence_length 5 --num_classes 19 --emb_dim 10,10 --accum_iter 4 --w_s 0.85 \
    --dst_path "/logdir/futurist" --masking_strategy "par_shared_excl" --modal_fusion "concat" \
    --eval_ckpt_only --ckpt "/path/to/futurist.ckpt"
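
If you want to inspect the downloaded checkpoint before running the full evaluation, the following minimal sketch works (not part of the repository; the top-level keys shown are typical of PyTorch Lightning-style checkpoints, not guaranteed):

import torch

ckpt = torch.load("/path/to/futurist.ckpt", map_location="cpu")  # load on CPU; no GPU required
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. 'state_dict', 'epoch' in Lightning-style checkpoints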

Demo

We provide two quick demos.

Citation

If you find FUTURIST useful in your research, please consider starring ⭐ the repository on GitHub and citing 📚 our paper!

@article{karypidis2025advancing,
  title={Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers},
  author={Karypidis, Efstathios and Kakogeorgiou, Ioannis and Gidaris, Spyros and Komodakis, Nikos},
  journal={arXiv preprint arXiv:2501.08303},
  year={2025}
}

Acknowledgements

Our code is partially based on Maskgit-pytorch, DepthAnythingV2, and Segmenter. We thank the authors for their work and open-source code.
