Efstathios Karypidis1,3, Ioannis Kakogeorgiou1, Spyros Gidaris2, Nikos Komodakis1,4,5
1Archimedes/Athena RC 2valeo.ai
3National Technical University of Athens 4University of Crete 5IACM-Forth
This repository contains the official implementation of the paper: Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers
- News & ToDos
- Installation
- Dataset Preparation
- FUTURIST Training
- Evaluation
- Demo
- Citation
- Acknowledgements
2025-01-14: The arXiv preprint and GitHub repository are released!
- Add new branches with code for training with a VQ-VAE and separate tokens for each modality
The code is tested with Python 3.11 and PyTorch 2.2.0+cu121 on Ubuntu 22.04.5 LTS. Create a new conda environment:
conda create -n futurist python=3.11
conda activate futurist
Install PyTorch, then clone the repository and install the remaining requirements:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
git clone https://github.com/Sta8is/FUTURIST
cd FUTURIST
pip install -r requirements.txt
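Optionally, you can run a quick sanity check to confirm that PyTorch is installed and CUDA is available:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"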
We use the Cityscapes dataset for our experiments. Specifically, we use the leftImg8bit_sequence_trainvaltest
sequences. To extract segmentation maps we use Segmenter, and to extract depth maps we use DepthAnythingV2. You can skip downloading and preprocessing leftImg8bit_sequence_trainvaltest
and simply download the precomputed segmentation maps from here and depth maps from here. In addition, to evaluate FUTURIST, gtFine
needs to be processed with cityscapesScripts (see the example after the directory layout below). Alternatively, you can download the processed dataset from here. The final structure of the dataset should be as follows:
cityscapes
│
├───leftImg8bit_sequence_depthv2
│ ├───train
│ ├───val
├───leftImg8bit_sequence_segmaps_ids
│ ├───train
│ ├───val
├───gtFine
│ ├───train
│ ├───val
│ ├───test
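As a sketch of the gtFine preprocessing step mentioned above (assuming you use the standard cityscapesscripts package and its createTrainIdLabelImgs tool; adapt the dataset path to your setup), the train-ID label images can be generated with:
pip install cityscapesscripts
CITYSCAPES_DATASET=/path/to/cityscapes python -m cityscapesscripts.preparation.createTrainIdLabelImgs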
To train FUTURIST with the default parameters, use the following command:
python train_futurist.py --num_gpus=8 --precision 16-mixed --eval_freq 10 --batch_size 2 --max_epochs 3200 --lr_base 4e-5 --patch_size 16 \
--eval_mode_during_training --evaluate --single_step_sample_train --masking "simple_replace" --seperable_attention --random_horizontal_flip \
--random_crop --use_fc_bias --data_path="/path/to/cityscapes/leftImg8bit_sequence_segmaps_ids" --modality segmaps_depth \
--sequence_length 5 --num_classes 19 --emb_dim 10,10 --accum_iter 4 --w_s 0.85 \
--dst_path "/logdir/futurist" --masking_strategy "par_shared_excl" --modal_fusion "concat"
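With these settings, the effective batch size is batch_size × num_gpus × accum_iter = 2 × 8 × 4 = 64 sequences per optimizer step (assuming --accum_iter denotes gradient-accumulation steps). If you train on fewer GPUs, consider adjusting --accum_iter or --lr_base accordingly.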
You can also download the pre-trained model from here. To evaluate a trained FUTURIST model, use the following command:
python train_futurist.py --num_gpus=4 --precision 16-mixed --eval_freq 10 --batch_size 2 --max_epochs 3200 --lr_base 4e-5 --patch_size 16 \
--eval_mode_during_training --evaluate --single_step_sample_train --masking "simple_replace" --seperable_attention --random_horizontal_flip \
--random_crop --use_fc_bias --data_path="/path/to/cityscapes/leftImg8bit_sequence_segmaps_ids" --modality segmaps_depth \
--sequence_length 5 --num_classes 19 --emb_dim 10,10 --accum_iter 4 --w_s 0.85 \
--dst_path "/logdir/futurist" --masking_strategy "par_shared_excl" --modal_fusion "concat" \
--eval_ckpt_only --ckpt "/path/to/futurist.ckpt"
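Note that the evaluation command reuses the training configuration; the only additions are --eval_ckpt_only and --ckpt, which should point to the downloaded or your own trained checkpoint.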
We provide two quick demos.
- Demo.
If you found FUTURIST useful, please consider starring ⭐ us on GitHub and citing 📚 us in your research!
@article{karypidis2025advancing,
title={Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers},
author={Karypidis, Efstathios and Kakogeorgiou, Ioannis and Gidaris, Spyros and Komodakis, Nikos},
journal={arXiv preprint arXiv:2501.08303},
year={2025}
}
Our code is partially based on Maskgit-pytorch, DepthAnythingV2, and Segmenter; we thank the authors for their work and open-source code.