$\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States

Abstract

We introduce ∞-Diff, a generative diffusion model which directly operates on infinite resolution data. By randomly sampling subsets of coordinates during training and learning to denoise the content at those coordinates, a continuous function is learned that allows sampling at arbitrary resolutions. In contrast to other recent infinite resolution generative models, our approach operates directly on the raw data, not requiring latent vector compression for context, using hypernetworks, nor relying on discrete components. As such, our approach achieves significantly higher sample quality, as evidenced by lower FID scores, as well as being able to effectively scale to much higher resolutions.
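The coordinate subsampling described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the repository's implementation: the function names, the 8x8 toy image, and the 25% sampling rate are assumptions chosen purely for illustration.

```python
import random


def sample_coordinates(height, width, num_points, rng=None):
    """Pick `num_points` distinct pixels and return (flat indices, normalised coords)."""
    rng = rng or random.Random()
    # Sample without replacement so no coordinate is visited twice in one step.
    flat = rng.sample(range(height * width), num_points)
    coords = []
    for idx in flat:
        row, col = divmod(idx, width)
        # Map pixel centres into the unit square; a model conditioned on these
        # continuous coordinates can later be queried on a grid of any size.
        coords.append(((row + 0.5) / height, (col + 0.5) / width))
    return flat, coords


def gather(image, flat_indices):
    """Read the pixel values at the sampled flat indices from a nested-list image."""
    width = len(image[0])
    return [image[i // width][i % width] for i in flat_indices]


# Toy example: an 8x8 "image" whose pixel value equals its flat index.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
flat, coords = sample_coordinates(8, 8, num_points=16, rng=random.Random(0))
values = gather(image, flat)  # only these 16 values would be noised and denoised
```

In a real training loop the sampled values and their coordinates would be passed to the denoiser in place of the full image grid; that step is omitted here.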

arXiv | BibTeX

Setup

Set up conda environment

The easiest way to set up the environment is using conda. To get set up quickly, use miniconda, and switch to the libmamba solver to speed up environment solving.

The following commands assume that CUDA 11.7 is installed. If a different version of CUDA is installed, alter requirements.yml accordingly. Run the following commands to clone this repo using git and create the environment.

git clone https://github.com/samb-t/infty-diff && cd infty-diff
conda env create --name infty-diff --file requirements.yml
conda activate infty-diff

As part of the installation, torchsparse and flash-attention are compiled from source, so this may take a while.

By default, torchsparse is installed for efficient sparse convolutions. This is what was used in all of our experiments, as we found it performed best; we also include a depthwise convolution implementation of torchsparse, which we found can outperform dense convolutions in some settings. Other libraries such as spconv and MinkowskiEngine are available and may perform better on your hardware, but we have not thoroughly tested them. When training models, the sparse backend can be selected with --config.model.backend="torchsparse".
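Before choosing a backend, it can help to check which of the sparse convolution libraries mentioned above are importable in the active environment. The sketch below uses only the standard library. The import names are assumptions based on each project's usual package name, and the helper is hypothetical rather than part of this repo. Which values --config.model.backend accepts besides "torchsparse" is not stated here, so treat the result as informational only.

```python
import importlib.util

# Assumed top-level import names for each library; adjust if yours differ.
CANDIDATE_BACKENDS = {
    "torchsparse": "torchsparse",
    "spconv": "spconv",
    "MinkowskiEngine": "MinkowskiEngine",
}


def available_backends():
    """Return {library name: True/False} depending on whether it can be imported."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in CANDIDATE_BACKENDS.items()
    }


for name, found in available_backends().items():
    print(f"{name}: {'installed' if found else 'not found'}")
```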

Dataset setup

To configure the default paths for the datasets used to train the models in this repo, edit the relevant config file, changing the data.root_dir attribute of each dataset you wish to use to the path where your dataset is saved locally.

Dataset	Official Link	Academic Torrents Link
FFHQ	Official FFHQ	Academic Torrents FFHQ
LSUN	Official LSUN	Academic Torrents LSUN
CelebA-HQ	Official CelebA-HQ	-

Commands

This section contains details on basic commands for training and generating samples. Image-level models were trained on an A100 80GB and these commands presume the same level of hardware. If your GPU has less VRAM, you may need to train with smaller batch sizes and/or smaller models than the defaults.

Training

The following command starts training the image-level diffusion model on FFHQ.

python train_inf_ae_diffusion.py --config configs/ffhq_256_config.py --config.run.experiment="ffhq_mollified_256"

Once this has finished, the latent model can be trained with

python train_latent_diffusion.py --config configs/ffhq_latent_config.py --config.run.experiment="ffhq_mollified_256_sampler" --decoder_config configs/ffhq_256_config.py --decoder_config.run.experiment="ffhq_mollified_256"

Hyperparameters are managed with ml_collections, so any value can be overridden from the command line. For example, the batch size can be changed with --config.train.batch_size=32.
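To make the dotted override syntax concrete, here is a small standard-library sketch of how a flag such as --config.train.batch_size=32 maps onto a nested configuration. ml_collections performs the real parsing in this repo; the `apply_override` helper, the plain-dict config, and the default batch size of 64 are hypothetical and exist only to show the idea.

```python
import ast


def apply_override(config, flag):
    """Apply one '--config.a.b=value' style flag to a nested dict, in place."""
    path, raw = flag.lstrip("-").split("=", 1)
    keys = path.split(".")[1:]  # drop the leading 'config' prefix
    try:
        value = ast.literal_eval(raw)  # numbers, booleans, quoted strings
    except (ValueError, SyntaxError):
        value = raw  # anything else is kept as a plain string
    node = config
    for key in keys[:-1]:
        node = node[key]  # walk down to the parent of the final key
    if keys[-1] not in node:
        raise KeyError(f"unknown config field: {'.'.join(keys)}")
    node[keys[-1]] = value
    return config


# Hypothetical defaults, standing in for the values defined in a config file.
config = {"train": {"batch_size": 64}, "model": {"backend": "torchsparse"}}
apply_override(config, "--config.train.batch_size=32")
```

Rejecting unknown fields, as the sketch does, catches typos in flag names instead of silently creating a new setting.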

Generate samples

After both models have been trained, the following script will generate a folder of samples.

python experiments/generate_samples.py --config configs/ffhq_latent_config.py --config.run.experiment="ffhq_mollified_256_sampler" --decoder_config configs/ffhq_256_config.py --decoder_config.run.experiment="ffhq_mollified_256"
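The three commands above share config paths and experiment names, and the latent training and sampling commands take identical arguments. A small helper can keep them consistent when launching runs from a script. This is a hypothetical convenience sketch, not part of the repo; in particular, deriving the latent experiment name by appending "_sampler" simply mirrors the example names used above and is an assumption rather than a requirement.

```python
import shlex


def build_commands(
    image_config="configs/ffhq_256_config.py",
    latent_config="configs/ffhq_latent_config.py",
    experiment="ffhq_mollified_256",
):
    """Return argument lists for image training, latent training and sampling."""
    sampler = f"{experiment}_sampler"  # naming pattern copied from the examples
    # Arguments shared by the latent training and sample generation scripts.
    shared = [
        "--config", latent_config,
        f"--config.run.experiment={sampler}",
        "--decoder_config", image_config,
        f"--decoder_config.run.experiment={experiment}",
    ]
    return {
        "train_image": [
            "python", "train_inf_ae_diffusion.py",
            "--config", image_config,
            f"--config.run.experiment={experiment}",
        ],
        "train_latent": ["python", "train_latent_diffusion.py", *shared],
        "generate": ["python", "experiments/generate_samples.py", *shared],
    }


commands = build_commands()
for name, argv in commands.items():
    print(f"{name}: {shlex.join(argv)}")
```

Each argument list can be passed directly to `subprocess.run(argv, check=True)` from the repository root.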

Acknowledgement

Huge thank you to everyone who makes their code available. In particular, some code is based on

Improved Denoising Diffusion Probabilistic Models
Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
Fourier Neural Operator for Parametric Partial Differential Equations
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

BibTeX

@article{bond2023infty,
  title       = {$\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States},
  author      = {Sam Bond-Taylor and Chris G. Willcocks},
  journal     = {arXiv preprint arXiv:2303.18242},
  year        = {2023}
}
