Merge pull request #87 from okotaku/feat/support_noise_method
[Feature] Support Noise Methods
Showing 33 changed files with 501 additions and 93 deletions.
@@ -0,0 +1,46 @@
# Input Perturbation

[Input Perturbation Reduces Exposure Bias in Diffusion Models](https://arxiv.org/abs/2301.11706)

## Abstract

Denoising Diffusion Probabilistic Models have shown an impressive generation quality, although their long sampling chain leads to high computational costs. In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation. Specifically, we note that there is a discrepancy between training and testing, since the former is conditioned on the ground truth samples, while the latter is conditioned on the previously generated results. To alleviate this problem, we propose a very simple but effective training regularization, consisting in perturbing the ground truth samples to simulate the inference time prediction errors. We empirically show that, without affecting the recall and precision, the proposed input perturbation leads to a significant improvement in the sample quality while reducing both the training and the inference times. For instance, on CelebA 64×64, we achieve a new state-of-the-art FID score of 1.27, while saving 37.5% of the training time.

<div align=center>
<img src="https://github.com/okotaku/diffengine/assets/24734142/60b9a296-6453-4d47-9c06-f40f43766273"/>
</div>
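
The regularization described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not diffengine's implementation: the function name `perturbed_noisy_sample`, the `alpha_bar` argument (the cumulative product of the noise schedule), and the array shapes are all hypothetical, and `gamma` is assumed to play the role of the `input_perturbation_gamma` setting.

```python
import numpy as np


def perturbed_noisy_sample(x0, alpha_bar, gamma=0.1, rng=None):
    """Build a perturbed network input and the clean noise target.

    x0:        clean samples, shape (batch, ...)
    alpha_bar: cumulative noise-schedule product per sample, shape (batch,)
    gamma:     perturbation strength; 0 recovers the standard forward process
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(x0.shape)
    # Extra Gaussian perturbation that mimics inference-time prediction error.
    perturbed = noise + gamma * rng.standard_normal(x0.shape)
    # Broadcast the per-sample schedule value over the remaining axes.
    a = np.asarray(alpha_bar, dtype=x0.dtype).reshape(-1, *([1] * (x0.ndim - 1)))
    noisy = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * perturbed
    # The network sees `noisy` but is still trained to predict `noise`.
    return noisy, noise
```

The key design point is that only the network input is perturbed. The regression target stays the unperturbed noise, so the model learns to recover from slightly wrong inputs.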

## Citation

```
@article{ning2023input,
  title={Input Perturbation Reduces Exposure Bias in Diffusion Models},
  author={Ning, Mang and Sangineto, Enver and Porrello, Angelo and Calderara, Simone and Cucchiara, Rita},
  journal={arXiv preprint arXiv:2301.11706},
  year={2023}
}
```

## Run Training

Train with one of the following commands:

```
# single gpu
$ mim train diffengine ${CONFIG_FILE}
# multi gpus
$ mim train diffengine ${CONFIG_FILE} --gpus 2 --launcher pytorch
# Example.
$ mim train diffengine configs/input_perturbation/stable_diffusion_xl_pokemon_blip_input_perturbation.py
```

## Inference with diffusers

See [`docs/source/run_guides/run_xl.md`](../../docs/source/run_guides/run_xl.md#inference-with-diffusers) for details.

## Results Example

#### stable_diffusion_xl_pokemon_blip_input_perturbation

![input1](https://github.com/okotaku/diffengine/assets/24734142/6cfce7e6-a2f1-43e6-9f62-59da2ac4cb1e)
12 changes: 12 additions & 0 deletions
configs/input_perturbation/stable_diffusion_xl_pokemon_blip_input_perturbation.py
@@ -0,0 +1,12 @@
_base_ = [
    "../_base_/models/stable_diffusion_xl.py",
    "../_base_/datasets/pokemon_blip_xl.py",
    "../_base_/schedules/stable_diffusion_xl_50e.py",
    "../_base_/default_runtime.py",
]

model = dict(input_perturbation_gamma=0.1)

train_dataloader = dict(batch_size=1)

optim_wrapper_cfg = dict(accumulative_counts=4)  # update every four times
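
The last two settings in the config above interact: with `batch_size=1` and `accumulative_counts=4`, gradients from four iterations would be accumulated before each optimizer step. This assumes the `accumulative_counts` value reaches MMEngine's optimizer wrapper and that `batch_size` is counted per GPU. The helper below is hypothetical and only spells out the arithmetic.

```python
def effective_batch_size(batch_size, accumulative_counts, num_gpus=1):
    """Number of samples contributing to one optimizer step."""
    return batch_size * accumulative_counts * num_gpus


# Values from the config above, on a single GPU.
print(effective_batch_size(batch_size=1, accumulative_counts=4))  # 4
# The same config launched with `--gpus 2`.
print(effective_batch_size(batch_size=1, accumulative_counts=4, num_gpus=2))  # 8
```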
@@ -0,0 +1,40 @@
# Offset Noise

[Diffusion with Offset Noise](https://www.crosslabs.org/blog/diffusion-with-offset-noise)

## Abstract

Fine-tuning against a modified noise enables Stable Diffusion to generate very dark or very light images easily.

<div align=center>
<img src="https://github.com/okotaku/diffengine/assets/24734142/76038bc8-b614-49da-9751-1a9efb83995f"/>
</div>
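
One way to picture the "modified noise" is as ordinary Gaussian noise plus a small random constant per sample and channel, which shifts the mean brightness of the whole feature map. The sketch below is a NumPy illustration under that assumption, not diffengine's `OffsetNoise` class: the function name `offset_noise` and the 4-D shape convention are hypothetical, and `offset_weight` is assumed to correspond to the config option of the same name.

```python
import numpy as np


def offset_noise(shape, offset_weight=0.05, rng=None):
    """Gaussian noise plus a per-sample, per-channel constant offset.

    shape: (batch, channels, height, width)
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, channels = shape[0], shape[1]
    noise = rng.standard_normal(shape)
    # One random offset per (sample, channel), broadcast over height and width.
    offset = rng.standard_normal((batch, channels, 1, 1))
    return noise + offset_weight * offset
```

Setting `offset_weight=0` recovers plain Gaussian noise, so the weight controls how strongly the training noise departs from the standard formulation.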

## Citation

```
```

## Run Training

Train with one of the following commands:

```
# single gpu
$ mim train diffengine ${CONFIG_FILE}
# multi gpus
$ mim train diffengine ${CONFIG_FILE} --gpus 2 --launcher pytorch
# Example.
$ mim train diffengine configs/offset_noise/stable_diffusion_xl_pokemon_blip_offset_noise.py
```

## Inference with diffusers

See [`docs/source/run_guides/run_xl.md`](../../docs/source/run_guides/run_xl.md#inference-with-diffusers) for details.

## Results Example

#### stable_diffusion_xl_pokemon_blip_offset_noise

![input1](https://github.com/okotaku/diffengine/assets/24734142/7a3b26ff-618b-46f0-827e-32c2d47cde6f)
12 changes: 12 additions & 0 deletions
configs/offset_noise/stable_diffusion_xl_pokemon_blip_offset_noise.py
@@ -0,0 +1,12 @@
_base_ = [
    "../_base_/models/stable_diffusion_xl.py",
    "../_base_/datasets/pokemon_blip_xl.py",
    "../_base_/schedules/stable_diffusion_xl_50e.py",
    "../_base_/default_runtime.py",
]

model = dict(noise_generator=dict(type="OffsetNoise", offset_weight=0.05))

train_dataloader = dict(batch_size=1)

optim_wrapper_cfg = dict(accumulative_counts=4)  # update every four times
@@ -0,0 +1,40 @@
# Pyramid Noise

[Multi-Resolution Noise for Diffusion Model Training](https://wandb.ai/johnowhitaker/multires_noise/reports/Multi-Resolution-Noise-for-Diffusion-Model-Training--VmlldzozNjYyOTU2)

## Abstract

This report proposes a new noising approach that adds multi-resolution noise to an image or latent image during diffusion model training. A model trained with this technique can generate stunning images with a very different aesthetic to the usual diffusion model outputs. This seems like a promising direction for future research.

<div align=center>
<img src="https://github.com/okotaku/diffengine/assets/24734142/943570cf-7283-4536-ae28-cd1cce1220b7"/>
</div>
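
A rough picture of multi-resolution noise: draw Gaussian noise at several progressively coarser resolutions, upsample each level back to full size, and sum them with geometrically decaying weights. The NumPy sketch below makes several simplifying assumptions and is not diffengine's `PyramidNoise` class: it halves the resolution at every level and uses nearest-neighbor upsampling, whereas the report's own code may differ in both choices. The function name `pyramid_noise` and the `max_levels` parameter are hypothetical; `discount` is assumed to correspond to the config option of the same name.

```python
import numpy as np


def pyramid_noise(shape, discount=0.9, max_levels=10, rng=None):
    """Sum Gaussian noise drawn at progressively coarser resolutions.

    shape: (batch, channels, height, width)
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, channels, height, width = shape
    noise = rng.standard_normal(shape)
    h, w = height, width
    for level in range(1, max_levels):
        # Halve the resolution at each level (a fixed factor, for simplicity).
        h, w = max(1, h // 2), max(1, w // 2)
        coarse = rng.standard_normal((batch, channels, h, w))
        # Nearest-neighbor upsampling back to full size via index lookup.
        rows = np.arange(height) * h // height
        cols = np.arange(width) * w // width
        noise += discount**level * coarse[:, :, rows[:, None], cols[None, :]]
        if h == 1 or w == 1:
            break
    # Rescale so the result has unit standard deviation overall.
    return noise / noise.std()
```

A `discount` closer to 1 gives the coarse, low-frequency levels more weight; a smaller value keeps the result closer to ordinary per-pixel Gaussian noise.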

## Citation

```
```

## Run Training

Train with one of the following commands:

```
# single gpu
$ mim train diffengine ${CONFIG_FILE}
# multi gpus
$ mim train diffengine ${CONFIG_FILE} --gpus 2 --launcher pytorch
# Example.
$ mim train diffengine configs/pyramid_noise/stable_diffusion_xl_pokemon_blip_pyramid_noise.py
```

## Inference with diffusers

See [`docs/source/run_guides/run_xl.md`](../../docs/source/run_guides/run_xl.md#inference-with-diffusers) for details.

## Results Example

#### stable_diffusion_xl_pokemon_blip_pyramid_noise

![input1](https://github.com/okotaku/diffengine/assets/24734142/5fe300ae-8ec1-4e0c-81b4-cd5d5ac2e7c0)
12 changes: 12 additions & 0 deletions
configs/pyramid_noise/stable_diffusion_xl_pokemon_blip_pyramid_noise.py
@@ -0,0 +1,12 @@
_base_ = [
    "../_base_/models/stable_diffusion_xl.py",
    "../_base_/datasets/pokemon_blip_xl.py",
    "../_base_/schedules/stable_diffusion_xl_50e.py",
    "../_base_/default_runtime.py",
]

model = dict(noise_generator=dict(type="PyramidNoise", discount=0.9))

train_dataloader = dict(batch_size=1)

optim_wrapper_cfg = dict(accumulative_counts=4)  # update every four times
@@ -1,2 +1,3 @@
from .editors import *  # noqa: F403
from .losses import *  # noqa: F403
from .utils import *  # noqa: F403