Merge pull request #83 from okotaku/feat/ssd_1b
[Feature] Support SSD-1B
Showing 14 changed files with 825 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,4 +13,4 @@ jobs: | |
steps: | ||
- uses: readthedocs/actions/preview@v1 | ||
with: | ||
- project-slug: "diffengine" | ||
+ project-slug: "DiffEngine" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
model = dict( | ||
type="SSD1B", | ||
model="stabilityai/stable-diffusion-xl-base-1.0", | ||
student_model="segmind/SSD-1B", | ||
student_model_weight="unet", | ||
vae_model="madebyollin/sdxl-vae-fp16-fix", | ||
gradient_checkpointing=True) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
model = dict( | ||
type="SSD1B", | ||
model="stabilityai/stable-diffusion-xl-base-1.0", | ||
student_model="segmind/SSD-1B", | ||
student_model_weight="orig_unet", | ||
vae_model="madebyollin/sdxl-vae-fp16-fix", | ||
gradient_checkpointing=True) |
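The two `SSD1B` model configs above are nearly identical. The sketch below copies them as plain dicts and reports the keys that differ; `diff_configs` is a hypothetical helper written for this note, not part of diffengine. Judging only by the value names, `student_model_weight` appears to select which weights the student network starts from, but the `SSD1B` implementation is not shown in this diff, so treat that reading as an assumption.

```python
# The two model configs shown above, copied as plain dicts.
config_unet = dict(
    type="SSD1B",
    model="stabilityai/stable-diffusion-xl-base-1.0",
    student_model="segmind/SSD-1B",
    student_model_weight="unet",
    vae_model="madebyollin/sdxl-vae-fp16-fix",
    gradient_checkpointing=True,
)
config_orig_unet = dict(config_unet, student_model_weight="orig_unet")


def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Only `student_model_weight` differs between the two configs.
print(diff_configs(config_unet, config_orig_unet))
```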
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
model = dict( | ||
type="StableDiffusionXL", | ||
model="segmind/SSD-1B", | ||
vae_model="madebyollin/sdxl-vae-fp16-fix", | ||
lora_config=dict(rank=8)) |
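The `lora_config=dict(rank=8)` setting above keeps the adapter small. As a rough illustration: LoRA adds two low-rank matrices of shapes `(rank, d_in)` and `(d_out, rank)` next to a frozen `(d_out, d_in)` weight, so the number of extra trainable parameters is `rank * (d_in + d_out)`. The layer width used below (1280) is a hypothetical value chosen for the arithmetic, not a figure taken from this commit.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter on a d_out x d_in weight."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight itself (bias ignored)."""
    return d_in * d_out


width = 1280  # hypothetical layer width, for illustration only
extra = lora_extra_params(width, width, rank=8)
full = full_params(width, width)

print(f"LoRA params: {extra}, dense params: {full}, ratio: {extra / full:.4f}")
```

Under that assumed width, the rank-8 adapter trains a little over one percent of the parameters of the dense layer it wraps.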
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
# SSD-1B | ||
|
||
[SSD-1B](https://blog.segmind.com/introducing-segmind-ssd-1b/) | ||
|
||
## Abstract | ||
|
||
Today, Segmind is thrilled to announce the open sourcing of our new foundational model, SSD-1B, the fastest diffusion-based text-to-image model in the market, with unprecedented image generation times for a 1024x1024 image. Developed as part of our distillation series, SSD-1B is 50% smaller and 60% faster compared to the SDXL 1.0 model. This gain in speed and reduction in size comes with a minimal impact on image quality when compared to SDXL 1.0. Furthermore, we are excited to reveal that the SSD-1B model has been licensed for commercial use, opening avenues for businesses and developers to integrate this groundbreaking technology into their services and products. | ||
|
||
<div align=center> | ||
<img src="https://github.com/okotaku/diffengine/assets/24734142/5c5a0e65-d06d-43a0-873d-f804e1900428"/> | ||
</div> | ||
|
||
## Citation | ||
|
||
``` | ||
``` | ||
|
||
## Dependencies | ||
|
||
Note: install diffusers from source to use SSD-1B. | ||
|
||
``` | ||
pip install -U git+https://github.com/huggingface/diffusers.git | ||
``` | ||
|
||
## Run Training | ||
|
||
Run training with `mim`: | ||
|
||
``` | ||
# Single GPU | ||
$ mim train diffengine ${CONFIG_FILE} | ||
# Multiple GPUs | ||
$ mim train diffengine ${CONFIG_FILE} --gpus 2 --launcher pytorch | ||
# Example | ||
$ mim train diffengine configs/ssd_1b/ssd_1b_distill_pokemon_blip.py | ||
``` | ||
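When training is launched from a script rather than a shell, the same commands can be assembled as an argument list. This is a minimal sketch: `build_train_command` is a hypothetical helper, and it uses only the `mim train` arguments that appear in the commands above.

```python
import shlex
from typing import List


def build_train_command(config_file: str, gpus: int = 1) -> List[str]:
    """Assemble the `mim train diffengine` command shown in the README."""
    cmd = ["mim", "train", "diffengine", config_file]
    if gpus > 1:
        # Multi-GPU flags, copied from the README example.
        cmd += ["--gpus", str(gpus), "--launcher", "pytorch"]
    return cmd


cmd = build_train_command("configs/ssd_1b/ssd_1b_distill_pokemon_blip.py")
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```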
|
||
## Inference with diffusers | ||
|
||
See [`docs/source/run_guides/run_xl.md`](../../docs/source/run_guides/run_xl.md#inference-with-diffusers) for more details. | ||
|
||
## Results Example | ||
|
||
#### ssd_1b_distill_from_sdxl_pokemon_blip | ||
|
||
[Image: example] | ||
|
||
#### ssd_1b_distill_pokemon_blip | ||
|
||
[Image: example2] | ||
|
||
## Blog post | ||
|
||
[SSD-1B: A Leap in Efficient T2I Generation](https://medium.com/@to78314910/ssd-1b-a-leap-in-efficient-t2i-generation-138bb05fdd75) | ||
|
||
## Acknowledgement | ||
|
||
This implementation is based on [segmind/SSD-1B](https://github.com/segmind/SSD-1B). Thanks to the authors for the great open-source project. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
_base_ = [ | ||
"../_base_/models/distill_ssd_1b_from_sdxl.py", | ||
"../_base_/datasets/pokemon_blip_xl.py", | ||
"../_base_/schedules/stable_diffusion_xl_50e.py", | ||
"../_base_/default_runtime.py", | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
_base_ = [ | ||
"../_base_/models/distill_ssd_1b.py", | ||
"../_base_/datasets/pokemon_blip_xl.py", | ||
"../_base_/schedules/stable_diffusion_xl_50e.py", | ||
"../_base_/default_runtime.py", | ||
] |
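Both training configs above consist only of a `_base_` list, so the final config is whatever the listed base files contribute. The sketch below illustrates the general idea of that composition with a recursive dict merge. It is a simplified stand-in, not MMEngine's actual loader, which applies additional rules; the dict contents marked as hypothetical are invented for the example.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a deep copy of `base`."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_config(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Stand-ins for the contents of two base files.
model_base = {"model": {"type": "SSD1B", "student_model": "segmind/SSD-1B"}}
runtime_base = {"default_scope": "diffengine"}  # hypothetical content

config = {}
for base in (model_base, runtime_base):
    config = merge_config(config, base)

# A child config can override one nested key without repeating the rest.
config = merge_config(config, {"model": {"gradient_checkpointing": False}})
print(config["model"])
```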
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# SSD-1B DreamBooth | ||
|
||
[DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation](https://arxiv.org/abs/2208.12242) | ||
[SSD-1B](https://blog.segmind.com/introducing-segmind-ssd-1b/) | ||
|
||
## Abstract | ||
|
||
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. | ||
|
||
<div align=center> | ||
<img src="https://github.com/okotaku/dethub/assets/24734142/33b1953d-ce42-4f9a-bcbc-87050cfe4f6f"/> | ||
</div> | ||
|
||
## Citation | ||
|
||
``` | ||
@inproceedings{ruiz2023dreambooth, | ||
title={Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation}, | ||
author={Ruiz, Nataniel and Li, Yuanzhen and Jampani, Varun and Pritch, Yael and Rubinstein, Michael and Aberman, Kfir}, | ||
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, | ||
year={2023} | ||
} | ||
``` | ||
|
||
## Dependencies | ||
|
||
Note: install diffusers from source to use SSD-1B. | ||
|
||
``` | ||
pip install -U git+https://github.com/huggingface/diffusers.git | ||
``` | ||
|
||
## Run Training | ||
|
||
Run training with `mim`: | ||
|
||
``` | ||
# Single GPU | ||
$ mim train diffengine ${CONFIG_FILE} | ||
# Multiple GPUs | ||
$ mim train diffengine ${CONFIG_FILE} --gpus 2 --launcher pytorch | ||
# Example | ||
$ mim train diffengine configs/ssd_1b_dreambooth/ssd_1b_dreambooth_lora_dog.py | ||
``` | ||
|
||
## Training Speed | ||
|
||
Environment: | ||
|
||
- A6000 Single GPU | ||
- nvcr.io/nvidia/pytorch:23.07-py3 | ||
|
||
Settings: | ||
|
||
- 500 training iterations (4 validation images every 100 iterations) | ||
- LoRA (rank=8) / DreamBooth | ||
|
||
| Model | total time | | ||
| :----: | :--------: | | ||
| SDXL | 18 m 55 s | | ||
| SSD-1B | 12 m 30 s | | ||
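To put the two totals in the table above on a common scale, the sketch below converts them to seconds and derives the relative speedup. `to_seconds` is a hypothetical helper; the only inputs are the two times reported in the table.

```python
import re


def to_seconds(text: str) -> int:
    """Parse a duration written like '18 m 55 s' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*m\s*(\d+)\s*s\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds


sdxl_seconds = to_seconds("18 m 55 s")   # 1135 s
ssd_1b_seconds = to_seconds("12 m 30 s")  # 750 s

speedup = sdxl_seconds / ssd_1b_seconds
time_saved = 1 - ssd_1b_seconds / sdxl_seconds

print(f"speedup: {speedup:.2f}x, wall-clock time saved: {time_saved:.0%}")
```

On this benchmark, SSD-1B finishes the run about 1.5 times as fast as SDXL, which is roughly a one-third reduction in wall-clock time. Note that the total includes the validation image generation listed under Settings.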
|
||
## Inference with diffusers | ||
|
||
Once you have trained a model, specify the path where it is saved and use it for inference with `diffusers`. | ||
|
||
```py | ||
import torch | ||
from diffusers import DiffusionPipeline, AutoencoderKL | ||
|
||
checkpoint = 'work_dirs/ssd_1b_dreambooth_lora_dog/step499' | ||
prompt = 'A photo of sks dog in a bucket' | ||
|
||
vae = AutoencoderKL.from_pretrained( | ||
'madebyollin/sdxl-vae-fp16-fix', | ||
torch_dtype=torch.float16, | ||
) | ||
pipe = DiffusionPipeline.from_pretrained( | ||
'segmind/SSD-1B', vae=vae, torch_dtype=torch.float16) | ||
pipe.to('cuda') | ||
pipe.load_lora_weights(checkpoint) | ||
|
||
image = pipe( | ||
prompt, | ||
num_inference_steps=50, | ||
width=1024, | ||
height=1024, | ||
).images[0] | ||
image.save('demo.png') | ||
``` | ||
|
||
See the [Run DreamBooth XL docs](../../docs/source/run_guides/run_dreambooth_xl.md#inference-with-diffusers) for more details. | ||
|
||
## Results Example | ||
|
||
#### ssd_1b_dreambooth_lora_dog | ||
|
||
[Image: exampledog] | ||
|
||
## Blog post | ||
|
||
[SSD-1B: A Leap in Efficient T2I Generation](https://medium.com/@to78314910/ssd-1b-a-leap-in-efficient-t2i-generation-138bb05fdd75) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
_base_ = [ | ||
"../_base_/models/ssd_1b_lora.py", | ||
"../_base_/datasets/dog_dreambooth_xl.py", | ||
"../_base_/schedules/stable_diffusion_500.py", | ||
"../_base_/default_runtime.py", | ||
] | ||
|
||
train_dataloader = dict( | ||
dataset=dict(class_image_config=dict(model={{_base_.model.model}}))) |
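The `{{_base_.model.model}}` placeholder above points at a value defined in one of the base files, here the `model` key of the `model` dict in `ssd_1b_lora.py`. The sketch below mimics that lookup on a plain dict to show which value ends up in `class_image_config`. `resolve_base_ref` is a hypothetical helper for illustration, not the config loader's own code.

```python
import re
from functools import reduce

# The base model config from ssd_1b_lora.py, copied as a plain dict.
base = {
    "model": {
        "type": "StableDiffusionXL",
        "model": "segmind/SSD-1B",
        "vae_model": "madebyollin/sdxl-vae-fp16-fix",
        "lora_config": {"rank": 8},
    }
}


def resolve_base_ref(ref: str, base_cfg: dict):
    """Resolve a '{{_base_.a.b}}' placeholder against a nested dict."""
    match = re.fullmatch(r"\{\{\s*_base_\.([\w.]+)\s*\}\}", ref)
    if match is None:
        raise ValueError(f"not a base reference: {ref!r}")
    # Walk the dotted path one key at a time, starting from the root dict.
    return reduce(lambda node, key: node[key], match.group(1).split("."), base_cfg)


resolved = resolve_base_ref("{{_base_.model.model}}", base)
train_dataloader = dict(dataset=dict(class_image_config=dict(model=resolved)))
print(train_dataloader)
```

The effect is that the class images are generated with the same checkpoint that is being fine-tuned, without repeating the model name in the child config.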
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
from .ssd_1b import SSD1B | ||
|
||
__all__ = ["SSD1B"] |