
Enable Flash Attention for SD3 MMDiT #2014

Merged: 2 commits merged into keras-team:master on Dec 12, 2024

Conversation

@james77777778 (Collaborator) commented on Dec 9, 2024

This PR utilizes ops.dot_product_attention to accelerate inference in SD3.

  • 800x800
  • SD3 medium
  • float16
Backend  Flash Attention  Cost Time  Improvement
jax      no               10.61s     -
jax      yes              5.45s      -48.7%
torch    no               24.10s     -
torch    yes              18.43s     -23.6%

I noticed that ops.dot_product_attention performed slower than the vanilla implementation on the tensorflow backend (vanilla: 10.55s vs. ops.dot_product_attention: 14.33s), so this optimization path is skipped for that backend.
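
For context, a minimal sketch of the substitution (not the actual diff): the vanilla einsum/softmax path versus the fused op. Shapes follow what keras.ops.dot_product_attention expects, i.e. (batch, seq_len, num_heads, head_dim); the function names here are illustrative only.

from keras import ops

def vanilla_attention(q, k, v, scale):
    # Plain scaled dot-product attention: logits -> softmax -> weighted sum.
    logits = ops.einsum("btnh,bsnh->bnts", q, k) * scale
    probs = ops.softmax(logits, axis=-1)
    return ops.einsum("bnts,bsnh->btnh", probs, v)

def fused_attention(q, k, v, scale):
    # Single fused op; core Keras may dispatch this to a flash-attention
    # kernel on supported backends/devices.
    return ops.dot_product_attention(q, k, v, scale=scale)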

EDITED:
With this change, jax now runs faster than diffusers out of the box:
diffusers.StableDiffusion3Pipeline: 6.15s

The benchmark script (KerasHub):

import time

from keras_hub.src.models.stable_diffusion_3.stable_diffusion_3_text_to_image import (
    StableDiffusion3TextToImage,
)

height, width = 800, 800
preset = "stable_diffusion_3_medium"
num_steps = 28
guidance_scale = 7.0
dtype = "float16"
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
prompt = [prompt]
text_to_image = StableDiffusion3TextToImage.from_preset(
    preset,
    image_shape=(height, width, 3),
    dtype=dtype,
)

# Warm-up run (compilation, weight loading, etc.).
for _ in range(1):
    _ = text_to_image.generate(
        prompt, num_steps=num_steps, guidance_scale=guidance_scale
    )
print("Finish warmup.")

# Timed run.
st = time.time()
images = text_to_image.generate(
    prompt, num_steps=num_steps, guidance_scale=guidance_scale
)
ed = time.time()
print(f"Cost time: {ed-st:.2f}s")

The benchmark script (diffusers):

import time

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
height, width = 800, 800
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# Warm-up run.
image = pipe(
    prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
    height=height,
    width=width,
).images[0]
print("Finish warmup.")

# Timed run.
st = time.time()
image = pipe(
    prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
    height=height,
    width=width,
).images[0]
print(time.time() - st)

@mattdangerw (Member) left a comment


Should we test this somehow?

if (
    hasattr(ops, "dot_product_attention")
    and hasattr(keras.config, "is_flash_attention_enabled")
    and keras.backend.backend() != "tensorflow"
@mattdangerw (Member) commented:

Maybe let's drop the tf part? And just do the same on all backends?

I don't think we want to be in the business of trying to outsmart core Keras. And layers.MultiHeadAttention isn't doing anything like this.
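
For illustration, the suggested simplification might look roughly like the sketch below (not the merged code; compute_attention, scale, and fallback are placeholder names). The backend check goes away and the feature checks alone gate the fused path.

import keras
from keras import ops

def compute_attention(query, key, value, scale, fallback):
    # Keep only the feature checks; let core Keras pick the best kernel
    # on every backend instead of special-casing tensorflow.
    if hasattr(ops, "dot_product_attention") and hasattr(
        keras.config, "is_flash_attention_enabled"
    ):
        return ops.dot_product_attention(query, key, value, scale=scale)
    # Otherwise fall back to the vanilla attention path.
    return fallback(query, key, value)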

@james77777778 (Collaborator, Author) commented:

Sure!

I have submitted a PR to core Keras:
keras-team/keras#20615
With that change, the cost time of the tensorflow backend will be comparable to jax's (w/o flash attention):

  • tensorflow: 10.57s
  • jax: 10.61s

@james77777778 (Collaborator, Author) commented:

> Should we test this somehow?

I’m not sure how to test this, since ops.dot_product_attention is intended to be a drop-in replacement.
Should I compare the numerics with and without ops.dot_product_attention?
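
For illustration, such a numerics check might look roughly like this sketch (not an existing test; vanilla_attention here is a stand-in for the layer's original path):

import numpy as np
from keras import ops

def vanilla_attention(q, k, v, scale):
    # Reference einsum/softmax attention to compare against.
    logits = ops.einsum("btnh,bsnh->bnts", q, k) * scale
    probs = ops.softmax(logits, axis=-1)
    return ops.einsum("bnts,bsnh->btnh", probs, v)

rng = np.random.default_rng(0)
shape = (2, 16, 4, 8)  # (batch, seq_len, num_heads, head_dim)
q, k, v = (rng.standard_normal(shape).astype("float32") for _ in range(3))
scale = 1.0 / np.sqrt(shape[-1])

fused = ops.dot_product_attention(q, k, v, scale=scale)
reference = vanilla_attention(q, k, v, scale)
np.testing.assert_allclose(
    ops.convert_to_numpy(fused), ops.convert_to_numpy(reference), atol=1e-4
)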

@mattdangerw (Member) commented:

Thanks! keras-team/keras#20615, yeah I think that's the way to go.

Hmm, yeah as long as the code is being exercised, probably fine to leave as is. Let's go with this!

@mattdangerw enabled auto-merge (squash) on December 10, 2024 19:16
@mattdangerw merged commit 821c014 into keras-team:master on Dec 12, 2024
7 checks passed
@james77777778 deleted the flash-attn-sd3 branch on December 25, 2024 03:33