
Consideration of Flash Attention in Generative Components #7944

Closed · KumoLiu opened this issue Jul 23, 2024 · 6 comments · Fixed by #7977

@KumoLiu (Contributor) commented Jul 23, 2024

Regarding the comments here: #7715 (comment)

We removed flash attention from the generative components when they were merged into the core. However, based on experiments conducted by @dongyang0122, there appears to be a significant difference between training with and without flash attention, so we should consider adding this option back. @dongyang0122 will share more detailed comparison results from the experiments.

@KumoLiu (Contributor, Author) commented Jul 23, 2024

cc @ericspod @virginiafdez

@dongyang0122 (Collaborator) commented Jul 25, 2024

We used the following Python script for the comparison. With flash attention enabled, we can train the diffusion model with batch size 1; with flash attention disabled, training runs out of memory. The experiments were run on an A100 GPU with 80 GB of memory, and with flash attention enabled more than 30 GB of memory is used.

verify_training.py.txt
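
For context on why the memory gap is so large, here is a minimal, hypothetical sketch of the two attention paths being compared (this is not the attached script; the tensor layout and the `use_flash` flag are illustrative assumptions):

```python
# Hypothetical sketch, not the attached verify_training.py: contrast a plain
# attention computation with xformers memory-efficient (flash) attention.
import torch

try:
    import xformers.ops as xops

    has_xformers = True
except ImportError:
    has_xformers = False


def attention(q, k, v, use_flash: bool):
    # q, k, v: (batch, seq_len, num_heads, head_dim)
    if use_flash and has_xformers:
        # memory-efficient kernel: never materializes the full
        # (seq_len x seq_len) attention matrix
        return xops.memory_efficient_attention(q, k, v)
    # naive path: builds the full attention matrix, which is what exhausts
    # GPU memory for the long token sequences of 3D diffusion models
    q_, k_, v_ = (t.transpose(1, 2) for t in (q, k, v))  # -> (B, H, L, D)
    scores = q_ @ k_.transpose(-2, -1) * q_.shape[-1] ** -0.5
    return (scores.softmax(dim=-1) @ v_).transpose(1, 2)
```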

@mingxin-zheng (Contributor) commented

Thanks @dongyang0122 for the verification.

It seems to me that xformers/flash attention brings clear benefits, and we should weigh including it in the dependency list.

On the other hand, installing the package could be challenging, because the supported range of torch, CUDA, and OS platforms seems narrow: https://github.com/facebookresearch/xformers/issues

Would it be acceptable if MONAI leaves the installation of xformers to the user but keeps the optional_import in the code base?
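
As a rough illustration of that proposal, a sketch of the gating pattern, assuming MONAI's existing `monai.utils.optional_import` helper; the class name and constructor argument below are hypothetical:

```python
# Sketch of the proposed pattern: xformers is not a required dependency, but
# modules use it when the user has installed it. optional_import is MONAI's
# existing helper; the class and argument names here are hypothetical.
import torch
from monai.utils import optional_import

xops, has_xformers = optional_import("xformers.ops")


class SelfAttentionBlock(torch.nn.Module):
    def __init__(self, use_flash_attention: bool = False) -> None:
        super().__init__()
        if use_flash_attention and not has_xformers:
            raise ValueError(
                "use_flash_attention=True requires the optional xformers package"
            )
        self.use_flash_attention = use_flash_attention
```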

@ericspod (Member) commented

We could also look at using the flash attention built into PyTorch: https://pytorch.org/blog/pytorch2-2/
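
For reference, a minimal sketch of that PyTorch-native route; the tensor shapes are illustrative, and the backend-selection context manager shown is the one available around PyTorch 2.0–2.2 (newer releases expose `torch.nn.attention.sdpa_kernel` instead):

```python
# Sketch of the built-in PyTorch path: scaled_dot_product_attention dispatches
# to a flash or memory-efficient kernel when device/dtype allow, with no extra
# dependency. Shapes below are illustrative.
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# SDPA expects (batch, num_heads, seq_len, head_dim)
q = torch.randn(1, 8, 4096, 64, device=device, dtype=dtype)
k, v = torch.randn_like(q), torch.randn_like(q)

out = F.scaled_dot_product_attention(q, k, v)

# On CUDA the backend can be pinned to confirm the flash kernel is being used.
if device == "cuda":
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        out_flash = F.scaled_dot_product_attention(q, k, v)
```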

@KumoLiu (Contributor, Author) commented Jul 29, 2024

Thanks to @dongyang0122 for the script. I ran it against both the original generative (xformers) implementation and the PyTorch implementation; the results are shown below. PyTorch achieves almost the same results as the xformers implementation, so I believe we can use PyTorch instead.

[Figure: training comparison between the xformers and PyTorch attention implementations]
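
To illustrate why near-identical results are expected, a hedged sketch (not the linked training code) of a numerical comparison between the two kernels; note that xformers uses a (batch, seq, heads, dim) layout while torch SDPA uses (batch, heads, seq, dim), and the shapes and tolerances here are assumptions:

```python
# Sketch only: check that xformers memory_efficient_attention and torch
# scaled_dot_product_attention agree on the same inputs, up to fp16 kernel
# tolerance. Shapes and tolerances are illustrative assumptions.
import torch
import torch.nn.functional as F
import xformers.ops as xops

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)  # (B, L, H, D)
k, v = torch.randn_like(q), torch.randn_like(q)

out_xformers = xops.memory_efficient_attention(q, k, v)
out_torch = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)

# fp16 kernels differ in accumulation order, so compare with a tolerance
print(torch.allclose(out_xformers, out_torch, atol=1e-3, rtol=1e-3))
```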

@KumoLiu (Contributor, Author) commented Jul 29, 2024

cc @guopengf for vis. Here is the code I use: https://github.com/KumoLiu/MONAI/tree/flash-atten

@virginiafdez mentioned this issue Aug 1, 2024

@KumoLiu closed this as completed in 6c23fd0 on Aug 6, 2024