Consideration of Flash Attention in Generative Components #7944
We use the following Python script for comparison. When flash attention is enabled, we are able to train the diffusion model with batch size 1. When flash attention is disabled, the training process runs out of memory. For the experiments, we use an A100 GPU with 80 GB of memory. With flash attention enabled, 30 GB+ of memory is utilized.
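The comparison script itself is not reproduced in this thread. As a rough illustration only, a peak-memory measurement of this kind could be structured as below; the model and its forward signature are placeholders, not the actual MONAI diffusion model used in the experiment:

```python
# Hypothetical sketch of a peak-memory comparison, not the original script.
# The model's forward signature (images, timesteps) is illustrative.
import torch


def measure_peak_memory(model: torch.nn.Module, batch_size: int = 1) -> float:
    """Run one training step and return peak GPU memory in GB."""
    device = torch.device("cuda")
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    torch.cuda.reset_peak_memory_stats(device)
    images = torch.randn(batch_size, 1, 256, 256, device=device)  # dummy batch
    timesteps = torch.randint(0, 1000, (batch_size,), device=device)
    noise = torch.randn_like(images)

    prediction = model(images, timesteps)                 # forward pass
    loss = torch.nn.functional.mse_loss(prediction, noise)
    loss.backward()                                       # backward pass dominates memory
    optimizer.step()

    return torch.cuda.max_memory_allocated(device) / 1024**3
```

Running this once with flash attention enabled in the model and once with it disabled gives the kind of peak-memory numbers quoted above.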
Thanks @dongyang0122 for the verification. It seems to me that xformers/flash_attention brings benefits and we should weigh including them in the dependency list. On the other hand, installation of the package could be challenging, because the range of …
We could look at using the flash attention built into PyTorch as well: https://pytorch.org/blog/pytorch2-2/
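For reference, a minimal sketch of an attention block built on `torch.nn.functional.scaled_dot_product_attention`; the block and its parameter names are illustrative, not the MONAI implementation:

```python
import torch
import torch.nn.functional as F


class SelfAttention(torch.nn.Module):
    """Illustrative self-attention block using PyTorch's fused SDPA kernels."""

    def __init__(self, dim: int, num_heads: int) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = torch.nn.Linear(dim, dim * 3)
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, n, head_dim)
        # PyTorch dispatches to FlashAttention / memory-efficient kernels when the
        # hardware and dtype support them, otherwise it falls back to the math path.
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, c)
        return self.proj(out)
```

This avoids an extra dependency, since the kernel selection happens inside PyTorch itself.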
Thanks to @dongyang0122 for the script. I tried it with both the original generative implementation and the PyTorch implementation; the results are shown below. PyTorch achieves almost the same results as the xformers implementation, so I believe we can use PyTorch instead.
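One quick way to sanity-check that the two backends are numerically equivalent, assuming xformers is installed (its `memory_efficient_attention` takes `(batch, seq, heads, head_dim)` inputs, while SDPA takes `(batch, heads, seq, head_dim)`):

```python
# Rough numerical comparison between xformers and PyTorch SDPA.
import torch
import torch.nn.functional as F
import xformers.ops as xops

b, n, h, d = 2, 1024, 8, 64
q = torch.randn(b, n, h, d, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out_xformers = xops.memory_efficient_attention(q, k, v)

# Transpose to (batch, heads, seq, head_dim) for SDPA, then back.
out_sdpa = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)

# The two backends should agree up to fp16 tolerance.
print(torch.allclose(out_xformers, out_sdpa, atol=1e-3, rtol=1e-3))
```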
cc @guopengf for visibility. Here is the code I used: https://github.com/KumoLiu/MONAI/tree/flash-atten
Regarding the comments here: #7715 (comment)
We removed flash attention from the generative components when they were merged into the core. However, based on experiments conducted by @dongyang0122, there appears to be a significant difference between training with and without flash attention, so we should consider adding this option back. @dongyang0122 will share more detailed comparison results from the experiments.