Fix ima for split-kv kernel #20

Merged 1 commit on Sep 5, 2024

Conversation

@bfontain bfontain commented Sep 5, 2024

This is a cherry-pick of Dao-AILab#1085, which fixes an illegal memory access (IMA) in the split-kv flash attention kernel.

I ran some simple tests, and it does not appear to break anything.

@WoosukKwon WoosukKwon merged commit 0c2fb25 into vllm-project:main Sep 5, 2024
@WoosukKwon

@bfontain Thanks for the PR!

@sfc-gh-zhwang

The original PR also mentioned Dao-AILab#970. Is that included in vLLM's flash-attention as well?
For context, we are constantly seeing IMAs. After this PR the frequency is lower, but they still happen. If we comment out the split-kv part, the IMAs disappear, so we suspect the problem is still somewhere in the split-kv path.

@bfontain
Author

bfontain commented Oct 2, 2024

It's included in d562aa6; see commit f816dee.

3 participants