[Kernel] Update vllm-flash-attn version #10742
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Woosuk Kwon <[email protected]>
…-project#10742) Signed-off-by: Woosuk Kwon <[email protected]> Signed-off-by: Andrew Feldman <[email protected]>
…-project#10742) Signed-off-by: Woosuk Kwon <[email protected]>
Upgrades to vllm-project/flash-attention#30, which helps reduce the CPU overhead of launching the attention kernels.
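To see why this matters, here is a minimal sketch of how one might measure the CPU-side cost of issuing the attention kernel. It is not part of this PR; the `vllm_flash_attn` import path and the `flash_attn_func` signature are assumptions based on the upstream flash-attn API, so adjust them for the installed wheel. It requires a CUDA GPU.

```python
# Rough micro-benchmark of per-call CPU launch overhead (a sketch, not the
# PR's methodology). Tiny tensors keep the GPU work negligible, so the timed
# loop is dominated by the CPU-side work of preparing and enqueueing each
# kernel launch.
import time

import torch
from vllm_flash_attn import flash_attn_func  # assumed import path


def launch_overhead_us(iters: int = 1000) -> float:
    # Shapes follow the flash-attn convention: (batch, seqlen, nheads, headdim).
    q = torch.randn(1, 1, 8, 64, dtype=torch.float16, device="cuda")
    k = torch.randn(1, 16, 8, 64, dtype=torch.float16, device="cuda")
    v = torch.randn_like(k)

    for _ in range(10):  # warm-up so one-time setup costs are excluded
        flash_attn_func(q, k, v)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        flash_attn_func(q, k, v)  # enqueue only; no per-call synchronize
    cpu_time = time.perf_counter() - start  # CPU time spent issuing launches
    torch.cuda.synchronize()
    return cpu_time / iters * 1e6


if __name__ == "__main__":
    print(f"~{launch_overhead_us():.1f} us of CPU time per kernel launch")
```

Running a sketch like this before and after the version bump would show whether the per-launch CPU time drops, which is the overhead the upstream change targets.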