[WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache. #75

Closed
wants to merge 13 commits into from


yzh119
Collaborator

yzh119 commented Jan 19, 2024

Before this PR, FlashInfer supported KV sequence parallelism for single decode/prefill and batch decode, but not for batch prefill, even though the feature matters just as much for the batch prefill kernel. This PR implements KV partitioning for the batch prefill kernels on both Paged and Ragged KV-Cache.
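To illustrate what KV partitioning means: each request's KV sequence is split into chunks that can be processed independently (on the GPU, by different thread blocks). Each chunk produces a partial output normalized over that chunk, together with the log-sum-exp (LSE) of its attention scores, and the partial states are then merged. Below is a minimal CPU sketch in C++ for a single query row, assuming a fixed chunk size `kv_chunk_size`. The names `State`, `attend_chunk`, `merge_states`, and `attend_partitioned` are hypothetical and are not FlashInfer's API, and the sketch merges chunks sequentially for clarity, whereas a kernel would compute the chunks in parallel and merge them afterwards.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Partial attention state for one query over one KV chunk (hypothetical type).
struct State {
  std::vector<float> o;  // softmax-weighted values, normalized within the chunk
  float lse;             // log-sum-exp of the chunk's scaled attention scores
};

// Attention of query q over keys/values in the index range [begin, end).
State attend_chunk(const std::vector<float>& q,
                   const std::vector<std::vector<float>>& k,
                   const std::vector<std::vector<float>>& v,
                   std::size_t begin, std::size_t end) {
  const std::size_t d = q.size();
  const float scale = 1.0f / std::sqrt(static_cast<float>(d));
  std::vector<float> s(end - begin);
  float m = -std::numeric_limits<float>::infinity();
  for (std::size_t j = begin; j < end; ++j) {
    float dot = 0.0f;
    for (std::size_t t = 0; t < d; ++t) dot += q[t] * k[j][t];
    s[j - begin] = dot * scale;
    m = std::max(m, s[j - begin]);
  }
  float denom = 0.0f;
  std::vector<float> o(v[0].size(), 0.0f);
  for (std::size_t j = begin; j < end; ++j) {
    const float p = std::exp(s[j - begin] - m);  // subtract max for stability
    denom += p;
    for (std::size_t t = 0; t < o.size(); ++t) o[t] += p * v[j][t];
  }
  for (float& x : o) x /= denom;
  return {o, m + std::log(denom)};
}

// Merge two partial states; each is weighted by exp(lse_i) relative to the max.
State merge_states(const State& a, const State& b) {
  const float m = std::max(a.lse, b.lse);
  const float wa = std::exp(a.lse - m), wb = std::exp(b.lse - m);
  State r{std::vector<float>(a.o.size()), m + std::log(wa + wb)};
  for (std::size_t t = 0; t < a.o.size(); ++t)
    r.o[t] = (wa * a.o[t] + wb * b.o[t]) / (wa + wb);
  return r;
}

// Split the KV sequence into chunks of kv_chunk_size (> 0) and merge results.
State attend_partitioned(const std::vector<float>& q,
                         const std::vector<std::vector<float>>& k,
                         const std::vector<std::vector<float>>& v,
                         std::size_t kv_chunk_size) {
  State acc = attend_chunk(q, k, v, 0, std::min(kv_chunk_size, k.size()));
  for (std::size_t b = kv_chunk_size; b < k.size(); b += kv_chunk_size)
    acc = merge_states(acc, attend_chunk(q, k, v, b,
                                         std::min(b + kv_chunk_size, k.size())));
  return acc;
}
```

Because the merge only needs each chunk's output and LSE, the partitioned result matches unpartitioned attention up to floating-point rounding, which is what lets long KV sequences be spread across more thread blocks.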

yzh119 force-pushed the batch-prefill-partition-kv branch from 551858a to bf6e4dc on January 21, 2024 14:14
yzh119 mentioned this pull request Feb 27, 2024
@AgrawalAmey

AgrawalAmey commented Mar 14, 2024

@yzh119 is this PR good to use? This would be extremely useful for some of my work.

@yzh119
Collaborator Author

yzh119 commented Mar 16, 2024

@AgrawalAmey We have done a large amount of code refactoring since the last commit of this PR, so I need to rebase and add some new commits. Please stay tuned :)

@AgrawalAmey

@yzh119 looking forward to it! I would be happy to help accelerate this, please let me know if I can help in any way.

@ZSL98

ZSL98 commented Apr 3, 2024

Looking forward to it!!

@chenzhuofu

chenzhuofu commented Jun 6, 2024

@yzh119 Just wanted to ask: is this ready for use? I found `BatchPrefillWithRaggedKVCacheDispatched` in the main branch code, but I'm not sure whether it works.

@yzh119
Collaborator Author

yzh119 commented Jun 17, 2024

Moved to #310

yzh119 closed this Jun 17, 2024
@yzh119
Collaborator Author

yzh119 commented Jun 19, 2024

@chenzhuofu @ZSL98 @AgrawalAmey
This was done in #310.

@AgrawalAmey

Amazing, thanks a lot for the awesome work! 🙏

yzh119 added a commit that referenced this pull request Jun 20, 2024
Duplicate of #75, but rebased on the main branch.

Note that to support CUDAGraph, we cannot make `kv_chunk_size` a function argument: it would be passed by value and could not change once captured by CUDAGraph. Instead, we pass `kv_chunk_size` through `kv_chunk_size_ptr`, a pointer to a global memory address that stores `kv_chunk_size`; its value can be set in the `BeginForward` functions.
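The pass-by-value problem can be illustrated with a host-only C++ analogy, in which a `std::function` stands in for a captured CUDA graph. This is a sketch under stated assumptions, not FlashInfer code: `num_chunks`, `capture_by_value`, `capture_by_pointer`, and `begin_forward` are hypothetical names, and `begin_forward` only plays the role that the commit message assigns to the `BeginForward` functions (updating the value behind the pointer).

```cpp
#include <cstdint>
#include <functional>

// Stand-in for a kernel: how many KV chunks a request of length kv_len needs.
std::int32_t num_chunks(std::int32_t kv_len, std::int32_t kv_chunk_size) {
  return (kv_len + kv_chunk_size - 1) / kv_chunk_size;  // ceiling division
}

// "Capture" that freezes kv_chunk_size by value, like a kernel argument
// recorded into a graph: every replay sees the capture-time value.
std::function<std::int32_t()> capture_by_value(std::int32_t kv_len,
                                               std::int32_t kv_chunk_size) {
  return [=] { return num_chunks(kv_len, kv_chunk_size); };
}

// "Capture" that records only the pointer; the value is read at replay time,
// analogous to a kernel dereferencing kv_chunk_size_ptr in global memory.
std::function<std::int32_t()> capture_by_pointer(
    std::int32_t kv_len, const std::int32_t* kv_chunk_size_ptr) {
  return [=] { return num_chunks(kv_len, *kv_chunk_size_ptr); };
}

// Hypothetical planning step: it writes the new chunk size behind the pointer
// and never touches anything that was captured.
void begin_forward(std::int32_t* kv_chunk_size_ptr, std::int32_t new_chunk_size) {
  *kv_chunk_size_ptr = new_chunk_size;
}
```

For example, with `std::int32_t chunk = 256;`, both `capture_by_value(1000, chunk)` and `capture_by_pointer(1000, &chunk)` first return 4 chunks; after `begin_forward(&chunk, 128)`, the by-value capture still returns 4 while the by-pointer capture returns 8, without re-capturing.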
yzh119 deleted the batch-prefill-partition-kv branch August 27, 2024 04:52
4 participants