[WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache. #75
Conversation
Force-pushed from 551858a to bf6e4dc
@yzh119 is this PR good to use? This would be extremely useful for some of my work.
@AgrawalAmey We've done a major code refactor since the last commit of this PR, so I need to rebase and add some new commits. Please stay tuned :)
@yzh119 Looking forward to it! I would be happy to help accelerate this; please let me know if I can help in any way.
Looking forward to it!!
@yzh119 Writing to ask if this is ready for use? I just find
Moved to #310
@chenzhuofu @ZSL98 @AgrawalAmey
Amazing, thanks a lot for the awesome work! 🙏
Duplicate of #75, but rebased on the main branch. Note that to support CUDAGraph, we cannot make `kv_chunk_size` a function argument: it would be passed by value and could not change once captured by CUDAGraph. Instead, we pass `kv_chunk_size` through `kv_chunk_size_ptr`, a pointer to a global-memory address that stores the `kv_chunk_size`; its value can be set in the `BeginForward` functions.
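The pointer-indirection idea can be illustrated with a minimal Python sketch (hypothetical names, not FlashInfer's actual API): a "captured" kernel, like one recorded into a CUDAGraph, cannot see new values of arguments passed by value at capture time, while reading the chunk size through a shared buffer (analogous to `kv_chunk_size_ptr` pointing into global memory) lets a `BeginForward`-style setup change it between replays without re-capturing.

```python
def capture_by_value(kv_chunk_size):
    # the value is baked in at capture time, like a kernel argument
    # captured by CUDAGraph
    def replay():
        return kv_chunk_size
    return replay

def capture_by_pointer(kv_chunk_size_buf):
    # the "kernel" dereferences the buffer on every replay, so updates
    # to the pointed-to slot are visible without re-capture
    def replay():
        return kv_chunk_size_buf[0]
    return replay

kv_chunk_size_buf = [256]                 # stands in for a global-memory slot
frozen = capture_by_value(kv_chunk_size_buf[0])
live = capture_by_pointer(kv_chunk_size_buf)

kv_chunk_size_buf[0] = 512                # setup phase picks a new chunk size
print(frozen(), live())                   # -> 256 512
```

This is only an analogy for the by-value vs. through-pointer distinction; in the actual kernels the dereference happens on the GPU.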
Before this PR, FlashInfer supported KV sequence parallelism for single decode/prefill and batch decode, but not for batch prefill, where the feature is equally important. This PR implements KV partition for the batch prefill kernels (on both Paged and Ragged KV-Cache).
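The core of KV partition is that attention over a long KV sequence can be computed per chunk and the partial results merged exactly via their log-sum-exp statistics. A small numpy sketch of that merge rule (an illustration of the general technique, not the kernel code) for a single query vector:

```python
import numpy as np

def attention(q, k, v):
    # single-query softmax attention; also return the log-sum-exp (lse)
    # of the scores, which is what makes partial results mergeable
    s = k @ q                         # scores, shape (kv_len,)
    m = s.max()
    p = np.exp(s - m)
    out = (p @ v) / p.sum()
    lse = m + np.log(p.sum())
    return out, lse

def merge_states(outs, lses):
    # combine per-chunk partial outputs, weighting each chunk by its
    # share of the total softmax normalizer exp(lse_i) / sum_j exp(lse_j)
    lses = np.array(lses)
    m = lses.max()
    w = np.exp(lses - m)
    w = w / w.sum()
    return sum(wi * oi for wi, oi in zip(w, outs))

rng = np.random.default_rng(0)
d, kv_len, chunk = 8, 64, 16
q = rng.standard_normal(d)
k = rng.standard_normal((kv_len, d))
v = rng.standard_normal((kv_len, d))

ref, _ = attention(q, k, v)           # attention over the full KV sequence
parts = [attention(q, k[i:i + chunk], v[i:i + chunk])
         for i in range(0, kv_len, chunk)]
merged = merge_states([o for o, _ in parts], [l for _, l in parts])
# merged matches ref, so chunks can be processed by independent thread blocks
```

Because the merge is exact, the KV dimension can be split across thread blocks (or SMs) and reduced afterwards, which is what gives batch prefill the same KV parallelism the decode kernels already had.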