Feature/non contiguous kv cache #513

LinHeLurking · 2024-09-29T03:33:33Z

This PR solves #506

Adds custom strides to support a non-contiguous kv cache.
Tests in test_batch_prefill_kernels.py and test_batch_decode_kernels.py are modified to run with kv_data input as both contiguous and non-contiguous tensors.
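
To illustrate what "non-contiguous" means here, the sketch below computes element offsets from explicit strides. It assumes a hypothetical cache layout of [num_pages, num_layers, page_size, head_dim]; the layout, the sizes, and the helper names are illustrative and not taken from this PR. Selecting a single layer yields a view whose page stride is larger than that of a packed tensor of the same shape, which is the case custom strides have to handle.

```python
# Illustrative only: layout, sizes, and helper names are assumptions, not FlashInfer code.

def contiguous_strides(shape):
    """Row-major strides (in elements) for a densely packed tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def offset(index, strides, base=0):
    """Flat-buffer offset of a multi-dimensional index under the given strides."""
    return base + sum(i * s for i, s in zip(index, strides))


num_pages, num_layers, page_size, head_dim = 4, 3, 2, 8
full_shape = (num_pages, num_layers, page_size, head_dim)
full_strides = contiguous_strides(full_shape)

# View of layer 1 only: the layer axis is dropped from the shape, but the
# page stride is still that of the full allocation, so the view is non-contiguous.
layer = 1
view_shape = (num_pages, page_size, head_dim)
view_strides = (full_strides[0], full_strides[2], full_strides[3])
view_base = layer * full_strides[1]

# Strides a packed tensor of the same shape would have.
packed_strides = contiguous_strides(view_shape)
is_contiguous = view_strides == packed_strides

# Element (page=2, slot=1, dim=3) of the layer view, as a flat offset.
flat = offset((2, 1, 3), view_strides, base=view_base)
```

A kernel that derives offsets from the shape alone would use `packed_strides` and read the wrong memory; passing `view_strides` explicitly avoids that.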

LinHeLurking · 2024-09-30T03:33:13Z

@yzh119 Please review 👀

yzh119 · 2024-09-30T05:37:28Z

Hi @LinHeLurking, I'm finalizing #507 (there is still some work to be done to keep both AOT and JIT, #510) and I'll start rebasing your PR as soon as possible. Thanks for your contribution and patience!

yzh119

@LinHeLurking Thanks so much for your contribution; this is indeed a very important feature to have. I rebased your code onto the main branch to make it compatible with the codebase after JIT support.

I also made the following changes to make the code simpler:

Do not pass all of paged_kv_cache, paged_k_cache, and paged_v_cache to the C++ APIs. The C++ API keeps only paged_k_cache and paged_v_cache; if the user provides a single paged_kv_cache, we split it on the Python side.
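
A minimal sketch of such a Python-side split, assuming a combined cache laid out as [num_pages, 2, page_size, num_heads, head_dim] with index 0 holding K and index 1 holding V. That layout is an assumption for illustration, and `StridedView` and `split_paged_kv_cache` are hypothetical names, not FlashInfer APIs. Both halves share the original buffer and keep its page stride, so neither is contiguous and no copy is made.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StridedView:
    """Hypothetical stand-in for a tensor view: shared storage, base offset, shape, strides."""
    storage: List[float]
    base: int
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]

    def at(self, *index: int) -> float:
        """Read one element by multi-dimensional index."""
        return self.storage[self.base + sum(i * s for i, s in zip(index, self.strides))]


def split_paged_kv_cache(kv: StridedView) -> Tuple[StridedView, StridedView]:
    """Split a [num_pages, 2, ...] view into K and V views without copying."""
    assert kv.shape[1] == 2, "axis 1 is assumed to be the K/V axis"
    shape = kv.shape[:1] + kv.shape[2:]
    strides = kv.strides[:1] + kv.strides[2:]
    k = StridedView(kv.storage, kv.base, shape, strides)
    # V starts one step along the K/V axis further into the same storage.
    v = StridedView(kv.storage, kv.base + kv.strides[1], shape, strides)
    return k, v


# Small combined cache: 3 pages, K/V axis, page_size 2, 1 head, head_dim 4.
shape = (3, 2, 2, 1, 4)
strides = (16, 8, 4, 4, 1)  # packed row-major strides for `shape`
storage = [float(i) for i in range(3 * 16)]
paged_kv_cache = StridedView(storage, 0, shape, strides)

paged_k_cache, paged_v_cache = split_paged_kv_cache(paged_kv_cache)
```

With real tensors the same effect would come from indexing along the K/V axis; the resulting views are exactly the non-contiguous inputs that the stride-aware kernels accept.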

cc @reyoung for visibility.

LinHeLurking and others added 5 commits October 8, 2024 23:13

feat: non-contiguous paged kv cache

028b6fa

Signed-off-by: LinHeLurking <[email protected]>

test: add prefill/decode test for non-contiguous kv cache

303c087

Signed-off-by: LinHeLurking <[email protected]>

upd

933aa6a

upd

cdd54c6

upd

f8d7129

yzh119 force-pushed the feature/non-contiguous-kv-cache branch from 0ca5ec6 to f8d7129 Compare October 9, 2024 10:30

upd

89a2100

yzh119 approved these changes Oct 9, 2024

revert some changes on tests

02fb9ea

yzh119 merged commit 85b1878 into flashinfer-ai:main Oct 9, 2024

This was referenced Oct 9, 2024

[POC] [Do not merge] BatchPrefill without custom mask support non-cont kv-cache #508

Closed

[feature request]: Support moving num_layers into a kv cache page (or support non-contiguous kv cache) #506

Closed

bugfix: fix the stride bug in page append #527

Merged

yzh119 added a commit that referenced this pull request Oct 11, 2024

bugfix: fix the stride bug in page append (#527)

93b5d4e

We introduced a bug in #513 because we didn't consider the non-contiguous kv-cache in the page append operator; this PR fixes the bug.
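
The commit message does not spell out the fix, so the following is only a sketch of the class of bug it describes, not the actual kernel change. It assumes a hypothetical buffer in which pages are interleaved with layers, and the name `append_entry` is invented for illustration. Appending into a per-layer view with the packed page stride instead of the view's real page stride writes into another layer's slot.

```python
# Illustrative only: names and layout are assumptions, not the FlashInfer kernel.

def append_entry(storage, base, page_stride, slot_stride, page, slot, values):
    """Write one token's vector into (page, slot) using explicit strides."""
    start = base + page * page_stride + slot * slot_stride
    storage[start:start + len(values)] = values
    return start


num_pages, num_layers, page_size, head_dim = 2, 2, 2, 4
layer_stride = page_size * head_dim        # elements per (page, layer) block
page_stride = num_layers * layer_stride    # pages are interleaved with layers
storage = [0.0] * (num_pages * page_stride)

layer = 1
base = layer * layer_stride                # where the layer-1 view starts
token = [1.0, 2.0, 3.0, 4.0]

# Correct: use the view's real page stride.
good = append_entry(storage, base, page_stride, head_dim, page=1, slot=0, values=token)

# Buggy assumption: treat the view as packed, so page stride = page_size * head_dim.
# This offset falls inside layer 0 of page 1 rather than layer 1.
bad = base + 1 * (page_size * head_dim) + 0 * head_dim
```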

yzh119 mentioned this pull request Oct 26, 2024

perf: remove unnecessary contiguous operation in block sparse attention #561

Merged

yzh119 added a commit that referenced this pull request Oct 26, 2024

perf: remove unnecessary contiguous operation in block sparse attention (#561)

7a7ad46

The contiguous operation is no longer required after #513
