Feature/non contiguous kv cache #513

Merged

Conversation

LinHeLurking
Contributor

This PR solves #506

Adds custom strides to support a non-contiguous kv cache.
The tests in test_batch_prefill_kernels.py and test_batch_decode_kernels.py are modified to exercise the input kv_data as both a contiguous and a non-contiguous tensor.
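For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what "custom strides" buys you. It is not the PR's kernel code: the helper names `contiguous_strides` and `load`, the toy shape, and the "every other page" view are all assumptions made for illustration. The point is only that a reader which takes explicit strides can address a non-contiguous view of an allocation without first copying it into a dense buffer.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a densely packed tensor."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return tuple(strides)


def load(buffer, index, strides, offset=0):
    """Read one element using explicit strides instead of assuming a dense layout."""
    return buffer[offset + sum(i * s for i, s in zip(index, strides))]


# Hypothetical toy allocation: 4 pages, page_size=2, head_dim=3.
alloc_shape = (4, 2, 3)
buffer = list(range(4 * 2 * 3))
dense = contiguous_strides(alloc_shape)

# Non-contiguous view that keeps every other page (like x[::2] in a tensor
# library): same buffer, but the page stride is doubled.
view_shape = (2, 2, 3)
view_strides = (2 * dense[0], dense[1], dense[2])

# Element [1, 0, 2] of the view is element [2, 0, 2] of the allocation.
same = load(buffer, (1, 0, 2), view_strides) == load(buffer, (2, 0, 2), dense)
```

A reader that hard-codes dense strides would instead read the wrong page for any index past the first page of the view, which is why such inputs previously had to be made contiguous first.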

@LinHeLurking
Contributor Author

@yzh119 Please review 👀

@yzh119
Collaborator

yzh119 commented Sep 30, 2024

Hi @LinHeLurking, I'm finalizing #507 (there is still some work to be done to keep both AOT and JIT, see #510) and I'll start rebasing your PR as soon as possible. Thanks for your contribution and patience!

@yzh119 yzh119 force-pushed the feature/non-contiguous-kv-cache branch from 0ca5ec6 to f8d7129 Compare October 9, 2024 10:30
Collaborator

@yzh119 yzh119 left a comment

@LinHeLurking Thanks so much for your contribution; this is indeed a very important feature to have. I rebased your code onto the main branch to make it compatible with the codebase after JIT support was added.

I also made the following changes to make the code simpler:

  1. Do not pass all three of paged_kv_cache, paged_k_cache, and paged_v_cache to the C++ APIs; the C++ API keeps only paged_k_cache and paged_v_cache. If the user provides a single paged_kv_cache, we split it on the Python side.
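To illustrate why splitting on the Python side goes hand in hand with stride support, here is a rough pure-Python sketch. It is not the code from this PR: the `View` class, the `split_kv` helper, and the assumed combined layout `[num_pages, 2, page_size, num_heads, head_dim]` are hypothetical stand-ins. Under that assumed layout, the K and V halves are views into the same buffer that share one set of strides and differ only in their starting offset, so neither is contiguous.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class View:
    """Hypothetical strided view: a flat buffer plus offset, shape, and strides."""
    buffer: List[int]
    offset: int
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]

    def __getitem__(self, index):
        flat = self.offset + sum(i * s for i, s in zip(index, self.strides))
        return self.buffer[flat]


def split_kv(buffer, num_pages, page_size, num_heads, head_dim):
    """Split a combined cache, assumed to be laid out as
    [num_pages, 2, page_size, num_heads, head_dim], into K and V views
    without copying any data."""
    half = page_size * num_heads * head_dim  # elements in one K (or V) page
    shape = (num_pages, page_size, num_heads, head_dim)
    # The page stride spans both K and V, so each view skips the other half.
    strides = (2 * half, num_heads * head_dim, head_dim, 1)
    k_view = View(buffer, 0, shape, strides)
    v_view = View(buffer, half, shape, strides)
    return k_view, v_view


# Toy sizes chosen only for illustration.
kv = list(range(2 * 2 * 2 * 1 * 2))
k, v = split_kv(kv, num_pages=2, page_size=2, num_heads=1, head_dim=2)
first_v = v[0, 0, 0, 0]   # first V element sits right after the first K page
```

With this arrangement the lower-level API only ever needs to accept two separately strided tensors, whether the caller started from one combined cache or from two independent ones.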

cc @reyoung for visibility.

@yzh119 yzh119 merged commit 85b1878 into flashinfer-ai:main Oct 9, 2024
yzh119 added a commit that referenced this pull request Oct 11, 2024
We introduced a bug in #513 because we didn't consider a non-contiguous
kv-cache for the page append operator; this PR fixes the bug.
yzh119 added a commit that referenced this pull request Oct 26, 2024
…on (#561)

The contiguous operation is no longer required after #513
2 participants