Feature/non contiguous kv cache #513
@yzh119 Please review 👀
Hi @LinHeLurking, I'm finalizing #507 (there is still some work to be done to keep both AOT and JIT, #510), and I'll start rebasing your PR as soon as possible. Thanks for your contribution and patience!
Signed-off-by: LinHeLurking <[email protected]>
0ca5ec6 to f8d7129
@LinHeLurking Thanks so much for your contribution; this is indeed a very important feature to have. I rebased your code onto the main branch to make it compatible with the codebase after JIT support.
I also made the following changes to make the code simpler:
- Do not pass `paged_kv_cache`, `paged_k_cache`, and `paged_v_cache` to the C++ APIs; we keep only `paged_k_cache` and `paged_v_cache` in the C++ API. If the user provides a single `paged_kv_cache`, we split it on the Python side.
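To illustrate why a Python-side split produces non-contiguous inputs, here is a minimal stdlib-only sketch. It assumes a combined layout of `(num_pages, 2, page_size, num_heads, head_dim)` with index 0/1 on the second axis selecting K/V; that layout and the helper names `contiguous_strides` and `split_paged_kv_cache` are hypothetical and not taken from the PR.

```python
def contiguous_strides(shape):
    """Row-major strides, in elements, for a densely packed tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def split_paged_kv_cache(shape, strides):
    """Describe the K and V halves of a combined cache as strided views.

    Hypothetical layout: (num_pages, 2, page_size, num_heads, head_dim).
    Returns (offset, shape, strides) for each half; no data is copied.
    """
    assert shape[1] == 2, "axis 1 must hold the K/V pair"
    view_shape = shape[:1] + shape[2:]
    view_strides = strides[:1] + strides[2:]
    k_view = (0, view_shape, view_strides)           # K starts at element 0
    v_view = (strides[1], view_shape, view_strides)  # V starts one K/V step later
    return k_view, v_view


kv_shape = (4, 2, 16, 8, 64)
kv_strides = contiguous_strides(kv_shape)
k_view, v_view = split_paged_kv_cache(kv_shape, kv_strides)

# The page stride still spans both K and V, so neither half is densely packed.
k_is_contiguous = k_view[2] == contiguous_strides(k_view[1])
```

In this sketch the K and V views keep the page stride of the combined buffer, which is twice what a standalone dense tensor of the same shape would have, so code consuming them cannot assume a contiguous layout.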
cc @reyoung for visibility.
We introduced a bug in #513 because we didn't consider non-contiguous kv-cache for the page append operator; this PR fixes the bug.
This PR solves #506
Custom strides to support non-contiguous kv cache.
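As a rough sketch of what addressing with custom strides means, the function below computes the flat element offset of one cache entry from explicit strides instead of deriving them from the shape. The parameter names (`stride_page`, `stride_n`, `stride_h`) and the assumption that the last dimension has unit stride are illustrative, not a statement of the actual kernel interface.

```python
def element_offset(page, entry, head, dim, strides):
    """Flat offset of element [page, entry, head, dim] under explicit strides.

    `strides` is (stride_page, stride_n, stride_h) in elements; the last
    dimension is assumed to have stride 1 (an assumption of this sketch).
    """
    stride_page, stride_n, stride_h = strides
    return page * stride_page + entry * stride_n + head * stride_h + dim


page_size, num_heads, head_dim = 16, 8, 64

# Densely packed K cache: each page immediately follows the previous one.
dense = (page_size * num_heads * head_dim, num_heads * head_dim, head_dim)

# K cache that is a view into a combined K/V buffer: pages are twice as far apart.
strided = (2 * page_size * num_heads * head_dim, num_heads * head_dim, head_dim)

dense_offset = element_offset(1, 0, 0, 0, dense)
strided_offset = element_offset(1, 0, 0, 0, strided)
```

The same indexing logic serves both layouts; only the stride values passed in differ, which is the point of taking strides as inputs rather than assuming a packed tensor.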
Tests in `test_batch_prefill_kernels.py` and `test_batch_decode_kernels.py` are modified to test input kv_data on both contiguous and non-contiguous tensors.