[POC] [Do not merge] BatchPrefill without custom mask support non-cont kv-cache #508

reyoung · 2024-09-25T09:16:15Z

Related issue #506

reyoung · 2024-09-26T03:13:09Z

python/csrc/batch_prefill.cu

@@ -76,10 +76,13 @@ std::vector<torch::Tensor> BatchPrefillWithPagedKVCachePyTorchWrapper::Run(
  CHECK_INPUT(q);
  CHECK_INPUT(qo_indptr);
  if (paged_kv_defined) {
-    CHECK_INPUT(paged_kv_cache.value());
+    CHECK_CUDA(paged_kv_cache.value());
+    CHECK_LAST_DIM_CONTIGUOUS(paged_kv_cache.value());


Maybe we need a new macro here, like CHECK_INPUT_LAST_DIM_CONTIGUOUS()?

@yzh119
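A rough sketch of what such a combined macro could look like, assuming it simply chains the two checks used in the diff above. `FakeTensor`, `SKETCH_CHECK`, and `accepts_kv_cache` are hypothetical stand-ins so the snippet compiles without PyTorch. The macro bodies are assumptions for illustration, not the definitions in this repository.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for torch::Tensor (illustration only).
struct FakeTensor {
  std::vector<int64_t> sizes;
  std::vector<int64_t> strides;  // measured in elements, as in PyTorch
  bool on_cuda;

  bool is_cuda() const { return on_cuda; }
  // Accepts negative dims the way Tensor::stride does.
  int64_t stride(int64_t dim) const {
    if (dim < 0) dim += static_cast<int64_t>(strides.size());
    return strides.at(static_cast<size_t>(dim));
  }
};

// Hypothetical replacement for TORCH_CHECK: throws on failure.
#define SKETCH_CHECK(cond, msg)                       \
  do {                                                \
    if (!(cond)) throw std::runtime_error(msg);       \
  } while (0)

// Assumed definitions of the two checks used in the diff.
#define CHECK_CUDA(x) SKETCH_CHECK((x).is_cuda(), #x " must be a CUDA tensor")
#define CHECK_LAST_DIM_CONTIGUOUS(x) \
  SKETCH_CHECK((x).stride(-1) == 1, #x " must be contiguous at last dimension")

// The macro proposed in the comment: device check plus last-dim stride check,
// without requiring the whole tensor to be contiguous.
#define CHECK_INPUT_LAST_DIM_CONTIGUOUS(x) \
  do {                                     \
    CHECK_CUDA(x);                         \
    CHECK_LAST_DIM_CONTIGUOUS(x);          \
  } while (0)

// Returns true when the tensor passes the combined check.
bool accepts_kv_cache(const FakeTensor& paged_kv_cache) {
  try {
    CHECK_INPUT_LAST_DIM_CONTIGUOUS(paged_kv_cache);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}

void usage_example() {
  // One layer sliced out of a [pages, layers=3, 2, page_size, heads, head_dim]
  // buffer: the page stride is 3x the dense value, so the view is not fully
  // contiguous, but the last dimension still has stride 1.
  FakeTensor layer_slice{{4, 2, 16, 8, 64}, {49152, 8192, 512, 64, 1}, true};
  assert(accepts_kv_cache(layer_slice));

  // Last two dims swapped: the last-dim stride is 8, so the check rejects it.
  FakeTensor transposed{{4, 2, 16, 64, 8}, {49152, 8192, 512, 1, 8}, true};
  assert(!accepts_kv_cache(transposed));
}
```

With a macro of this shape, the two added lines in the diff would collapse into a single `CHECK_INPUT_LAST_DIM_CONTIGUOUS(paged_kv_cache.value());` call.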


yzh119 · 2024-10-09T11:01:02Z

Closed as #513 was merged.

BatchPrefill without custom mask support non-cont kv-cache

6e2a33a

reyoung mentioned this pull request Sep 25, 2024

[feature request]: Support moving num_layers into a kv cache page (or support non-contiguous kv cache) #506

Closed


Merge branch 'flashinfer-ai:main' into feature/support_non_contiguous…

709eb82

…_kv_cache

yzh119 closed this Oct 9, 2024
