feat: support deepseek prefill attention shape #765
Merged
Conversation
Hi, I am looking forward to using this MQA kernel in the DeepSeek V3 model. Do you have a date or plan for the next PR?
yzh119 added a commit that referenced this pull request on Feb 1, 2025:

Followup of #765, fix the JIT warmup utility functions.
yzh119 pushed a commit that referenced this pull request on Feb 7, 2025:

#765 introduced changes to the API of `plan`, including renaming `head_dim` to `head_dim_qk` and adding `head_dim_vo`. However, some call sites were not updated to reflect these changes, resulting in failing unit tests. This PR addresses the issue by updating the relevant calls, which should resolve the following unit test failures after merging:

- `tests/test_block_sparse.py::test_block_sparse_attention`
- `tests/test_non_contiguous_prefill.py::test_batch_paged_prefill_packed_input`

Signed-off-by: abmfy <[email protected]>
DeepSeek requires a `head_dim_qk` of 192 and a `head_dim_vo` of 128; this PR implements that feature for prefill attention on ragged tensors. (We could also support paged kv-cache, but it is not urgent: we only use `head_dim_qk=192, head_dim_vo=128` on ragged tensors for DeepSeek MLA without matrix absorption, and we need another MQA kernel with `head_dim_qk=576, head_dim_vo=512` for DeepSeek MLA with matrix absorption. I'll upstream that kernel in the next PR.)

Checklist
Changes to the programming interface

We added an optional field `head_dim_vo` to the `plan` function, allowing users to specify different `head_dim_qk` and `head_dim_vo`:
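The code example that followed in the original PR description is not included in this excerpt. As a hedged illustration (a NumPy reference computation, not the flashinfer kernel or its `plan` API), the sketch below shows why `head_dim_qk` and `head_dim_vo` must be distinct parameters for the DeepSeek prefill shape: attention scores are computed over the 192-dim Q/K space, while the output inherits the 128-dim V space. All shapes here are illustrative.

```python
import numpy as np

# DeepSeek MLA prefill shape without matrix absorption:
# head_dim_qk=192, head_dim_vo=128 (per the PR description).
num_heads, seq_len, head_dim_qk, head_dim_vo = 4, 16, 192, 128
rng = np.random.default_rng(0)
q = rng.standard_normal((seq_len, num_heads, head_dim_qk))
k = rng.standard_normal((seq_len, num_heads, head_dim_qk))
v = rng.standard_normal((seq_len, num_heads, head_dim_vo))

# Scores contract over head_dim_qk; shape: (num_heads, seq_len, seq_len).
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim_qk)

# Numerically stable softmax over the key axis.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Output contracts over keys against V, so it has head_dim_vo, not head_dim_qk.
out = np.einsum("hqk,khd->qhd", weights, v)
print(out.shape)  # (16, 4, 128)
```

A kernel that hard-codes a single `head_dim` cannot express this asymmetry, which is why the `plan` API was split into `head_dim_qk` and `head_dim_vo`.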