
Why doesn't flashinfer support a head dim of 56? #790

Closed
meowcoder22 opened this issue Feb 6, 2025 · 3 comments

Comments

@meowcoder22

As you know, 56 is now a common head dim with DeepSeek.

Why doesn't flashinfer support it? Is only a power of 2 supported?

How can we fix it? Is there an option to pad the head size?

Please let me know; this is urgent, and I need head dim 56 supported for batch prefill with a paged KV cache.

@Qubitium @reyoung @nandor @masahi @LLLLKKKK

Please give some guidance on padding head size 56 to head size 64 while getting an identical result.

thanks.
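
Not from this thread, but for reference: a minimal sketch of the zero-padding idea the question asks about, using plain PyTorch scaled-dot-product attention rather than FlashInfer's paged-KV API. Zero-padding q and k leaves q @ kᵀ unchanged, and zero-padding v only adds trailing zero output columns that can be sliced off, so the result is numerically identical as long as the original 1/√56 softmax scale is kept:

```python
import torch
import torch.nn.functional as F

def sdpa_with_padded_head_dim(q, k, v, pad_to=64):
    """q, k, v: [batch, num_heads, seq_len, 56]. Sketch only, not FlashInfer code."""
    head_dim = q.shape[-1]
    pad = pad_to - head_dim  # 64 - 56 = 8 extra zero channels
    # Zero-pad the last (head) dimension; the extra channels contribute 0 to q @ k^T.
    q64 = F.pad(q, (0, pad))
    k64 = F.pad(k, (0, pad))
    v64 = F.pad(v, (0, pad))
    # Keep the original 1/sqrt(56) scale -- the default here would be 1/sqrt(64).
    out = F.scaled_dot_product_attention(q64, k64, v64, scale=head_dim ** -0.5)
    # The padded v only produces trailing zero columns, so slicing recovers the exact result.
    return out[..., :head_dim]
```

Whether a padded layout like this works with a paged-KV kernel depends on the kernel, and it costs extra KV-cache memory (64/56 ≈ 1.14×). As the replies below point out, 56 is also not actually DeepSeek's attention head dim.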

@yzh119
Collaborator

yzh119 commented Feb 6, 2025

As you know, 56 is now a common head dim with DeepSeek.

I'm not aware of that; I'd suppose the head_dim for DeepSeek is 576 for qk and 512 for vo?

@meowcoder22
Author

Hi @yzh119,

For hidden size 7168 and num_attention_heads 128, 7168 / 128 = 56.

Anyway, is it possible to pad the head size to make it work?

@abcdabcd987
Member

Hi @meowcoder22

MLA is very different from MHA/MQA/GQA. You can refer to the figure in #551 to help understand it.

Without matrix absorption, qk head dim is 192 (qk_nope_head_dim + qk_rope_head_dim) and v head dim is 128 (v_head_dim).

With matrix absorption, qk head dim is 576 (kv_lora_rank + qk_rope_head_dim) and v head dim is 512 (kv_lora_rank).
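
For reference, a quick sketch of where those numbers come from, using the MLA head-dim fields as published in DeepSeek-V3's config (illustration only, not FlashInfer code):

```python
# MLA head-dim fields as they appear in DeepSeek-V3's config.json
qk_nope_head_dim = 128
qk_rope_head_dim = 64
v_head_dim = 128
kv_lora_rank = 512

# Without matrix absorption
qk_head_dim = qk_nope_head_dim + qk_rope_head_dim        # 192
v_dim = v_head_dim                                        # 128

# With matrix absorption
qk_head_dim_absorbed = kv_lora_rank + qk_rope_head_dim   # 576
vo_head_dim_absorbed = kv_lora_rank                       # 512

# hidden_size / num_attention_heads = 7168 / 128 = 56 is not an attention
# head dim anywhere in MLA.
```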

The FlashInfer community is actively working on MLA support:
