As you know, 56 is now a common head dim with DeepSeek.
Why isn't it supported — only powers of 2?
How can we fix this? Is there an option to pad the head size?
Please let me know; this is urgent, and we must support a head dim of 56 for batch prefill with a paged KV cache.
@Qubitium @reyoung @nandor @masahi @LLLLKKKK
Please give some guidance on padding head size 56 to head size 64 while still receiving identical results.
Thanks.
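As a sketch of the padding idea (plain NumPy, not FlashInfer's actual API): zero-padding q, k, and v from head dim 56 to 64 leaves the attention output unchanged, because the extra zero channels contribute nothing to the q·kᵀ dot products and produce all-zero output channels you can slice off. The one catch is the softmax scale — it must stay at 1/√56 rather than the padded 1/√64, so you would need to pass it explicitly via whatever custom-scale option (e.g. an `sm_scale`-style parameter) the kernel exposes.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, dim, pad_dim = 8, 56, 64

q = rng.standard_normal((seq, dim))
k = rng.standard_normal((seq, dim))
v = rng.standard_normal((seq, dim))

def attention(q, k, v, sm_scale):
    # plain single-head scaled dot-product attention
    scores = (q @ k.T) * sm_scale
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

# reference result at the true head dim of 56
ref = attention(q, k, v, 1.0 / np.sqrt(dim))

# zero-pad the last axis from 56 to 64
pad = ((0, 0), (0, pad_dim - dim))
qp, kp, vp = (np.pad(x, pad) for x in (q, k, v))

# key point: keep the softmax scale at 1/sqrt(56), not 1/sqrt(64),
# then slice the output back down to the first 56 channels
out = attention(qp, kp, vp, 1.0 / np.sqrt(dim))[:, :dim]

assert np.allclose(ref, out)
```

The extra 8 channels of v are zero, so dropping them loses nothing; the only cost of padding is the wasted compute and KV-cache memory on the padded channels.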
I'm not aware of that; I suppose the head_dim for DeepSeek is 576 for qk and 512 for vo?
Hi @yzh119,
for hidden size 7168 and num_attention_heads 128, 7168 / 128 = 56.
Anyway, is it possible to pad the head size to make it work?
Hi @meowcoder22
MLA is very different from MHA/MQA/GQA; you can refer to the figure in #551 to help understand it.
Without matrix absorption, the qk head dim is 192 (qk_nope_head_dim + qk_rope_head_dim) and the v head dim is 128 (v_head_dim).
With matrix absorption, the qk head dim is 576 (kv_lora_rank + qk_rope_head_dim) and the v head dim is 512 (kv_lora_rank).
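To make those numbers concrete, here is a small sketch using the head-dim values from DeepSeek-V2's public config (these specific values are an assumption on my part — check your model's config.json):

```python
# values assumed from DeepSeek-V2's published config.json
qk_nope_head_dim = 128
qk_rope_head_dim = 64
v_head_dim = 128
kv_lora_rank = 512

# without matrix absorption: per-head qk and v dims
qk_dim = qk_nope_head_dim + qk_rope_head_dim  # 128 + 64 = 192
v_dim = v_head_dim                            # 128

# with matrix absorption: attention runs in the latent space,
# so the qk dim grows to the compressed-KV rank plus the rope part
qk_dim_absorbed = kv_lora_rank + qk_rope_head_dim  # 512 + 64 = 576
v_dim_absorbed = kv_lora_rank                      # 512

print(qk_dim, v_dim, qk_dim_absorbed, v_dim_absorbed)  # 192 128 576 512
```

Note that 56 appears in neither variant: the naive hidden_size / num_attention_heads division does not give the head dims MLA actually uses.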
The FlashInfer community is actively working on MLA support.