
Why doesn't flashinfer support a head dim of 56? #790

Closed
meowcoder22 opened this issue Feb 6, 2025 · 3 comments

Comments

@meowcoder22

As you know, 56 is now a common head dim with DeepSeek.

Why doesn't flashinfer support it? Is only a power of 2 supported?

How can we fix it? Is there an option to pad the head size?

Please let me know; this is urgent, and I need head dim 56 supported for batch prefill with a paged KV cache.

@Qubitium @reyoung @nandor @masahi @LLLLKKKK

Please give some guidance on padding head size 56 to head size 64 while getting an identical result.

thanks.
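
Not from this thread, but for reference: a minimal sketch of the zero-padding idea the question asks about, using plain PyTorch scaled-dot-product attention rather than FlashInfer's paged-KV API. Zero-padding q and k leaves q @ kᵀ unchanged, and zero-padding v only adds trailing zero output columns that can be sliced off, so the result is numerically identical as long as the original 1/√56 softmax scale is kept:

```python
import torch
import torch.nn.functional as F

def sdpa_with_padded_head_dim(q, k, v, pad_to=64):
    """q, k, v: [batch, num_heads, seq_len, 56]. Sketch only, not FlashInfer code."""
    head_dim = q.shape[-1]
    pad = pad_to - head_dim  # 64 - 56 = 8 extra zero channels
    # Zero-pad the last (head) dimension; the extra channels contribute 0 to q @ k^T.
    q64 = F.pad(q, (0, pad))
    k64 = F.pad(k, (0, pad))
    v64 = F.pad(v, (0, pad))
    # Keep the original 1/sqrt(56) scale -- the default here would be 1/sqrt(64).
    out = F.scaled_dot_product_attention(q64, k64, v64, scale=head_dim ** -0.5)
    # The padded v only produces trailing zero columns, so slicing recovers the exact result.
    return out[..., :head_dim]
```

Whether a padded layout like this works with a paged-KV kernel depends on the kernel, and it costs extra KV-cache memory (64/56 ≈ 1.14×). As the replies below point out, 56 is also not actually DeepSeek's attention head dim.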

@yzh119
Collaborator

yzh119 commented Feb 6, 2025

As you know, 56 is now a common head dim with DeepSeek.

I'm not aware of that; I'd suppose the head_dim for DeepSeek is 576 for qk and 512 for vo?

@meowcoder22
Author

Hi @yzh119,

For hidden size 7168 and num_attention_heads 128, 7168 / 128 = 56.

Anyway, is it possible to pad the head size to make it work?

@abcdabcd987
Member

Hi @meowcoder22

MLA is very different from MHA/MQA/GQA. You can refer to the figure in #551 to help understand it.

Without matrix absorption, qk head dim is 192 (qk_nope_head_dim + qk_rope_head_dim) and v head dim is 128 (v_head_dim).

With matrix absorption, qk head dim is 576 (kv_lora_rank + qk_rope_head_dim) and v head dim is 512 (kv_lora_rank).
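
For reference, a quick sketch of where those numbers come from, using the MLA head-dim fields as published in DeepSeek-V3's config (illustration only, not FlashInfer code):

```python
# MLA head-dim fields as they appear in DeepSeek-V3's config.json
qk_nope_head_dim = 128
qk_rope_head_dim = 64
v_head_dim = 128
kv_lora_rank = 512

# Without matrix absorption
qk_head_dim = qk_nope_head_dim + qk_rope_head_dim        # 192
v_dim = v_head_dim                                        # 128

# With matrix absorption
qk_head_dim_absorbed = kv_lora_rank + qk_rope_head_dim   # 576
vo_head_dim_absorbed = kv_lora_rank                       # 512

# hidden_size / num_attention_heads = 7168 / 128 = 56 is not an attention
# head dim anywhere in MLA.
```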

The FlashInfer community is actively working on MLA support:
