
[Feature] support W8A8(FP8) and KV Cache FP8 for DeepSeek V2 #1156

Closed · 2 tasks done
zhyncs opened this issue Aug 19, 2024 · 3 comments
zhyncs (Member) commented Aug 19, 2024

Motivation

As titled. Make DeepSeek V2 MLA faster!

Related resources

No response

fengyang95 commented
Is there a specific timeline for this?

zhyncs (Member, Author) commented Aug 30, 2024

> Is there a specific timeline for this?

- bmm fp8 has been implemented in flashinfer-ai/flashinfer#469
- fp8 e5m2 KV cache has been implemented in #1204

DeepSeek V2 has not been adapted yet, as we are focusing on other higher-priority tasks. We expect to complete it within the next few days.
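
For reference, here is a minimal PyTorch sketch of the numerics behind these two pieces. The function names are hypothetical and this is not the SGLang/flashinfer API; the real implementations are fused CUDA kernels in flashinfer and SGLang.

```python
import torch

# Hypothetical sketch, not SGLang's actual API: illustrates the
# numerics of an fp8 e5m2 KV cache and a scaled fp8 batched matmul.

def quantize_kv_fp8_e5m2(kv: torch.Tensor) -> torch.Tensor:
    # fp8 e5m2 keeps fp16's 5 exponent bits, so the KV cache can be
    # stored with a direct cast, no per-tensor scale required.
    return kv.to(torch.float8_e5m2)

def bmm_fp8_emulated(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Emulates a batched fp8 (e4m3) matmul with per-tensor scales:
    # quantize both operands to fp8, then accumulate in fp16.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    a_scale = (a.abs().amax() / fp8_max).clamp(min=1e-12)
    b_scale = (b.abs().amax() / fp8_max).clamp(min=1e-12)
    a_q = (a / a_scale).to(torch.float8_e4m3fn)
    b_q = (b / b_scale).to(torch.float8_e4m3fn)
    return (a_q.to(torch.float16) @ b_q.to(torch.float16)) * (a_scale * b_scale)

kv = torch.randn(2, 8, 64, dtype=torch.float16)
print(quantize_kv_fp8_e5m2(kv).dtype)                    # torch.float8_e5m2
print(bmm_fp8_emulated(kv, kv.transpose(-1, -2)).shape)  # (2, 8, 8)
```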

zhyncs (Member, Author) commented Sep 1, 2024

done

zhyncs closed this as completed on Sep 1, 2024.