📚 FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑ 🎉 vs SDPA EA.

DefTruth/ffpa-attn-mma

 
 


Languages

  • Cuda 78.2%
  • Python 20.5%
  • Shell 1.1%
  • C++ 0.2%