What's Changed
- fix cu121 torch2.6 by @zhyncs in #867
- unittest: add MLA test cases where kv_len is evenly divisible by page_size by @foreverlms in #861
- bugfix: fix the behavior of MLA kernel when kv-length is 0 by @yzh119 in #868
- Merge previous typo-fix PRs into a single PR, as requested, by @didier-durand in #862
- add lightllm adoption by @zhyncs in #871
- fix generate_dispatch_inc args from parser by @baowendin in #870
- [API] Fix top_k_top_p_sampling_from_logits param typo by @kasohrab in #875
- misc: Remove unused k_smem_offset_w update in MLA kernel by @muoshuosha in #878
- JIT compilation support for TVM by @MasterJH5574 in #880
- [Hotfix] Add flashinfer.jit.attention into packages by @zhouye in #881
- perf: FlashAttention-3 style MLA PagedAttention by @yzh119 in #887
- [JIT] Fix MLA header in TVM binding by @MasterJH5574 in #889
- Fixing several typos in doc file kv_layout.rst by @didier-durand in #884
- unittest: add unittests for MLA + cudagraph by @yzh119 in #890
New Contributors
- @baowendin made their first contribution in #870
- @kasohrab made their first contribution in #875
- @zhouye made their first contribution in #881
Full Changelog: v0.2.1.post2...v0.2.2