Expand Qgemm UDOT kernel to 8x8 block (#8562)
Create a new M8 loop that processes A[8x8] and B[8x8] per iteration.
Avoid saving registers on code paths that do not need them.
Adjust the M2 and M1 loops to use more registers, relaxing loop-carried dependencies.
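
As a rough illustration of the M8 loop, here is a scalar C model of one iteration. It is a sketch, not the kernel's actual code. It assumes that A[8x8] means 8 rows by 8 depth (K) bytes, and that B[8x8] means 8 depth bytes by 8 columns stored so that each column's depth bytes are contiguous. The names `udot_lane` and `tile_8x8_step` are hypothetical and do not come from the commit.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of the Arm UDOT instruction:
   acc += a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3] on unsigned bytes. */
static uint32_t udot_lane(uint32_t acc, const uint8_t a[4], const uint8_t b[4]) {
    for (int i = 0; i < 4; ++i) {
        acc += (uint32_t)a[i] * (uint32_t)b[i];
    }
    return acc;
}

/* Hypothetical helper: one M8 iteration under the assumed layout.
   a  : 8 rows x 8 depth bytes, row-major        (64 bytes)
   bt : 8 columns x 8 depth bytes, column-major  (64 bytes)
   c  : 8 x 8 accumulators, row-major            (64 uint32_t)
   Each output element takes two UDOT-style steps, one per group of
   4 depth bytes. */
static void tile_8x8_step(uint32_t *c, const uint8_t *a, const uint8_t *bt) {
    for (int m = 0; m < 8; ++m) {
        for (int n = 0; n < 8; ++n) {
            uint32_t acc = c[m * 8 + n];
            acc = udot_lane(acc, a + m * 8,     bt + n * 8);
            acc = udot_lane(acc, a + m * 8 + 4, bt + n * 8 + 4);
            c[m * 8 + n] = acc;
        }
    }
}
```

Calling `tile_8x8_step` once per 8-byte slice of the depth dimension accumulates a full 8x8 block of the output. A real kernel would keep the 64 accumulators in vector registers, not memory.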

Observed nearly 7% improvement on Surface Pro X 2 with the ssd_mobilenet_v2_300 model,
and about 4.5% improvement with resnet50 on the same device.
chenfucn authored Aug 17, 2021
1 parent 871eeb4 commit 2243804
Showing 4 changed files with 1,297 additions and 178 deletions.