Why the performance gap? #2734
Comments
It looks like the code path is ref:any. @oneapi-src/onednn-cpu-aarch64 Is it possible to optimize this code path on the ARM platform?
It looks like the f32 memory formats used in this problem aren't currently supported in ComputeLibrary, and we currently don't support f16 convolutions with an f32 bias.
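To illustrate the constraint being discussed, here is a minimal sketch of how one would request an f16 convolution with an f16 (rather than f32) bias through the oneDNN v3 C++ API, using the shapes from this thread. This is my own illustration, not a reproducer from the maintainers; it assumes the standard v3 primitive-descriptor constructors.

#include <cstdio>
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);

    // Shapes from the problem in this thread:
    // mb1 ic64 ih2560 iw1440 oc3 oh2560 ow1440 kh9 kw9 ph4 pw4
    auto src_md = memory::desc({1, 64, 2560, 1440},
            memory::data_type::f16, memory::format_tag::any);
    auto wei_md = memory::desc({3, 64, 9, 9},
            memory::data_type::f16, memory::format_tag::any);
    // Bias in f16 as well; an f32 bias here is what ACL reportedly rejects.
    auto bia_md = memory::desc({3},
            memory::data_type::f16, memory::format_tag::a);
    auto dst_md = memory::desc({1, 3, 2560, 1440},
            memory::data_type::f16, memory::format_tag::any);

    // Note: in the oneDNN API, dilation 0 means a dense (non-dilated) kernel.
    auto pd = convolution_forward::primitive_desc(eng,
            prop_kind::forward_inference, algorithm::convolution_auto,
            src_md, wei_md, bia_md, dst_md,
            /*strides*/ {1, 1}, /*dilates*/ {0, 0},
            /*pad_l*/ {4, 4}, /*pad_r*/ {4, 4});

    // Shows which implementation was dispatched (e.g. an ACL-backed one vs "ref").
    std::printf("impl: %s\n", pd.impl_info_str());
    return 0;
}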
I explicitly specified the data format of the bias, but it is still not supported. The upper output is from oneDNN, and the lower one uses ACL.
@Serenagirl could you please try your command?
I ran
ONEDNN_VERBOSE=dispatch numactl -c 4 ./benchdnn --conv --mode=P --dt=f32 --dir=FWD_B --alg=auto --mb=1 ic64ih2560iw1440oc3oh2560ow1440kh9kw9sh1sw1dh1dw1ph4pw4
on aarch64 (SVE256).
Single-core peak: 2.9 GHz × (256 bits / 32 bits) × 2 flops/FMA × 2 FMA pipes = 92.8 GFLOP/s
Conv work for ic64ih2560iw1440oc3oh2560ow1440kh9kw9sh1sw1dh1dw1ph4pw4: 2 × 9 × 9 × 64 × 3 × 2560 × 1440 = 114.66 GFLOPs
Ideal single-core time: 114.66 GFLOPs / 92.8 GFLOP/s ≈ 1.23 s

With oneDNN 3.4, however, I get 60 s or more, so the performance is far off. Is there a problem with my arguments?
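For reference, the roofline arithmetic above can be checked with a few lines of C++. This is my own sketch, not part of the thread, and it carries over the poster's assumption of 2 FMA pipes per core.

#include <cstdio>

int main() {
    // Peak: 2.9 GHz * (256-bit SVE / 32-bit f32 = 8 lanes) * 2 flops/FMA * 2 FMA pipes
    const double peak = 2.9e9 * (256.0 / 32.0) * 2.0 * 2.0;  // 92.8e9 flop/s

    // Direct-conv work: 2 * KH * KW * IC * OC * OH * OW (each multiply-add = 2 flops)
    const double work = 2.0 * 9 * 9 * 64 * 3 * 2560 * 1440;  // ~114.66e9 flops

    std::printf("peak = %.1f GFLOP/s\nwork = %.2f GFLOPs\nideal time = %.2f s\n",
                peak / 1e9, work / 1e9, work / peak);  // ~1.24 s
    return 0;
}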
In addition, when using ACL I cannot find any operator that gets an optimized code path, and f16 precision is not supported even though my CPU hardware supports it.