Single Precision Matrix Performance
In the branch mixed_cg I have added a mixed single/double precision solver for the light doublet. The clover term is not supported so far, but it is to come. The solver can be invoked by specifying
Solver = mixedcg
in the operator section and by setting:
UseSloppyPrecision = yes
I have seen solver speedups of about 30% compared to cg on both scalar and MPI setups. So far the single precision matrix is only optimized for BG/Q, using intrinsics. I plan to work on an AVX version next.
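To illustrate the general idea behind a mixed single/double precision CG, here is a minimal sketch of one common scheme (defect correction): the residual is computed in double precision, an inner CG in single precision solves for a correction, and the correction is added to the double precision solution. This is not the code from the mixed_cg branch and may differ from what it does. The toy operator (a 1D Laplacian plus a mass term), the size N, the inner tolerance, and all function names are assumptions made for the example.

```c
#include <assert.h>

#define N 8 /* toy problem size (assumption for this sketch) */

/* y = A x in double precision. A is a 1D Laplacian plus a mass term,
   a symmetric positive definite stand-in for the real operator. */
void apply_A_double(const double *x, double *y) {
  for (int i = 0; i < N; i++) {
    double left = (i > 0) ? x[i - 1] : 0.0;
    double right = (i < N - 1) ? x[i + 1] : 0.0;
    y[i] = 2.5 * x[i] - left - right;
  }
}

/* The same operator in single precision (the "sloppy" matrix). */
void apply_A_float(const float *x, float *y) {
  for (int i = 0; i < N; i++) {
    float left = (i > 0) ? x[i - 1] : 0.0f;
    float right = (i < N - 1) ? x[i + 1] : 0.0f;
    y[i] = 2.5f * x[i] - left - right;
  }
}

/* Inner CG in single precision: approximately solves A e = r0 starting
   from e = 0, until the iterated residual has dropped by rel_tol. */
int cg_float(const float *r0, float *e, float rel_tol, int max_iter) {
  float r[N], p[N], Ap[N];
  float rr = 0.0f;
  for (int i = 0; i < N; i++) {
    e[i] = 0.0f;
    r[i] = p[i] = r0[i];
    rr += r[i] * r[i];
  }
  const float target = rel_tol * rel_tol * rr;
  int it;
  for (it = 0; it < max_iter && rr > target; it++) {
    apply_A_float(p, Ap);
    float pAp = 0.0f;
    for (int i = 0; i < N; i++) pAp += p[i] * Ap[i];
    const float alpha = rr / pAp;
    float rr_new = 0.0f;
    for (int i = 0; i < N; i++) {
      e[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
      rr_new += r[i] * r[i];
    }
    const float beta = rr_new / rr;
    for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
    rr = rr_new;
  }
  return it;
}

/* Outer defect-correction loop in double precision. Solves A x = b to a
   relative residual of tol. Returns the number of single precision
   corrections applied, or -1 if max_outer was not enough. */
int mixed_cg(const double *b, double *x, double tol, int max_outer) {
  assert(tol > 0.0 && max_outer > 0);
  double r[N], Ax[N];
  float rf[N], ef[N];
  double bb = 0.0;
  for (int i = 0; i < N; i++) {
    x[i] = 0.0;
    bb += b[i] * b[i];
  }
  for (int outer = 0; outer < max_outer; outer++) {
    /* True residual in double precision. */
    apply_A_double(x, Ax);
    double rr = 0.0, scale = 0.0;
    for (int i = 0; i < N; i++) {
      r[i] = b[i] - Ax[i];
      rr += r[i] * r[i];
      const double a = (r[i] < 0.0) ? -r[i] : r[i];
      if (a > scale) scale = a;
    }
    if (rr <= tol * tol * bb) return outer;
    /* Rescale so the residual stays well inside the float range. */
    for (int i = 0; i < N; i++) rf[i] = (float)(r[i] / scale);
    cg_float(rf, ef, 1e-4f, 100);
    /* Accumulate the correction in double precision. */
    for (int i = 0; i < N; i++) x[i] += scale * (double)ef[i];
  }
  return -1;
}
```

Calling `mixed_cg(b, x, 1e-12, 20)` fills `x` with the solution. Most of the matrix applications happen in `apply_A_float`, which is where a speedup would come from if the single precision matrix is faster than the double precision one.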
The optimized BG/Q version, which includes the overlapping of computation and communication from the InterleavedNDTwistedClover branch, is available in my branch interleaved_mixed_cg. It uses OpenMP orphaning for the 32-bit matrix. The speedup depends on the local lattice size, as can be seen in this figure: performance