-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix phi range #13914
Fix phi range #13914
Conversation
A new Pull Request was created by @VinInn (Vincenzo Innocente) for CMSSW_8_1_X. It involves the following packages: DataFormats/GeometryVector @civanch, @cvuosalo, @mdhildreth, @cmsbuild, @slava77, @davidlange6 can you please review it and eventually sign? Thanks. cms-bot commands are list here #13028 |
@cmsbuild , please test |
The tests are being triggered in jenkins. |
+1 |
} | ||
|
||
bool inside(float ix, float iy) const { | ||
auto norm = 1.f/std::sqrt(ix*ix+iy*iy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to protect against division by zero here and at line 20 in case these methods are called inappropriate zero values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. too expensive.
better people verify input if in doubt
we should stop to be iper protective and than ask why we spend most of reco time in branches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is 1./sqrt needed here?
isn't it cheaper to return ix*x + iy*y > dcos*sqrt(ix*ix + iy*iy);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depends...
with Ofast 1.f/std::sqrt is the cheaper than sqrt in float...
maybe you are right: most probably even with Ofast
return ix_x + iy_y > dcos_sqrt(ix_ix + iy*iy);
has one multiplication less (even without fma)
and parallelize more (less dependencies)
Pull request #13914 was updated. @civanch, @cvuosalo, @mdhildreth, @cmsbuild, @slava77, @davidlange6 can you please check and sign again. |
return inside(v.x(),v.y()); | ||
} | ||
|
||
bool inside(float ix, float iy) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include<cmath>
float x,y,dcos;
bool inside(float ix, float iy) {
return ix*x+iy*y > dcos*std::sqrt(ix*ix+iy*iy);
}
bool inside2(float ix, float iy) {
auto norm = 1.f/std::sqrt(ix*ix+iy*iy);
return norm*(ix*x+iy*y) > dcos;
}
produces with Ofast
the real point is not the number of mul instruction, but how the out-of-order processor can feed the two FP engines and how the mul are pipelined : I do not think we can easily compute the latency of such a small kernel...
inside(float, float): # @inside(float, float)
movss x(%rip), %xmm2 # xmm2 = mem[0],zero,zero,zero
mulss %xmm0, %xmm2
movss y(%rip), %xmm3 # xmm3 = mem[0],zero,zero,zero
mulss %xmm1, %xmm3
addss %xmm2, %xmm3
mulss %xmm0, %xmm0
mulss %xmm1, %xmm1
addss %xmm0, %xmm1
xorps %xmm0, %xmm0
rsqrtss %xmm1, %xmm0
movaps %xmm0, %xmm2
mulss %xmm2, %xmm2
mulss %xmm1, %xmm2
addss .LCPI2_0(%rip), %xmm2
mulss .LCPI2_1(%rip), %xmm0
mulss %xmm1, %xmm0
mulss %xmm2, %xmm0
xorps %xmm2, %xmm2
cmpeqss %xmm2, %xmm1
andnps %xmm0, %xmm1
mulss dcos(%rip), %xmm1
ucomiss %xmm1, %xmm3
seta %al
retq
inside2(float, float): # @inside2(float, float)
movaps %xmm0, %xmm2
mulss %xmm2, %xmm2
movaps %xmm1, %xmm3
mulss %xmm3, %xmm3
addss %xmm2, %xmm3
xorps %xmm2, %xmm2
rsqrtss %xmm3, %xmm2
movaps %xmm2, %xmm4
mulss %xmm4, %xmm4
mulss %xmm3, %xmm4
addss .LCPI3_0(%rip), %xmm4
mulss .LCPI3_1(%rip), %xmm2
mulss x(%rip), %xmm0
mulss y(%rip), %xmm1
addss %xmm0, %xmm1
mulss %xmm2, %xmm1
mulss %xmm4, %xmm1
ucomiss dcos(%rip), %xmm1
seta %al
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw that was clang
gcc (even 6.0) produces
inside(float, float):
movaps %xmm0, %xmm2
movaps %xmm1, %xmm3
mulss %xmm0, %xmm2
mulss %xmm1, %xmm3
mulss x(%rip), %xmm0
mulss y(%rip), %xmm1
addss %xmm3, %xmm2
sqrtss %xmm2, %xmm2
mulss dcos(%rip), %xmm2
addss %xmm1, %xmm0
comiss %xmm2, %xmm0
seta %al
ret
one needs to add ``-mrecip` to get a code similar to that by clang
There is a long argument if scalar sqrt is really slower than rsqrt +NR
again much depends of the processor ability to pipeline and parallelize fp instructions...
given that we are even questioning if Ofast is worth, the whole discussion is a bit lame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my side, the discussion was actually triggered by the case of division by 0.
The rhs dcos*sqrt doesn't have x/0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so now we have 0>0 instead of inf>dcos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0>0 is not an FPE.
most probably this is a bit faster even with Ofast or at least not slower (see comment in PR)
Pull request #13914 was updated. @civanch, @cvuosalo, @mdhildreth, @cmsbuild, @slava77, @davidlange6 can you please check and sign again. |
@cmsbuild, please test |
The tests are being triggered in jenkins. |
-1 runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log you can see the results of the tests here: |
interesting
hard to believe that it is due to this PR... |
It's not, it's broken in IBs. |
actually this workflow is not broken in the IBs.. but indeed this error comes up in one workflow, so its presumably related -the L1 team is investigating.
|
+1 Bug fix in checking phi ranges for pixel hits. The code changes are satisfactory. Jenkins tests against baseline CMSSW_8_1_X_2016-04-11-2300 show numerous tiny differences, but they are not significant. The Jenkins test error is not related to this PR. Extended tests of workflow 25202.0_TTbar_13 with 70 events and workflows 1317.0_SingleElectronPt35_UP15 and 1318.0_SingleGammaPt10_UP15 with 1000 events each against baseline CMSSW_8_1_X_2016-04-03-1100 also show numerous tiny differences, but again they are not significant. |
Simulation signed a previous version. the relval failure is not related to this PR |
there was a bug in checkPhiInRange and other in PixelTripletHLTGenerator.cc (accepting empty ranges).
Added also a test and two optimized implementations of phi and eta intervals.
regression expected in particular at low-pt
http://innocent.home.cern.ch/innocent/RelVal/pu35_81_fixRP/plots_highPurity/effandfake1.pdf