Fix phi range #13914

Merged
merged 8 commits into from
Apr 14, 2016

@VinInn VinInn commented Apr 4, 2016

There was a bug in checkPhiInRange and another in PixelTripletHLTGenerator.cc (accepting empty ranges).
Also added a test and two optimized implementations of phi and eta intervals.

Regression expected, in particular at low pt:
http://innocent.home.cern.ch/innocent/RelVal/pu35_81_fixRP/plots_highPurity/effandfake1.pdf
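
To illustrate the kind of check being fixed, here is a minimal sketch of a phi-interval test that handles the wrap at ±pi and rejects empty ranges. It is not the CMSSW `checkPhiInRange`: the names `wrapPhi` and `phiInRangeSketch` are hypothetical, and so is the convention that `phiMax < phiMin` denotes an empty range (a range crossing pi is written with `phiMax` above pi).

```cpp
#include <cmath>

constexpr float kPi = 3.14159265f;
constexpr float kTwoPi = 2.f * kPi;

// Hypothetical helper: wrap an angle difference into (-pi, pi].
inline float wrapPhi(float dphi) {
  dphi = std::fmod(dphi, kTwoPi);  // now in (-2pi, 2pi)
  if (dphi > kPi) dphi -= kTwoPi;
  else if (dphi <= -kPi) dphi += kTwoPi;
  return dphi;
}

// Hypothetical check: is phi inside [phiMin, phiMax], measured
// counter-clockwise from phiMin?
// Assumed convention: phiMax < phiMin means "empty range" and rejects everything.
inline bool phiInRangeSketch(float phi, float phiMin, float phiMax) {
  float width = phiMax - phiMin;
  if (width < 0.f) return false;     // empty range: never accept
  if (width >= kTwoPi) return true;  // full circle: always accept
  float d = wrapPhi(phi - phiMin);   // offset from lower edge, in (-pi, pi]
  if (d < 0.f) d += kTwoPi;          // move into [0, 2pi)
  return d <= width;
}
```

For example, under these assumptions `phiInRangeSketch(-3.0f, 2.5f, 3.5f)` accepts a point just past the ±pi seam, while `phiInRangeSketch(0.f, 0.3f, -0.2f)` rejects because the range is empty.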

cmsbuild commented Apr 4, 2016

A new Pull Request was created by @VinInn (Vincenzo Innocente) for CMSSW_8_1_X.

It involves the following packages:

DataFormats/GeometryVector
DataFormats/Math
RecoPixelVertexing/PixelTriplets

@civanch, @cvuosalo, @mdhildreth, @cmsbuild, @slava77, @davidlange6 can you please review it and eventually sign? Thanks.
@makortel, @cerati, @GiacomoSguazzoni, @dgulhan, @rovere this is something you requested to watch as well.
@slava77, @Degano, @smuzaffar you are the release manager for this.

cms-bot commands are listed here #13028

VinInn commented Apr 4, 2016

@cmsbuild , please test

cmsbuild commented Apr 4, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/12157/console

civanch commented Apr 4, 2016

+1

}

bool inside(float ix, float iy) const {
auto norm = 1.f/std::sqrt(ix*ix+iy*iy);
Contributor

Wouldn't it be better to protect against division by zero here and at line 20, in case these methods are called with inappropriate zero values?

Contributor Author

no. too expensive.
better that people verify the input if in doubt
we should stop being hyper-protective and then asking why we spend most of reco time in branches
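
One way to "verify the input if in doubt" without adding a branch to the optimized hot path is a debug-only precondition on the caller side. The sketch below is an illustration under assumptions, not code from this PR: the names `PhiWindow` and `insideChecked` are hypothetical (only `inside`, `x`, `y` and `dcos` appear in this thread), and it relies on `assert` expanding to nothing when `NDEBUG` is defined.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the class under review.
struct PhiWindow {
  float x, y;  // unit vector pointing at the centre of the phi window
  float dcos;  // cosine of the window half-width

  // Hot path: no check on the input, as argued in the comment above.
  bool inside(float ix, float iy) const {
    auto norm = 1.f / std::sqrt(ix * ix + iy * iy);
    return norm * (ix * x + iy * y) > dcos;
  }
};

// Caller-side wrapper: the precondition is only evaluated in debug builds,
// because assert() compiles away when NDEBUG is defined.
inline bool insideChecked(const PhiWindow& w, float ix, float iy) {
  assert((ix != 0.f || iy != 0.f) && "inside() called with a null vector");
  return w.inside(ix, iy);
}
```

With a window centred on the x axis, `PhiWindow w{1.f, 0.f, 0.5f}`, the call `insideChecked(w, 1.f, 0.f)` is true and `insideChecked(w, 0.f, 1.f)` is false; a null vector aborts in a debug build and goes unchecked in a release build.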

Contributor

why is 1./sqrt needed here?
isn't it cheaper to return ix*x + iy*y > dcos*sqrt(ix*ix + iy*iy);

Contributor Author

depends...
with Ofast, 1.f/std::sqrt is cheaper than sqrt in float...
maybe you are right: most probably even with Ofast
return ix*x + iy*y > dcos*sqrt(ix*ix + iy*iy);
has one multiplication less (even without fma)
and parallelizes more (fewer dependencies)

@cmsbuild

Pull request #13914 was updated. @civanch, @cvuosalo, @mdhildreth, @cmsbuild, @slava77, @davidlange6 can you please check and sign again.

return inside(v.x(),v.y());
}

bool inside(float ix, float iy) const {
Contributor Author

#include <cmath>
float x, y, dcos;

// variant suggested in the review: scale dcos by the norm, no division
bool inside(float ix, float iy) {
  return ix*x + iy*y > dcos*std::sqrt(ix*ix + iy*iy);
}

// variant in the PR: normalize with the reciprocal square root
bool inside2(float ix, float iy) {
  auto norm = 1.f/std::sqrt(ix*ix + iy*iy);
  return norm*(ix*x + iy*y) > dcos;
}

With Ofast this produces the assembly below.
The real point is not the number of mul instructions, but how the out-of-order processor can feed the two FP engines and how the muls are pipelined: I do not think we can easily compute the latency of such a small kernel...

inside(float, float):                            # @inside(float, float)
        movss   x(%rip), %xmm2          # xmm2 = mem[0],zero,zero,zero
        mulss   %xmm0, %xmm2
        movss   y(%rip), %xmm3          # xmm3 = mem[0],zero,zero,zero
        mulss   %xmm1, %xmm3
        addss   %xmm2, %xmm3
        mulss   %xmm0, %xmm0
        mulss   %xmm1, %xmm1
        addss   %xmm0, %xmm1
        xorps   %xmm0, %xmm0
        rsqrtss %xmm1, %xmm0
        movaps  %xmm0, %xmm2
        mulss   %xmm2, %xmm2
        mulss   %xmm1, %xmm2
        addss   .LCPI2_0(%rip), %xmm2
        mulss   .LCPI2_1(%rip), %xmm0
        mulss   %xmm1, %xmm0
        mulss   %xmm2, %xmm0
        xorps   %xmm2, %xmm2
        cmpeqss %xmm2, %xmm1
        andnps  %xmm0, %xmm1
        mulss   dcos(%rip), %xmm1
        ucomiss %xmm1, %xmm3
        seta    %al
        retq

inside2(float, float):                           # @inside2(float, float)
        movaps  %xmm0, %xmm2
        mulss   %xmm2, %xmm2
        movaps  %xmm1, %xmm3
        mulss   %xmm3, %xmm3
        addss   %xmm2, %xmm3
        xorps   %xmm2, %xmm2
        rsqrtss %xmm3, %xmm2
        movaps  %xmm2, %xmm4
        mulss   %xmm4, %xmm4
        mulss   %xmm3, %xmm4
        addss   .LCPI3_0(%rip), %xmm4
        mulss   .LCPI3_1(%rip), %xmm2
        mulss   x(%rip), %xmm0
        mulss   y(%rip), %xmm1
        addss   %xmm0, %xmm1
        mulss   %xmm2, %xmm1
        mulss   %xmm4, %xmm1
        ucomiss dcos(%rip), %xmm1
        seta    %al

Contributor Author

btw that was clang
gcc (even 6.0) produces

inside(float, float):
        movaps  %xmm0, %xmm2
        movaps  %xmm1, %xmm3
        mulss   %xmm0, %xmm2
        mulss   %xmm1, %xmm3
        mulss   x(%rip), %xmm0
        mulss   y(%rip), %xmm1
        addss   %xmm3, %xmm2
        sqrtss  %xmm2, %xmm2
        mulss   dcos(%rip), %xmm2
        addss   %xmm1, %xmm0
        comiss  %xmm2, %xmm0
        seta    %al
        ret

one needs to add `-mrecip` to get code similar to that produced by clang
There is a long-running argument about whether scalar sqrt is really slower than rsqrt + NR (Newton-Raphson)
again, much depends on the processor's ability to pipeline and parallelize fp instructions...
given that we are even questioning whether Ofast is worth it, the whole discussion is a bit lame
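
Since the cost depends on the processor and on the compiler flags, measuring is more informative than counting instructions. The following is a rough timing sketch, not a rigorous benchmark: `Window`, `Timing`, `makePoints` and `timeIt` are hypothetical names introduced here, it measures average throughput over an array rather than the latency of a single call, and the compiler is free to inline or vectorize the loop.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the window parameters named in the thread.
struct Window {
  float x, y;  // unit vector at the centre of the window
  float dcos;  // cosine of the half-width
};

inline bool insideSqrt(const Window& w, float ix, float iy) {
  return ix * w.x + iy * w.y > w.dcos * std::sqrt(ix * ix + iy * iy);
}

inline bool insideRsqrt(const Window& w, float ix, float iy) {
  auto norm = 1.f / std::sqrt(ix * ix + iy * iy);
  return norm * (ix * w.x + iy * w.y) > w.dcos;
}

struct Timing {
  std::size_t accepted;       // number of points reported inside
  double nanosecondsPerCall;  // average wall-clock time per call
};

// Fill xs/ys with n non-null points spread uniformly in phi.
inline void makePoints(std::size_t n, std::vector<float>& xs, std::vector<float>& ys) {
  xs.resize(n);
  ys.resize(n);
  for (std::size_t i = 0; i < n; ++i) {
    float phi = 6.2831853f * static_cast<float>(i) / static_cast<float>(n);
    float r = 1.f + 0.01f * static_cast<float>(i % 97);
    xs[i] = r * std::cos(phi);
    ys[i] = r * std::sin(phi);
  }
}

// Time `repeats` passes of f over all points; assumes xs is non-empty.
template <typename F>
Timing timeIt(F f, const Window& w, const std::vector<float>& xs,
              const std::vector<float>& ys, int repeats) {
  std::size_t accepted = 0;
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repeats; ++r)
    for (std::size_t i = 0; i < xs.size(); ++i)
      accepted += f(w, xs[i], ys[i]) ? 1 : 0;
  auto stop = std::chrono::steady_clock::now();
  double ns = std::chrono::duration<double, std::nano>(stop - start).count();
  return {accepted, ns / (static_cast<double>(repeats) * static_cast<double>(xs.size()))};
}
```

A caller would build the points once with `makePoints`, then compare `timeIt(insideSqrt, ...)` against `timeIt(insideRsqrt, ...)` for each set of flags of interest (for instance with and without Ofast). Returning the `accepted` count keeps the compiler from discarding the loop and doubles as a sanity check that both variants select roughly the same points.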

Contributor

On my side, the discussion was actually triggered by the case of division by 0.
The rhs dcos*sqrt doesn't have x/0

Contributor Author

so now we have 0>0 instead of inf>dcos

Contributor

0>0 is not an FPE.
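
The difference between the two outcomes can be observed through the floating-point status flags. The sketch below is an illustration under assumptions, not code from the PR: the namespace `fpe_sketch` and the names in it are hypothetical. It evaluates both variants for a null input vector and reports whether `FE_DIVBYZERO` or `FE_INVALID` was raised. It assumes IEEE 754 arithmetic and a build without Ofast or `-ffast-math`, since fast-math lets the compiler ignore the flags; the `volatile` qualifiers are only there to force evaluation at run time. Under IEEE rules the reciprocal form computes 0 * inf for a null vector, which is NaN, so that comparison is also false, but only after raising flags that would trap if FPEs are enabled.

```cpp
#include <cfenv>
#include <cmath>

namespace fpe_sketch {

// Window parameters, as in the snippet earlier in this thread.
volatile float x = 1.f, y = 0.f, dcos = 0.5f;

struct Outcome {
  bool inside;     // value of the comparison
  bool divByZero;  // FE_DIVBYZERO was raised while evaluating it
  bool invalid;    // FE_INVALID was raised while evaluating it
};

inline Outcome withFlags(bool inside) {
  return {inside, std::fetestexcept(FE_DIVBYZERO) != 0,
          std::fetestexcept(FE_INVALID) != 0};
}

// Right-hand side is dcos*sqrt(...): no division anywhere.
inline Outcome viaSqrt(float ixIn, float iyIn) {
  volatile float vx = ixIn, vy = iyIn;
  std::feclearexcept(FE_ALL_EXCEPT);
  float ix = vx, iy = vy;
  volatile bool r = ix * x + iy * y > dcos * std::sqrt(ix * ix + iy * iy);
  return withFlags(r);
}

// Normalizes with 1/sqrt(...): divides by zero for a null vector.
inline Outcome viaRsqrt(float ixIn, float iyIn) {
  volatile float vx = ixIn, vy = iyIn;
  std::feclearexcept(FE_ALL_EXCEPT);
  float ix = vx, iy = vy;
  volatile float norm = 1.f / std::sqrt(ix * ix + iy * iy);
  volatile bool r = norm * (ix * x + iy * y) > dcos;
  return withFlags(r);
}

}  // namespace fpe_sketch
```

Under the stated assumptions, `fpe_sketch::viaSqrt(0.f, 0.f)` returns false with no flag raised, while `fpe_sketch::viaRsqrt(0.f, 0.f)` also returns false but with the divide-by-zero flag set.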

@cmsbuild

Pull request #13914 was updated. @civanch, @cvuosalo, @mdhildreth, @cmsbuild, @slava77, @davidlange6 can you please check and sign again.

VinInn commented Apr 12, 2016

@cmsbuild, please test
the problem was with the test: the two algos may disagree if the tested value is (within tolerances) close to one extreme (it was 0 and -0.1e-6)
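
A comparison test between two implementations can avoid this kind of spurious failure by skipping points that lie within a tolerance of a window edge, where either answer is acceptable. The sketch below is a hypothetical illustration, not the test added in this PR: all function names are assumptions, `phi0` is assumed to lie in [-pi, pi], and `halfWidth` is assumed to be between 0 and pi.

```cpp
#include <cmath>
#include <cstddef>

constexpr float kPiRef = 3.14159265f;

// Angle of (ix, iy) relative to phi0, wrapped into (-pi, pi].
// Assumes phi0 is in [-pi, pi], so a single wrap is enough.
inline float deltaPhi(float phi0, float ix, float iy) {
  float d = std::atan2(iy, ix) - phi0;
  if (d > kPiRef) d -= 2.f * kPiRef;
  else if (d <= -kPiRef) d += 2.f * kPiRef;
  return d;
}

// Reference implementation: explicit angle comparison.
inline bool insideByAngle(float phi0, float halfWidth, float ix, float iy) {
  return std::abs(deltaPhi(phi0, ix, iy)) < halfWidth;
}

// Optimized implementation: dot product against the cosine of the half-width.
inline bool insideByCos(float phi0, float halfWidth, float ix, float iy) {
  float x = std::cos(phi0), y = std::sin(phi0), dcos = std::cos(halfWidth);
  return ix * x + iy * y > dcos * std::sqrt(ix * ix + iy * iy);
}

// Scan n directions around the circle and count disagreements, ignoring
// points closer than `tolerance` (in radians) to either edge of the window.
inline std::size_t countDisagreements(float phi0, float halfWidth,
                                      float tolerance, std::size_t n) {
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < n; ++i) {
    float phi = -kPiRef + 2.f * kPiRef * (static_cast<float>(i) + 0.5f) / static_cast<float>(n);
    float ix = std::cos(phi), iy = std::sin(phi);
    float toEdge = std::abs(std::abs(deltaPhi(phi0, ix, iy)) - halfWidth);
    if (toEdge < tolerance) continue;  // too close to an edge to demand agreement
    if (insideByAngle(phi0, halfWidth, ix, iy) != insideByCos(phi0, halfWidth, ix, iy))
      ++mismatches;
  }
  return mismatches;
}
```

A test would then require `countDisagreements(phi0, halfWidth, tolerance, n) == 0` for a few windows, including one that straddles the ±pi seam, with the tolerance chosen well above float rounding error (1e-4 rad is used as an example value here).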

@cmsbuild

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/12326/console

@cmsbuild

-1
Tested at: f8cf2ad
When I ran the RelVals I found an error in the following workflows:
135.4 step1

runTheMatrix-results/135.4_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS/step1_ZEE_13+ZEEFS_13+HARVESTUP15FS+MINIAODMCUP15FS.log

you can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-13914/12326/summary.html

VinInn commented Apr 13, 2016

interesting

Thread 2 (Thread 0x7fa3dcfbf700 (LWP 5572)):
#0  0x000000313bc0f37d in waitpid () from /lib64/libpthread.so.0
#1  0x00007fa3eba1a537 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/pluginFWCoreServicesPlugins.so
#2  0x00007fa3eba1ade2 in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/pluginFWCoreServicesPlugins.so
#3  0x00007fa3ee614a50 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
#4  0x000000313bc07aa1 in start_thread () from /lib64/libpthread.so.0
#5  0x000000313b8e893d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fa3ee2b0ba0 (LWP 5339)):
#0  0x000000313b8df113 in poll () from /lib64/libc.so.6
#1  0x00007fa3eba1ac44 in full_read.constprop () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/pluginFWCoreServicesPlugins.so
#2  0x00007fa3eba1aeaa in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/pluginFWCoreServicesPlugins.so
#3  0x00007fa3eba1b02b in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fa3d47d7533 in DetectPatterns(ZonesOutput) () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/libL1TriggerL1TMuonEndCap.so
#6  0x00007fa3d47d809d in Patterns(std::vector<ZonesOutput, std::allocator<ZonesOutput> >) () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/libL1TriggerL1TMuonEndCap.so
#7  0x00007fa39818145f in L1TMuonEndCapTrackProducer::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/pluginL1TriggerL1TMuonEndCapPlugins.so
#8  0x00007fa3efb79771 in edm::EDProducer::doEvent(edm::EventPrincipal const&, edm::EventSetup const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/2016-16/slc6_amd64_gcc530/cms/cmssw/CMSSW_8_1_X_2016-04-10-0000/lib/slc6_amd64_gcc530/libFWCoreFramework.so
#9  0x00007fa3efc1a52f in edm::WorkerT<edm::EDProducer>::implDo(edm::EventPrincipal const&, edm::EventSetup const&, edm::ModuleCa

hard to believe that it is due to this PR...

davidlt commented Apr 13, 2016

It's not, it's broken in IBs.

@davidlange6

actually this workflow is not broken in the IBs.. but indeed this error comes up in one workflow, so it's presumably related; the L1 team is investigating.

@cvuosalo

+1

For #13914 f8cf2ad

Bug fix in checking phi ranges for pixel hits.

The code changes are satisfactory. Jenkins tests against baseline CMSSW_8_1_X_2016-04-11-2300 show numerous tiny differences, but they are not significant. The Jenkins test error is not related to this PR.

Extended tests of workflow 25202.0_TTbar_13 with 70 events and workflows 1317.0_SingleElectronPt35_UP15 and 1318.0_SingleGammaPt10_UP15 with 1000 events each against baseline CMSSW_8_1_X_2016-04-03-1100 also show numerous tiny differences, but again they are not significant.

VinInn commented Apr 14, 2016

Simulation signed a previous version. The relval failure is not related to this PR.

@davidlange6 davidlange6 merged commit 6bf9fcb into cms-sw:CMSSW_8_1_X Apr 14, 2016