Add PV sumPT2 of pf charged particles associated to PV in NanoAOD #43487
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43487/38060
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43487/38061
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
Hi @JunquanTao, you need to adapt the code format as cmsbuild suggested above. Try to run this. Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43487/38062
|
A new Pull Request was created by @JunquanTao (JunquanTao) for master. It involves the following packages:
@vlimant, @cmsbuild, @simonepigazzini can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
enable nano |
please test |
@JunquanTao I wonder why you associate candidates to PV using dz instead of |
@mbluj As mentioned in the slides [1] (S3) submitted with the PR, we employed the same method as in the Run2 analyses (both the 125 GeV Hgg and the low-mass Hgg analyses); the detailed code is in [2]. In our tests, we could reproduce the log values of SumPT2 used in the Run2 analysis with the flashgg framework. As shown in the bottom plots on S5 [1], with the variable stored in a customized nanoAOD with a precision of 10 bits, the relative difference with respect to flashgg is less than ~0.01%. We can test the performance with “vertexRef()” in the future. Do you have any suggestion on the quality requirement for the PV-candidate association, “pvAssociationQuality()”? Thanks! [1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf [2] https://github.com/cms-analysis/flashgg/blob/dev_legacy_runII/MicroAOD/plugins/DzVertexMapProducer.cc#L54-L75, with “maxAllowedDz_=0.2” as specified in https://github.com/cms-analysis/flashgg/blob/dev_legacy_runII/MicroAOD/python/flashggTkVtxMap_cfi.py#L6 |
@JunquanTao OK, I see. Regarding selection by association quality, you can try two options:
|
@mbluj There may be some misunderstanding here. We do use PV 0 directly to calculate dz [1]. We then loop over the remaining PVs (starting from index 1 [2]) to check whether this charged PF candidate belongs to another PV, i.e. whether its distance to any other PV, “newdz”, is smaller than “dz”. So we do need to loop over the remaining vertices, and “vertexRef()” with some quality requirement would not save time. “vertexRef is PV and |dz| < 0.2” is not sufficient: as mentioned above, we need to make sure this charged PF candidate does not belong to another PV before we count it in the sum PT2. So the current code is good enough to calculate the “SumPT2” variable, right? [1]
[2]
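The closest-vertex logic described in this comment can be sketched as follows. This is a minimal, hypothetical Python sketch, not the flashgg or CMSSW code: `Candidate`, `Vertex`, `dz` and `sum_pt2_closest_vertex` are illustrative names, dz is simplified to the difference between the z of the candidate's reference point and the vertex z, and the 0.2 window mirrors `maxAllowedDz_=0.2` (assumed to be in cm).

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified stand-ins for packed PF candidates and vertices.
@dataclass
class Candidate:
    pt: float
    charge: int
    z: float  # z of the track reference point (simplification used for dz)


@dataclass
class Vertex:
    z: float


MAX_ALLOWED_DZ = 0.2  # mirrors maxAllowedDz_=0.2 (assumed cm)


def dz(cand: Candidate, vtx: Vertex) -> float:
    """Simplified longitudinal distance between a candidate and a vertex."""
    return cand.z - vtx.z


def sum_pt2_closest_vertex(cands: List[Candidate], vertices: List[Vertex]) -> float:
    """Sum pt^2 of charged candidates whose closest vertex in |dz| is vertices[0]."""
    if not vertices:
        return 0.0
    pv = vertices[0]
    total = 0.0
    for cand in cands:
        if cand.charge == 0:
            continue  # only charged candidates are considered
        dz0 = abs(dz(cand, pv))
        if dz0 >= MAX_ALLOWED_DZ:
            continue  # outside the |dz| window of PV 0
        # Loop over the remaining vertices (index 1 onwards) and reject the
        # candidate if any of them is strictly closer in |dz| than PV 0.
        if any(abs(dz(cand, v)) < dz0 for v in vertices[1:]):
            continue
        total += cand.pt ** 2
    return total
```

With PV 0 at z = 0 and another vertex at z = 0.15, a candidate at z = 0.12 lies inside the window of PV 0 but is rejected because the second vertex is closer, which is exactly the case the extra loop is meant to catch.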
|
I understand the logic of your code and I agree that it is enough to reach your goal of reproducing the sumPt2 score. But iterating over candidates and summing the pt2 of those which have vertexRef().key() == 0 (i.e. are associated to PV = vertices[0]), with the requirement that |dz(vertices[0].position())| < 0.2 (to ensure that dz with respect to the PV is always smaller than 0.2), will give a very similar result. This is because of the way packedPFCandidates are associated to vertices. I am not asking you to change your logic now; rather, I want to advocate for checking (and using in the future) the vertex association present in packedPFCandidates. I believe it is more robust than the current code, e.g. because one can use quality flags and does not need to loop over vertices to reject candidates that are closer in dz to another vertex than to the PV.
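For comparison, the alternative suggested in this comment (sum over charged candidates associated to vertex 0 through the stored association, plus |dz(PV 0)| < 0.2) can be sketched like this. It is a hypothetical Python illustration, not the `pat::PackedCandidate` API: the `vertex_key` field is an assumed stand-in for `vertexRef().key()`, and dz is again simplified to a difference in z.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    pt: float
    charge: int
    z: float         # z of the track reference point (simplified dz)
    vertex_key: int  # hypothetical stand-in for vertexRef().key()


MAX_ALLOWED_DZ = 0.2  # same |dz| window as in the PR (assumed cm)


def sum_pt2_vertex_ref(cands: List[Candidate], pv_z: float) -> float:
    """Sum pt^2 of charged candidates associated to vertex 0 and within |dz| < 0.2 of it."""
    total = 0.0
    for cand in cands:
        if cand.charge == 0 or cand.vertex_key != 0:
            continue  # neutral, or associated to another vertex
        if abs(cand.z - pv_z) >= MAX_ALLOWED_DZ:
            continue  # outside the |dz| window of PV 0
        total += cand.pt ** 2
    return total
```

No loop over the other vertices is needed here because each candidate carries a single stored association; whether that association agrees with the closest-in-dz choice is what the following comments test.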
|
Hi @mbluj, I tested the code as you proposed. As summarized on the last slide (S7) of [1], I see different SumPT2 values from your proposed code compared to the code we proposed and used in the Run2 analysis, in ~2.5k events out of ~5.5k signal events tested. With your code, most of these ~2.5k events have a larger SumPT2. So I guess the requirement in your proposed code does not guarantee that each charged PF candidate belongs to at most one PV. A charged PF candidate could belong to two or more PVs if its dz is less than 0.2 with respect to each of them. We may then count in this sumPT2 a charged PF candidate that has a smaller dz to another PV than to PV0, and hence get a larger SumPT2 from your proposed code. I am not sure whether my understanding is correct. Can you please clarify whether each charged PF candidate belongs to at most one PV? If not, we need to loop over the remaining PVs so as not to count a charged PF candidate whose dz to another PV is smaller than its dz to PV 0. This is what we did in the Run2 analysis: associating a charged PF candidate only to the closest vertex. Thanks, Junquan [1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf |
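The concern raised in the comment above, that a |dz| < 0.2 window on its own does not make the vertex choice unique, can be illustrated with a toy event. This hypothetical Python example uses made-up numbers and a simplified dz (difference in z); it compares a window-only count with a closest-vertex count and does not model how `vertexRef()` is actually filled for packedPFCandidates.

```python
from typing import List, Tuple

MAX_ALLOWED_DZ = 0.2  # |dz| window (assumed cm)

# Toy event, illustrative values only: vertex z positions and (pt, z) of charged candidates.
VERTICES_Z: List[float] = [0.0, 0.15]
CANDIDATES: List[Tuple[float, float]] = [(10.0, 0.02), (5.0, 0.12)]


def compatible_vertices(z: float, vertices_z: List[float]) -> List[int]:
    """Indices of all vertices whose |dz| to the candidate is inside the window."""
    return [i for i, vz in enumerate(vertices_z) if abs(z - vz) < MAX_ALLOWED_DZ]


def window_only_sum(cands: List[Tuple[float, float]], vertices_z: List[float]) -> float:
    """Count every candidate inside the window of vertex 0, even if other vertices are closer."""
    return sum(pt ** 2 for pt, z in cands if 0 in compatible_vertices(z, vertices_z))


def closest_vertex_sum(cands: List[Tuple[float, float]], vertices_z: List[float]) -> float:
    """Count a candidate for vertex 0 only if vertex 0 is (one of) its closest vertices in |dz|."""
    total = 0.0
    for pt, z in cands:
        dists = [abs(z - vz) for vz in vertices_z]
        if dists[0] < MAX_ALLOWED_DZ and dists[0] == min(dists):
            total += pt ** 2
    return total
```

Here the candidate at z = 0.12 is compatible with both vertices, so the window-only sum (125) exceeds the closest-vertex sum (100), the same direction as the larger SumPT2 values reported above.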
@mbluj Have you checked my previous comment and the extra studies on slide 7 [1]? Any comments or proposal? Thanks, Junquan [1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf |
Sorry for belated answer. Thank you for the tests.
If it does not help, please use your algorithm; it is not guaranteed that you will recover the old pt2 using a different (in my opinion better) track-to-PV association. You can also consider breaking backward compatibility and using a new definition of pt2... Finally, you can also consider switching from pt2 to the PV score, which is already stored in nanoAOD... |
Hi @mbluj, I did the check with the additional quality requirement “obj.pvAssociationQuality() >= pat::PackedCandidate::CompatibilityDz” as you suggested. As summarized on slide 8 [1], ~99% of the 5.5k events have a different ΣpT2 with the new code (so-called version 2, with the additional quality requirement) compared to the old code (as used in Run2). For most events, the new code (v2) gives a smaller SumPT2. So we think it is better to use the algorithm as implemented in the PR.
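The version 2 selection described here adds an association-quality threshold on top of the stored vertex association. Below is a hypothetical Python sketch of that selection; the `PVAssociationQuality` values are assumed to mirror `pat::PackedCandidate::PVAssociationQuality` (check `PackedCandidate.h` for the authoritative definition), and `vertex_key` is a stand-in for `vertexRef().key()`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class PVAssociationQuality(IntEnum):
    # Assumed to mirror pat::PackedCandidate::PVAssociationQuality; higher means a tighter association.
    NotReconstructedPrimary = 0
    OtherDeltaZ = 1
    CompatibilityBTag = 4
    CompatibilityDz = 5
    UsedInFitLoose = 6
    UsedInFitTight = 7


@dataclass
class Candidate:
    pt: float
    charge: int
    z: float         # z of the track reference point (simplified dz)
    vertex_key: int  # hypothetical stand-in for vertexRef().key()
    quality: PVAssociationQuality


MAX_ALLOWED_DZ = 0.2  # |dz| window (assumed cm)


def sum_pt2_v2(cands: List[Candidate], pv_z: float,
               min_quality: PVAssociationQuality = PVAssociationQuality.CompatibilityDz) -> float:
    """Sum pt^2 of charged candidates associated to vertex 0 with quality >= min_quality and |dz| < 0.2."""
    total = 0.0
    for cand in cands:
        if cand.charge == 0 or cand.vertex_key != 0:
            continue  # neutral, or associated to another vertex
        if cand.quality < min_quality:
            continue  # association too loose for this selection
        if abs(cand.z - pv_z) >= MAX_ALLOWED_DZ:
            continue  # outside the |dz| window of PV 0
        total += cand.pt ** 2
    return total
```

Setting `min_quality` to the weakest level reduces this to the plain `vertexRef()` selection, so both variants can be compared on the same events by changing one argument.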
Using the PV score instead of SumPT2 gives worse performance (~7% relatively more DY events survive the selection). This is why we propose to add SumPT2 in the next nanoAOD production. You can see it on slide 2 [1] or in the PR description. [1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf |
@JunquanTao Thank you for the check. I have no more comments. |
No more comments or requests from my side, thanks! |
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b92165/36771/summary.html
|
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
PR description:
This PR adds the PV sumPT2 of PF charged particles associated to the PV in NanoAOD.
The motivations and related studies are summarized in: https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf
"SumPT2" of the PF charged candidates associated to the chosen PV was used in the Run2 low-mass Hgamgam analysis (HIG-20-002) to suppress the DY background efficiently. In the current nanoAOD, “PV_score”, which is the sum pt2 of clustered objects, shows worse performance than “SumPT2” (~7% relatively more DY events survive the selection). So we propose to add the PV “SumPT2” to the nanoAOD.
Since only one variable (float type) per event is proposed, the expected additional size in nanoAOD is 1 to 2 bytes per event, depending on the precision used to store this variable (8 or 10 bits). By default we propose to store it with 10-bit precision.
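To get a feel for what 8 or 10 bits of precision means for the stored value, the sketch below rounds a 32-bit float's mantissa to N bits and checks that the relative rounding error stays within 2^-(N+1), i.e. about 0.05% for N = 10 and about 0.2% for N = 8. It is a hypothetical Python illustration of mantissa reduction with round-to-nearest, assumed to be similar in spirit to NanoAOD's `precision` setting; it is not the CMSSW implementation and says nothing about the compressed on-disk size quoted above.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_mantissa(x: float, nbits: int) -> float:
    """Keep `nbits` explicit mantissa bits of float32(x), rounding to nearest (ties away from zero)."""
    if not 0 <= nbits <= 23:
        raise ValueError("nbits must be between 0 and 23")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    shift = 23 - nbits  # number of low mantissa bits to drop
    if shift > 0:
        bits += 1 << (shift - 1)                  # add half of the dropped range (carry may bump the exponent)
        bits &= ~((1 << shift) - 1) & 0xFFFFFFFF  # clear the dropped bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for n in (8, 10):
        x = 1234.5678
        r = reduce_mantissa(x, n)
        print(n, r, abs(r - to_float32(x)) / to_float32(x))
```

Because the sign bit is stored separately from the mantissa, adding half of the dropped range to the raw bit pattern rounds the magnitude, so ties go away from zero for both positive and negative values.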
PR validation:
runTheMatrix tests have been successfully run for the following workflow:
• runTheMatrix.py -l 12434.0
Backport:
We target this to go into Nano V13 for now. A backport might be required if the Nano production happens in an earlier CMSSW release.
Hgg conveners: @lfinco, @fcouderc