Replay failures after #41198 introduced #41246
urgent |
A new Issue was created by @rappoccio . @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign @dinyar, @aloeliger, @cms-sw/l1-l2 |
@rappoccio is the replay starting from a ROOT file or from a Streamer file? If the latter, then there was a schema change to a data product and Streamers do not support such a change. |
assign l1 |
New categories assigned: l1 @epalencia,@aloeliger,@cecilecaillol you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@Dr15Jones that's a good question. I will ask. |
Looking at the blame log for the class in question it clearly shows that an additional member data of type bool was added to the class. That would account for why it tried to read 1 byte too many. |
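To illustrate the one-byte mismatch described above, here is a minimal Python sketch that uses `struct` as a stand-in for a serialization format that stores no schema alongside the data. The layouts and function names are hypothetical; this is not the actual CMSSW class or the Streamer I/O code, only an analogy for why an added `bool` makes the reader ask for one byte more than was written.

```python
import struct

# Hypothetical record layouts (little-endian, no padding). Not the real class.
OLD_LAYOUT = struct.Struct("<ii?")   # two ints and one bool, as written by the old release
NEW_LAYOUT = struct.Struct("<ii??")  # same members plus the bool added later


def write_old(a, b, flag):
    """Serialize a record with the old layout; no schema is stored with the bytes."""
    return OLD_LAYOUT.pack(a, b, flag)


def read_new(buf):
    """Try to read a record assuming the new layout."""
    return NEW_LAYOUT.unpack(buf)


def missing_bytes(buf):
    """How many bytes the new-layout reader wants beyond what the buffer holds."""
    return NEW_LAYOUT.size - len(buf)


if __name__ == "__main__":
    payload = write_old(1, 2, True)
    print("reader wants", missing_bytes(payload), "more byte(s) than were written")
    try:
        read_new(payload)
    except struct.error as err:
        print("read failed:", err)
```

Because the bytes carry no description of their own layout, the reader has no way to notice that the writer used an older class definition, which is the situation discussed in the following comments.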
@Dr15Jones Thanks for the quick check on that. Do these changes need to be reverted? This is part of a large series of changes introduced by @eyigitba @elfontan and @dinyar for muon shower triggering at L1. |
Depends on what you want to do. Streamer files can only safely be read by the exact same version of CMSSW as was used to create them. That does not mean you can't change the online format, it only means you can't read older files after such a change. That means changes to data products used online must be coordinated between HLT and T0. |
On Mar 31, 2023, at 3:32 PM, Chris Jones ***@***.***> wrote:
Do these changes need to be reverted?
Depends on what you want to do. Streamer files can only safely be read by the exact same version of CMSSW as was used to create them. That does not mean you can't change the online format, it only means you can't read older files after such a change. That means changes to data products used online must be coordinated between HLT and T0.
Or that the tier0 should use the release used by hlt for any job that takes streamers as input (which is the old policy..)
|
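The policy suggested above (jobs that take Streamer files as input run with the release that wrote them) could be sketched as follows. This is only an illustration under stated assumptions: `release_for_job`, its parameters, and the format labels are hypothetical and do not correspond to any actual T0 configuration interface.

```python
def release_for_job(input_format, hlt_release, t0_release):
    """Pick the release for a job based on its input format.

    Hypothetical helper: Streamer input must be read with the release that
    wrote it (the HLT release); ROOT input can use the release under test.
    """
    fmt = input_format.lower()
    if fmt == "streamer":
        return hlt_release
    if fmt == "root":
        return t0_release
    raise ValueError(f"unknown input format: {input_format!r}")


if __name__ == "__main__":
    # A repacker-like job reads Streamer files; later steps read ROOT files.
    print(release_for_job("streamer", "CMSSW_13_0_0", "CMSSW_13_0_2"))
    print(release_for_job("root", "CMSSW_13_0_0", "CMSSW_13_0_2"))
```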
Sorry, for my edification, is there an implied mismatch in CMSSW version between HLT and tier 0 right now? |
No, this is occurring in a replay. We cannot deploy 13_0_2 without a successful test at the T0. |
@rappoccio Could you point to the full details of the replay? I'm puzzled why this particular data format class is causing trouble (it is not part of RAW nor Scouting, as far as I can tell) |
For the record, the culprit PR in 13_0_X is #41198 |
Okay. Let me try to work this from the L1 side a bit and see if I can't locate Dinyar. |
Hi, Right, so from what I read in the thread this is expected for streamer files if a data member is added (though if I understood @makortel correctly it might actually not be entirely expected). The added data member is required for a new type of muon shower that some people would like to trigger on. If there's a way to add this without causing these issues I'm happy to implement them, but from my limited understanding also iorules wouldn't help here, correct? Cheers, |
Correct, iorules would not work here as Streamer files cannot make use of any schema evolution (neither the automatic kind nor the iorules) from ROOT. |
@dinyar Could you elaborate where the |
Sure, we certainly need it in the DQM, similar for the HLT (the showers are used as an input to the uGT emulator which runs as part of the HLT). I'm not really sure whether it's used in prompt reco, but I think that it's persisted in AOD. The main use case from my perspective is in any case DQM and HLT, but I e.g. don't know whether HLT creates them and delivers them to T0 or whether they're unpacked from RAW at T0 a second time (I don't have much knowledge of how things work at T0 unfortunately). |
Sorry. I don't agree. Failures for streamer incompatibility are ok and should be worked around in the tier0 tests
On Mar 31, 2023 3:36 PM, rappoccio ***@***.***> wrote:
No, this is occurring in a replay. We cannot deploy 13_0_2 without a successful test at the T0.
|
I was replying along the same line. This is not the first time a Tier0 replay fails because of data format changes, e.g. every change in Scouting data formats makes the replay (that uses Scouting streamer files) fail. We have worked around those in the past. Is something preventing that now? |
I agree with @davidlange6 |
I've discussed with T0, we had a bit of a miscommunication. They used 13_0_0 ONLY for the repacker, not the full replay. The jobs are subsequently passing, so this should be fine. I will close the issue. Apologies for the miscommunication, it looks like we're okay now. |
The problem affects all releases after (and including) 13_0_0_pre3 (where we updated ROOT to 6.26, also the release the file read in #41246 (comment) was produced with), and unfortunately that includes 13_0_X production releases. @smuzaffar (@cms-sw/externals-l2) we'd need to update ROOT at minimum to include root-project/root#11446, or to v6-26-08 or later in both 13_0_X and 13_1_X as soon as possible |
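As a rough aid for the version threshold mentioned above ("v6-26-08 or later"), the following sketch compares ROOT version strings numerically. The helper names are hypothetical, and a version number alone does not prove that a given fix is present in a patched build or a patches branch, so this is a first check only.

```python
import re

# Threshold quoted in the thread; whether a given build really contains the
# fix still has to be confirmed against the ROOT commit history.
MIN_FIXED = (6, 26, 8)


def parse_root_version(text):
    """Parse strings such as '6.26.07', '6-26-10' or 'v6-26-08' into an int tuple.

    Branch names like 'v6-26-00-patches' are not handled meaningfully here.
    """
    parts = re.findall(r"\d+", text)
    if len(parts) < 3:
        raise ValueError(f"cannot parse ROOT version from {text!r}")
    return tuple(int(p) for p in parts[:3])


def meets_threshold(text, minimum=MIN_FIXED):
    """True if the parsed version is at least the given minimum."""
    return parse_root_version(text) >= minimum


if __name__ == "__main__":
    for version in ("6.26.07", "v6-26-08", "6-26-10"):
        print(version, meets_threshold(version))
```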
@makortel , we are already using the latest root 6-26-10 + fixes from the patches branch in 13.1.X IBs. The only changes we are missing are the 3 commits pushed today. If we see this issue in default 13.1.X IBs then maybe root-project/root#11446 was not backported to the 6.26 patches branch? |
I see that root 6.26 patches branch is missing the fix in https://github.com/root-project/root/blob/v6-26-00-patches/io/io/src/TStreamerInfo.cxx . @pcanal , can you please confirm that root-project/root#11446 was not back ported to root 6.26 ? |
Ah, good point, I was sloppy when checking the ROOT versions. But 13_0_0 still uses 6.26.07 (plus the patches), right? |
yes, 13.0.X is using old version. |
sorry my bad, root-project/root#11446 is part of the root 6.26 patches branch https://github.com/root-project/root/commits/v6-26-00-patches/io/io/src/TStreamerInfo.cxx . So basically we just need to backport the root from 13.1.X into 13.0.X |
Right |
cms-sw/cmsdist#8421 backports root 6.26 changes from CMSSW_13_1_X to CMSSW_13_0_X |
Hi @smuzaffar, but the problem reported by Marco in #41246 (comment) (and reproducible with Matti's recipe in #41246 (comment)) is also showing up in 13_1_X IB (also in the ROOT6 IBs)...what am I missing? |
@francescobrivio , if I understand the #41246 (comment) comment correctly, then the problem was how the file was written. I think it was written by a root version without the fix. Including the newer root means new files will be written using the fixed version of root. |
With scanfile.C.txt (which of course can be renamed :))
To get a list of the class (names) that do not have a |
@smuzaffar @makortel if I understand it correctly with the merge of cms-sw/cmsdist#8421 in CMSSW_13_0_X this issue can be considered solved, and therefore closed here in github (unless you want to keep it open for a follow up). Can you please confirm? |
@perrotta The immediate issue seems to be resolved, but there will be some longer-term follow ups that would be useful to record in GitHub. I could open a new issue for those though. |
I opened a new issue #41348 where I'll collect more information of where we are or have been missing StreamerInfo. Given that the most urgent issue was solved, I think we can now close this issue. |
+core |
@cmsbuild, please close |
There are currently T0 replay failures that have begun in 13_0_2:
This is likely caused by the introduction of #41198
@dinyar please take a look, this is very urgent.