Workflow 11834.24 timeouts on cs8 #36492
Comments
A new Issue was created by @makortel Matti Kortelainen. @Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
@smuzaffar, I noticed the cmsDriver commands on |
@makortel, ah, good catch. |
CMSSW_12_3_X_2021-12-13-2300 cs8_amd64_gcc11 shows similar time/event behavior
In the same IB slc7_amd64_gcc10 (but this is with 4 threads and streams)
Based on the first few events, the per-event processing times are of the same order of magnitude, but not exactly the same. |
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@cms-sw/reconstruction-l2 11834.24 appears to be a 0T workflow and takes 10-30 minutes per event. Just to confirm, is this expected? |
The no-pileup variant 11634.24 appears to process only 1 event; I wonder if that could be sufficient for the pileup variant too.
11634.24 also appears to include an ALCA step that 11834.24 does not (@cms-sw/pdmv-l2 @cms-sw/alca-l2); I have no idea if that is intentional or not. |
the startup times don't look like 4 stream, more like 2-stream. OTOH https://cmssdt.cern.ch/SDT/cgi-bin/buildlogs/raw/slc7_amd64_gcc900/CMSSW_12_3_X_2021-12-13-2300/pyRelValMatrixLogs/run/11834.24_TTbar_14TeV+2021PU_0T+TTbar_14TeV_TuneCP5_GenSimINPUT+DigiPU+RecoNanoPU+HARVESTNanoPU/step3_TTbar_14TeV+2021PU_0T+TTbar_14TeV_TuneCP5_GenSimINPUT+DigiPU+RecoNanoPU+HARVESTNanoPU.log
In the slc7_amd64_gcc900/CMSSW_12_3_X_2021-12-13-2300 log the job starts processing at
@cms-sw/tracking-pog-l2 please take a note
Is |
11834.24 has the NANO step as well (while 11634.24 does not); I'm not sure whether that matters. |
When they were first introduced in #30527, these workflows were designed to run just 1 event, and dumping the commands shows that both indeed start with 1 GS event:
11834.24 2021PU_0T+TTbar_14TeV_TuneCP5_GenSim+DigiPU+RecoPU+HARVESTPU+Nano
[1]: cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN,SIM -n 1 --conditions auto:phase1_2021_realistic_0T --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --datatier GEN-SIM --eventcontent FEVTDEBUG --geometry DB:Extended --era Run3 --magField 0T --relval 9000,100
[2]: cmsDriver.py step2 -s DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2021 --conditions auto:phase1_2021_realistic_0T --datatier GEN-SIM-DIGI-RAW -n 10 --eventcontent FEVTDEBUGHLT --geometry DB:Extended --era Run3 --magField 0T --pileup Run3_Flat55To75_PoissonOOTPU --pileup_input das:/RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM
[3]: cmsDriver.py step3 -s RAW2DIGI,L1Reco,RECO,RECOSIM,EI,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQM+@ExtraHLT+@miniAODDQM --conditions auto:phase1_2021_realistic_0T --datatier GEN-SIM-RECO,MINIAODSIM,DQMIO -n 10 --eventcontent RECOSIM,MINIAODSIM,DQM --geometry DB:Extended --era Run3 --magField 0T --pileup Run3_Flat55To75_PoissonOOTPU --pileup_input das:/RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM
[4]: cmsDriver.py step4 -s HARVESTING:@standardValidation+@standardDQM+@ExtraHLT+@miniAODValidation+@miniAODDQM --conditions auto:phase1_2021_realistic_0T --mc --geometry DB:Extended --scenario pp --filetype DQM --era Run3 -n 10 --magField 0T --pileup Run3_Flat55To75_PoissonOOTPU --pileup_input das:/RelValMinBias_14TeV/CMSSW_12_0_0_pre4-120X_mcRun3_2021_realistic_v2-v1/GEN-SIM
[5]: cmsDriver.py step5 -s NANO --conditions auto:phase1_2021_realistic --datatier NANOAODSIM -n 10 --eventcontent NANOEDMAODSIM --filein file:step3_inMINIAODSIM.root --geometry DB:Extended --era Run3
--------------------------------------------------------------------------------
11634.24 2021_0T+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA
[1]: cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN,SIM -n 1 --conditions auto:phase1_2021_realistic_0T --beamspot Run3RoundOptics25ns13TeVLowSigmaZ --datatier GEN-SIM --eventcontent FEVTDEBUG --geometry DB:Extended --era Run3 --magField 0T --relval 9000,100
[2]: cmsDriver.py step2 -s DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2021 --conditions auto:phase1_2021_realistic_0T --datatier GEN-SIM-DIGI-RAW -n 1 --eventcontent FEVTDEBUGHLT --geometry DB:Extended --era Run3 --magField 0T
[3]: cmsDriver.py step3 -s RAW2DIGI,L1Reco,RECO,RECOSIM,EI,PAT,VALIDATION:@standardValidation+@miniAODValidation,DQM:@standardDQM+@ExtraHLT+@miniAODDQM --conditions auto:phase1_2021_realistic_0T --datatier GEN-SIM-RECO,MINIAODSIM,DQMIO -n 1 --eventcontent RECOSIM,MINIAODSIM,DQM --geometry DB:Extended --era Run3 --magField 0T
[4]: cmsDriver.py step4 -s HARVESTING:@standardValidation+@standardDQM+@ExtraHLT+@miniAODValidation+@miniAODDQM --conditions auto:phase1_2021_realistic_0T --mc --geometry DB:Extended --scenario pp --filetype DQM --era Run3 -n 1 --magField 0T
[5]: cmsDriver.py step5 -s ALCA:SiPixelCalSingleMuonLoose+SiPixelCalSingleMuonTight+TkAlMuonIsolated+TkAlMinBias+MuAlOverlaps+EcalESAlign+TkAlZMuMu+TkAlDiMuonAndVertex+HcalCalHBHEMuonFilter+TkAlUpsilonMuMu+TkAlJpsiMuMu+SiStripCalMinBias --conditions auto:phase1_2021_realistic_0T --datatier ALCARECO -n 1 --eventcontent ALCARECO --geometry DB:Extended --filein file:step3.root --era Run3 --magField 0T
|
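To make the difference between the two command dumps easier to see, here is a minimal sketch that pulls the `-n` (number of events) value out of each cmsDriver command. The helper name `events_per_step` is hypothetical, and the command strings are abbreviated copies of the ones quoted above (only the step name, a shortened `-s` sequence, and `-n` are kept); it is not part of runTheMatrix or cms-bot.

```python
import shlex


def events_per_step(commands):
    """Return the integer passed to `-n` in each command, or None if `-n` is absent."""
    counts = []
    for cmd in commands:
        tokens = shlex.split(cmd)
        n = None
        # Look for the `-n` flag and take the token that follows it.
        for i, tok in enumerate(tokens[:-1]):
            if tok == "-n":
                n = int(tokens[i + 1])
        counts.append(n)
    return counts


# Abbreviated from the 11834.24 dump above (pileup variant).
wf_11834_24 = [
    "cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN,SIM -n 1",
    "cmsDriver.py step2 -s DIGI:pdigi_valid,L1,DIGI2RAW -n 10",
    "cmsDriver.py step3 -s RAW2DIGI,L1Reco,RECO -n 10",
    "cmsDriver.py step4 -s HARVESTING:@standardValidation -n 10",
    "cmsDriver.py step5 -s NANO -n 10",
]

# Abbreviated from the 11634.24 dump above (no-pileup variant).
wf_11634_24 = [
    "cmsDriver.py TTbar_14TeV_TuneCP5_cfi -s GEN,SIM -n 1",
    "cmsDriver.py step2 -s DIGI:pdigi_valid,L1,DIGI2RAW -n 1",
    "cmsDriver.py step3 -s RAW2DIGI,L1Reco,RECO -n 1",
    "cmsDriver.py step4 -s HARVESTING:@standardValidation -n 1",
    "cmsDriver.py step5 -s ALCA:TkAlMinBias -n 1",
]

print(events_per_step(wf_11834_24))  # [1, 10, 10, 10, 10]
print(events_per_step(wf_11634_24))  # [1, 1, 1, 1, 1]
```

In both dumps step 1 requests a single event, but steps 2-5 of 11834.24 request 10, whereas every step of 11634.24 requests 1.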
Apparently the IB tests recycle the GEN-SIM input, which is how they end up with 10 events here. |
so we know for sure that the CA is not designed to run correctly for 0T, e.g.:
but also: cmssw/RecoPixelVertexing/PixelTriplets/src/CACell.h Lines 219 to 222 in 2438e09
Some work was started before the beam test in case it was necessary to run reconstruction at 0T, but it was stopped as soon as it became clear we would have a field. |
The cmsDriver commands listed in the IB dashboard show explicit |
I have a vague recollection we introduced the |
Which, on a closer look, is exactly what @mmusich showed in #36492 (comment)
|
I think the remaining action is to make the |
assign pdmv |
New categories assigned: pdmv @bbilin,@wajidalikhan,@jordan-martins,@kskovpen you have been requested to review this Pull request/Issue and eventually sign? Thanks |
wasn't that a special setup for no-pixel-tracker configurations? |
or, wasn't it technically made so that a no-pixel-tracker setup could work ... and chances are that, in an attempt to make the purpose more generic, the name morphed to |
Please see #36503. |
That's precisely what I had attempted in #33532, though we still had hopes to adapt the Run2 tracking as well. |
I see |
+1 |
Workflow 11834.24 step3 appears to time out consistently on cs8. In CMSSW_12_3_X_2021-12-13-2300 cs8_amd64_gcc10
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/cs8_amd64_gcc10/CMSSW_12_3_X_2021-12-13-2300/pyRelValMatrixLogs/run/11834.24_TTbar_14TeV+2021PU_0T+TTbar_14TeV_TuneCP5_GenSimINPUT+DigiPU+RecoNanoPU+HARVESTNanoPU/step3_TTbar_14TeV+2021PU_0T+TTbar_14TeV_TuneCP5_GenSimINPUT+DigiPU+RecoNanoPU+HARVESTNanoPU.log#/
Note that each event consistently takes 10-30 minutes to process.