Precomputing Lumi Mask For Event Based Data RelVals #138
Conversation
nevents limit is needed
Regarding 3rd party packages being available, we need to be aware of the Python version available in the cms-sw release, as the `el7` and `el8` containers do not allow the user to install external dependencies. For me, we can directly allow this to run only if the ticket is for
Instead, provide two new functions `list_certification_files` and `get_certification_file` for listing all the files related to a certification type and getting a specific golden JSON file. Also, sort the imports using `isort`.
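A minimal sketch of what the two requested helpers could look like. The registry contents and signatures here are purely illustrative (the file names and the year-based lookup are assumptions, not the repository's actual API):

```python
# Hypothetical registry: certification type -> golden JSON files.
# Names are illustrative only.
CERTIFICATION_FILES = {
    "Collisions": [
        "Cert_Collisions2022_Golden.json",
        "Cert_Collisions2023_Golden.json",
    ],
    "Cosmics": ["Cert_Cosmics2022_Golden.json"],
}

def list_certification_files(cert_type):
    """List all the files related to a certification type."""
    return sorted(CERTIFICATION_FILES.get(cert_type, []))

def get_certification_file(cert_type, year):
    """Get a specific golden JSON file for a certification type/year."""
    for name in list_certification_files(cert_type):
        if str(year) in name:
            return name
    raise LookupError("No golden JSON for %s in %s" % (cert_type, year))
```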
Remove type hints, as old Python versions do not support them.
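The point above can be illustrated with a hypothetical helper (`step_events` is not a function from this repository): function annotations are a `SyntaxError` under the Python 2.7 interpreter shipped with old CMSSW releases, so the portable form drops them and documents the types in the docstring instead.

```python
# Python 3 only -- a SyntaxError on Python 2.7:
#     def step_events(step: dict) -> int: ...

# Portable form: no annotations, types documented in the docstring.
def step_events(step):
    """Return the number of events (int) in a step dict (dict)."""
    return int(step.get("events", 0))
```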
Testing with this ticket, CMSSW_10_6_20__TEST_LUMI_MASK_fullsim_PU_2017_UL-00001, which uses an old release, it seems we have a regression problem with an assumption made by the code:

```
[2024-12-12 01:53:18,382][ERROR] STDERR: WARNING: In non-interactive mode release checks e.g. deprecated releases, production architectures are disabled.
CMSSW_10_6_20__TEST_LUMI_MASK_fullsim_PU_2017_UL-00001/singularity-script-00489c3c6c0f0e58372a2dbb2fb628be.sh: line 19: CMSSW_10_6_20__TEST_LUMI_MASK_fullsim_PU_
/cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/py2-requests/2.21.0-pafccj2/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.25.2) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
Traceback (most recent call last):
  File "run_the_matrix_pdmv.py", line 318, in <module>
    main()
  File "run_the_matrix_pdmv.py", line 305, in main
    wmsplit))
  File "run_the_matrix_pdmv.py", line 189, in make_relval_step
    lumisections = get_lumi_ranges_from_dict(step_input)
  File "run_the_matrix_pdmv.py", line 164, in get_lumi_ranges_from_dict
    golden = get_golden_json(step_input.dataSet)
  File "CMSSW_10_6_20__TEST_LUMI_MASK_fullsim_PU_2017_UL-00001/dqm.py", line 119, in get_golden_json
    cert_type = get_cert_type(dataset)
  File "CMSSW_10_6_20__TEST_LUMI_MASK_fullsim_PU_2017_UL-00001/dqm.py", line 76, in get_cert_type
    year = dataset.split("Run")[1][2:4]  # from 20XX to XX
IndexError: list index out of range
```

@AdrianoDee could you check this?
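One illustrative way to guard the failing line (this is a sketch, not the fix adopted in the PR; `get_year` is a hypothetical helper): only extract the year when the dataset name actually carries a `Run20XX` era token, so MC dataset names without one return `None` instead of raising `IndexError`.

```python
import re

def get_year(dataset):
    """Return the two-digit year from a data dataset name such as
    '/JetHT/Run2017F-v1/RAW', or None for names (e.g. MC datasets)
    without a Run20XX token."""
    match = re.search(r"Run20(\d{2})", dataset)
    return match.group(1) if match else None
```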
I think this is due to the fact that the wfs there are MC wfs requiring an input (recycled). So this wrongly triggers the whole chain, which should instead eventually be called only on data. I'm looking at it to understand how to avoid this.
Hi @ggonzr, I should have fixed the issue with MC workflows with inputs. Note that the specific ticket also fails in the prod instance:

```
/afs/cern.ch/user/p/pdmvserv/relval_submission/CMSSW_10_6_20__TEST_LUMI_MASK_fullsim_PU_2017_UL-00001/singularity-script-aaf0e3c85defa1f3a678bc558c002136.sh: line 19: cd: relval_submission/CMSSW_10_6_20__TEST_LUMI_MASK_fullsim_PU_2017_UL-00001: No such file or directory
Traceback (most recent call last):
  File "run_the_matrix_pdmv.py", line 300, in <module>
    main()
  File "run_the_matrix_pdmv.py", line 295, in main
    with open(opt.output_file, 'w', encoding='utf-8') as workflows_file:
TypeError: 'encoding' is an invalid keyword argument for this function
```

I think this is due to a Python 2 incompatibility. The same ticket in 12_6_0 works. I've added the docs (let me know if they're fine for you).
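The `TypeError` above comes from the fact that Python 2's builtin `open()` has no `encoding` keyword; `io.open()` exposes the Python 3 signature on both interpreters, so a version-portable write (illustrated here with a placeholder file name) looks like:

```python
import io

# io.open() accepts encoding= on both Python 2 and Python 3,
# unlike the Python 2 builtin open().
with io.open("workflows.json", "w", encoding="utf-8") as workflows_file:
    workflows_file.write(u"[]")
```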
There are some typos to fix.
List the packages and versions used for the development of remote modules located in `utils/`
Format some modules, in particular the new ones.
In #131 we've introduced the possibility to run data wfs for which the number of events is used to skim the input files. This technically works but creates some problems when submitting these wfs. Basically, even if the list of files on which we want to run is parsed correctly, we would end up staging the entire RAW dataset anyway, since the job `SplittingAlgo` would still be flagged as `LumiBased`. And currently the job dict for an event-based wf looks like this.

I've done some investigations and there's the possibility to use a `FileBased` splitting, but this would not solve the problem: it would only change the splitting and would anyway trigger the staging of the whole dataset. The only way to ask for a partial dataset is either specifying a block (not useful) or through a lumi mask.

Thus in this PR I propose to precompute the lumi mask corresponding to the number of events requested in the data wf, so that the job gets it in its dict and we stage only the needed files. This is done by implementing a lightweight version of `das-up-to-nevents.py` in the RelVal machine, to be called (when needed) by `run-the-matrix-pdmv.py` when the job is created.

The resulting dict for an example job is what one would expect: