Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model specification is accessed before being validated #1952

Closed
1 task done
alexander-held opened this issue Aug 18, 2022 · 2 comments · Fixed by #1953
Closed
1 task done

Model specification is accessed before being validated #1952

alexander-held opened this issue Aug 18, 2022 · 2 comments · Fixed by #1953
Assignees
Labels
bug Something isn't working

Comments

@alexander-held
Copy link
Member

alexander-held commented Aug 18, 2022

Summary

The model specification is currently accessed before validation is done:

pyhf/src/pyhf/mixins.py

Lines 26 to 37 in a618640

for channel in channels:
self.channels.append(channel['name'])
self.channel_nbins[channel['name']] = len(channel['samples'][0]['data'])
for sample in channel['samples']:
self.samples.append(sample['name'])
for modifier_def in sample['modifiers']:
self.modifiers.append(
(
modifier_def['name'], # mod name
modifier_def['type'], # mod type
)
)

called via

pyhf/src/pyhf/workspace.py

Lines 303 to 310 in a618640

super().__init__(spec, channels=spec['channels'])
self.schema = config_kwargs.pop('schema', 'workspace.json')
self.version = config_kwargs.pop('version', spec.get('version', None))
# run jsonschema validation of input specification against the (provided) schema
if validate:
log.info(f"Validating spec against schema: {self.schema}")
schema.validate(self, self.schema, version=self.version)

(validation happens at the bottom).

If there is no strong reason for the order being like this, I think it would be useful to run the validation first to filter out problematic specifications.

I ran into this when building a multi-channel workspace from a single-channel workspace for debugging purposes, and ended up pasting the channel at the wrong level of indentation.

OS / Environment

n/a

Steps to Reproduce

import pyhf

spec = {
    "channels": [
        {
            "name": "SR",
            "samples_wrong_name": []
        }
    ]
}

pyhf.Workspace(spec)

File Upload (optional)

No response

Expected Results

pyhf.exceptions.InvalidSpecification or similar raised via the schema validation

Actual Results

Traceback (most recent call last):
  File "[...]/test.py", line 12, in <module>
    pyhf.Workspace(spec)
  File "[...]/pyhf/src/pyhf/workspace.py", line 303, in __init__
    super().__init__(spec, channels=spec['channels'])
  File "[...]/pyhf/src/pyhf/mixins.py", line 28, in __init__
    self.channel_nbins[channel['name']] = len(channel['samples'][0]['data'])
KeyError: 'samples'

pyhf Version

pyhf, version 0.7.0rc2.dev18

Code of Conduct

  • I agree to follow the Code of Conduct
@alexander-held alexander-held added bug Something isn't working needs-triage Needs a maintainer to categorize and assign labels Aug 18, 2022
@alexander-held
Copy link
Member Author

I tried out this change to run validation first, which seems to pass relevant tests (I see some unrelated failures in e.g. test_asymptotic_calculator_has_fitted_pars where the tolerance may need to be increased). Happy to submit this as a PR if you think this makes sense.

diff --git a/src/pyhf/workspace.py b/src/pyhf/workspace.py
index 6339f71c..070f3966 100644
--- a/src/pyhf/workspace.py
+++ b/src/pyhf/workspace.py
@@ -300,14 +300,15 @@ def __init__(self, spec, validate: bool = True, **config_kwargs):

         """
         spec = copy.deepcopy(spec)
-        super().__init__(spec, channels=spec['channels'])
         self.schema = config_kwargs.pop('schema', 'workspace.json')
         self.version = config_kwargs.pop('version', spec.get('version', None))

         # run jsonschema validation of input specification against the (provided) schema
         if validate:
             log.info(f"Validating spec against schema: {self.schema}")
-            schema.validate(self, self.schema, version=self.version)
+            schema.validate(spec, self.schema, version=self.version)
+
+        super().__init__(spec, channels=spec['channels'])

         self.measurement_names = []
         for measurement in self.get('measurements', []):

@matthewfeickert
Copy link
Member

matthewfeickert commented Aug 18, 2022

Happy to submit this as a PR if you think this makes sense.

Please do. 🚀 It would be great if you could also add a test that has the

pyhf.exceptions.InvalidSpecification or similar raised via the schema validation

that you mentioned.

Also thanks for this great Issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants