Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow data-less workspaces #566

Open
lukasheinrich opened this issue Sep 17, 2019 · 4 comments
Open

allow data-less workspaces #566

lukasheinrich opened this issue Sep 17, 2019 · 4 comments
Labels
feat/enhancement New feature or request schema and spec JSON schema

Comments

@lukasheinrich
Copy link
Contributor

lukasheinrich commented Sep 17, 2019

Description

some workspaces do not include data (think blinded workspaces) and instead just describe the pdf. These workspaces are still useful as they allow us to do expected limits etc..

Currently the validation requires data

@lukasheinrich lukasheinrich added schema and spec JSON schema feat/enhancement New feature or request labels Sep 17, 2019
@kratsg
Copy link
Contributor

kratsg commented Sep 17, 2019

This is related to #514. I have ideas on #514 though :)

@alexander-held
Copy link
Member

Hi, adding some more detail to this, since I was writing it up to open an issue before finding this one.

Workspaces without histograms specified in the Data field in a channel cannot be read by pyhf xml2json. Example:

<Data HistoName="" InputFile="" HistoPath=""/>

Such workspaces are commonly used before unblinding an analysis. It is possible to fit the Asimov dataset with them, since all the required information for that is in the remaining histograms. The code tries to interpret the "" as a path to a histogram, and then fails with a IsADirectoryError via uproot.

Minimal example here: /afs/cern.ch/user/a/alheld/public/pyhf_asimov, run via pyhf xml2json minimal_example/RooStats/minimal_example.xml.

One of the following two options might be good:

  • For this specific case of a missing observation histogram, fill all entries with -1. This is easy to catch by the user, could be accompanied by a warning, and allows the fit model to be built through pyhf. This behavior should probably not exist for other histograms that define the model and might be missing, since that would alter the model and could lead to unexpected behavior.

  • Catch the issue and issue a more descriptive error message. The user would then need to build a different .xml input to be able to use pyhf. This requires manual intervention by the user, but could at least guide them to the right thing to do, as the error otherwise is not easy to understand.

@alexander-held
Copy link
Member

I just ran into this again and it took me a while to realize what the issue was and remember I had seen it before. The current error is very unintuitive.

What do you think about adding a validation against

pyhf/src/pyhf/readxml.py

Lines 278 to 280 in 2b168b7

inputfile = sample.attrib.get('InputFile', inputfile)
histopath = sample.attrib.get('HistoPath', histopath)
histoname = sample.attrib['HistoName']

not being ""? Could just raise a NotImplementedError here with the note that the user ran into this presumably because of lack of data (and optionally point to this issue). This at least makes it clear what is happening. I'm happy to prepare a small PR if you are ok with that intermediate solution (eventually that should be replaced by handling the case automatically without exception).

@matthewfeickert
Copy link
Member

I'm happy to prepare a small PR if you are ok with that intermediate solution (eventually that should be replaced by handling the case automatically without exception).

Sure that's welcome, though I think we should keep this Issue open until we properly fix this for the next release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat/enhancement New feature or request schema and spec JSON schema
Projects
Status: To do
Development

No branches or pull requests

4 participants