Improve models generation #96
- refactored I/O from protopipe.mva.io
- added a Random Forest energy regressor
- made all class options from scikit-learn explicit
- better organized information
- smaller fixes
Codecov Report
@@           Coverage Diff           @@
##           master      #96   +/-   ##
=======================================
  Coverage   48.92%   48.93%
=======================================
  Files          22       23    +1
  Lines        2001     2058   +57
=======================================
+ Hits          979     1007   +28
- Misses       1022     1051   +29
Nice! Had just a few small inline changes to suggest.
A general question: is there a way to include another YAML file in a YAML file, via some pre-processor perhaps? (jinja2 maybe?) Right now if you make any changes, you have a lot of YAML files to update, and many of them have a lot of the same text in them, with only the regressor changing. So having a file for parameters and another with the regressor config would simplify that a lot. That is not necessary for this PR, but is something to think about as a refactoring.
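One possible way to approach the shared-configuration idea without a pre-processor is to keep the common parameters in one file, the regressor settings in another, and merge them after loading. The sketch below assumes both files have already been parsed into dictionaries (for example with `yaml.safe_load`); the `merge_configs` helper and all key names are hypothetical and are not taken from protopipe.

```python
from copy import deepcopy


def merge_configs(base, override):
    """Return a new dict in which `override` is merged recursively into `base`.

    Hypothetical helper: nested dicts are merged key by key,
    any other value in `override` replaces the one in `base`.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical content of a shared "parameters" file.
common = {
    "General": {"outdir": "./models"},
    "Split": {"train_fraction": 0.8},
}

# Hypothetical content of a model-specific file.
regressor = {
    "Method": {"name": "RandomForestRegressor"},
    "Split": {"train_fraction": 0.5},
}

config = merge_configs(common, regressor)
```

With this layout only the small model-specific file changes between configurations, and the shared file is edited once.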
protopipe/mva/utils.py
except:
    pass
except KeyError as e:
    print(e)
Should this really fail silently (well, partially silently, as it just prints the error and continues)? If so, maybe add a comment explaining why the error is caught but nothing is done with it. Otherwise, use a warning or raise the exception.
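As an illustration of the "use a warning" option, here is a minimal sketch using the standard `warnings` module. The function name `get_optional_column` and the message text are hypothetical; this is not the code from `protopipe/mva/utils.py`.

```python
import warnings


def get_optional_column(table, key):
    """Return table[key], or None (with a warning) if the key is absent.

    Hypothetical helper: the KeyError is reported through the warnings
    machinery instead of being printed, so callers can filter or escalate it.
    """
    try:
        return table[key]
    except KeyError:
        warnings.warn(f"Column {key!r} not found; continuing without it.", UserWarning)
        return None
```

Unlike `print`, a warning can be turned into an exception by the caller (for example with `warnings.simplefilter("error")`), which makes the failure visible in tests.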
Looking at it a second time, I decided that it was not done well, so I rewrote it.
Basically, now:
- if label is None, it means that we are building an energy regressor, so this whole code block is skipped
- if it is not None, label is always added to the dataframe, but the 2 energy keys are checked for existence, since our reference analysis needs energy as one of the features
Unfortunately, right now this is needed to make DL2 work here, so I raise an error if the model is a classifier and those keys are not selected for use (or I could always add them?)
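The control flow described above could look roughly like the sketch below. The function name, the `ValueError` type and the message are assumptions made for illustration; only the two feature names come from the PR.

```python
# Feature names taken from the PR; everything else here is hypothetical.
REQUIRED_ENERGY_FEATURES = ("log10_reco_energy", "log10_reco_energy_tel")


def check_classifier_features(label, derived_features):
    """Raise if a classifier is configured without the energy features."""
    if label is None:
        # Energy regressor: there is no label, so nothing to check.
        return
    missing = [name for name in REQUIRED_ENERGY_FEATURES if name not in derived_features]
    if missing:
        # The exception type is an assumption; the PR only says "an error".
        raise ValueError(f"Classifier requires the features: {missing}")
```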
# This is needed because our reference analysis uses energy as
# feature for classification
# We should probably support a more elastic choice in the future.
if not all(i in derived_features for i in ["log10_reco_energy", "log10_reco_energy_tel"]):
This could also be written as
if not {"log10_reco_energy", "log10_reco_energy_tel"}.issubset(set(derived_features)):
Do you really need this to fail if missing? Isn't it just a choice in the config file whether or not to use energy in the feature list?
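A quick sketch showing that the two spellings of the check agree. The example feature lists are made up, and `hillas_width` is a hypothetical feature name. Note that `set.issubset` accepts any iterable, so the extra `set(...)` wrapper is optional.

```python
required = {"log10_reco_energy", "log10_reco_energy_tel"}


def has_required_all(derived_features):
    # Original form: membership test for each required name.
    return all(name in derived_features for name in required)


def has_required_subset(derived_features):
    # Suggested form: issubset() takes any iterable, no set() needed.
    return required.issubset(derived_features)


# Made-up feature selections for illustration.
examples = [
    ["log10_reco_energy", "log10_reco_energy_tel", "hillas_width"],
    ["log10_reco_energy"],
    [],
]

results = [(has_required_all(f), has_required_subset(f)) for f in examples]
```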
Yes, this is what I was referring to above: right now the DL2 script expects these 2 parameters as features because this is how our reference analysis works, but I plan to make it more flexible (the refactored version should be as well).
I will leave this for later and for now just rely on the error message.
Requirements
Summary of expected modifications
protopipe.scripts.build_models to protopipe.mva.io