-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to exit early from _init if a job manifest exists and is valid. #422
Conversation
Codecov Report
@@ Coverage Diff @@
## master #422 +/- ##
==========================================
+ Coverage 76.48% 76.51% +0.02%
==========================================
Files 45 45
Lines 7086 7086
==========================================
+ Hits 5420 5422 +2
+ Misses 1666 1664 -2
Continue to review full report at Codecov.
|
Co-authored-by: Carl Simon Adorf <[email protected]>
…c into feature/faster-job-init
except Exception: | ||
# Any exception means this method cannot exit early. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually wondering whether we should be specific here. Expected exceptions are OSError
, what else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_check_manifest
can raise JobsCorruptedError
, which is a subclass of RuntimeError
. There may be other exception classes that are raised in that function (it uses json
, hashlib
, file reading/decoding). I am hesitant to specify the exception classes because I don't know how to be sure that we're providing sufficient exception coverage and the penalty is high. Not catching an exception would lead to broken behavior, because the job wouldn't actually be initialized at the end of the initialization method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But any of those very unexpected errors are something we should actually raise. We could very easily test which error classes to catch by simply running this against "no file", "corrupted file", "empty file", "binary file", and "permissions wrong", and that should be about it IMO. Catching "all" errors here means to effectively change the behavior of this function, does it not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be fine with testing against those cases explicitly. However, I don't think the behavior of the function is changed by catching Exception
here. From the previous behavior, I would say that the "finishing condition" of the initialization is that _check_manifest
can complete without raising. I'm just testing that condition at the beginning as a way to exit early. Any errors in _check_manifest
would be raised in the original behavior, when _check_manifest
was called at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
Many methods/properties of
Job
invoke a call tojob.init()
, including the first access tojob.document
and all accesses tojob.stores
andjob.data
.This PR optimizes
job.init()
for jobs that are already initialized, by exiting early.Motivation and Context
This is a performance enhancement for many common data access operations in signac.
Performance results for the script below:
master
Types of Changes
Checklist:
If necessary: