Attempt to exit early from _init if a job manifest exists and is valid. #422

Merged 6 commits on Dec 1, 2020

Member

@bdice bdice commented Nov 24, 2020

Description

Many methods and properties of Job call job.init(), including the first access to job.document and every access to job.stores and job.data.

This PR optimizes job.init() for jobs that are already initialized by exiting early when a valid job manifest exists.

Motivation and Context

This is a performance enhancement for many common data access operations in signac.

Performance results for the script below:

Test             master    This branch (b4917d8)    Speedup
First init       3.426s    3.546s                   0.966x (slower)
Already init'd   2.623s    1.774s                   1.479x (faster)

import signac

project = signac.init_project('test')

for i in range(20000):
    project.open_job({'a': i, 'b': i*2, 'c': i*3}).init()
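
The Speedup column in the results table is consistent with dividing the master time by the branch time. A minimal sketch of that arithmetic follows; the `timings` dictionary and the `speedup` helper are illustrative names, not part of signac, and the numbers are copied from the table.

```python
# Timings in seconds, copied from the results table: (master, this branch).
timings = {
    "First init": (3.426, 3.546),
    "Already init'd": (2.623, 1.774),
}


def speedup(master_s, branch_s):
    """Return how many times faster the branch is than master."""
    return master_s / branch_s


for label, (master_s, branch_s) in timings.items():
    ratio = speedup(master_s, branch_s)
    verdict = "faster" if ratio > 1 else "slower"
    print(f"{label}: {ratio:.3f}x ({verdict})")
```

A ratio below 1 means the branch is slower for that case, which matches the small cost of the extra manifest check on first initialization.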

Types of Changes

  • New feature

Checklist:

If necessary:

  • I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.

@bdice bdice requested review from a team as code owners November 24, 2020 06:14
@bdice bdice requested review from vyasr and lyrivera and removed request for a team November 24, 2020 06:14

codecov bot commented Nov 24, 2020

Codecov Report

Merging #422 (f20ba30) into master (9db354f) will increase coverage by 0.02%.
The diff coverage is 50.00%.

@@            Coverage Diff             @@
##           master     #422      +/-   ##
==========================================
+ Coverage   76.48%   76.51%   +0.02%     
==========================================
  Files          45       45              
  Lines        7086     7086              
==========================================
+ Hits         5420     5422       +2     
+ Misses       1666     1664       -2     
Impacted Files           Coverage Δ
signac/contrib/job.py    92.80% <50.00%> (+0.79%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9db354f...f20ba30. Read the comment docs.

@csadorf csadorf requested review from csadorf and removed request for vyasr November 29, 2020 08:41
@bdice bdice requested a review from csadorf November 29, 2020 09:09
Comment on lines +455 to +456
except Exception:
    # Any exception means this method cannot exit early.
Contributor

I am wondering whether we should be more specific here. The expected exception is OSError; what else?

Member Author

_check_manifest can raise JobsCorruptedError, which is a subclass of RuntimeError. Other exception classes may also be raised in that function (it uses json, hashlib, and file reading/decoding). I am hesitant to specify the exception classes because I don't know how to be sure that we cover every exception that can occur, and the penalty for missing one is high. Not catching an exception would lead to broken behavior, because the job wouldn't actually be initialized at the end of the initialization method.

Contributor

But any of those very unexpected errors are something we should actually raise. We could very easily determine which error classes to catch by running this against "no file", "corrupted file", "empty file", "binary file", and "permissions wrong", and that should be about it IMO. Catching "all" errors here effectively changes the behavior of this function, does it not?
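
The cases listed above can be surveyed with a small probe. This is only a sketch under a simplifying assumption: `read_manifest` is a hypothetical stand-in that opens a file as UTF-8 text and parses it as JSON, which is not the same as signac's `_check_manifest` (that function also involves hashing, per the discussion). The "permissions wrong" case is left out because its outcome depends on the platform and on the user running the script; it would normally surface as PermissionError, a subclass of OSError.

```python
import json
import os
import tempfile


def read_manifest(path):
    """Hypothetical stand-in for a manifest check: open and parse JSON."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)


def probe(path):
    """Return the name of the exception class raised, or None on success."""
    try:
        read_manifest(path)
    except Exception as error:  # broad on purpose: we are surveying classes
        return type(error).__name__
    return None


def survey():
    """Run the probe against each file condition and collect the results."""
    cases = {
        "corrupted file": b'{"a": 1',        # truncated JSON
        "empty file": b"",
        "binary file": b"\xff\xfe\x00\x01",  # not valid UTF-8
        "valid file": b'{"a": 1}',
    }
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        results["no file"] = probe(os.path.join(tmp, "missing.json"))
        for name, payload in cases.items():
            path = os.path.join(tmp, name.replace(" ", "_") + ".json")
            with open(path, "wb") as file:
                file.write(payload)
            results[name] = probe(path)
    return results


results = survey()
for name, error_name in results.items():
    print(f"{name}: {error_name}")
```

For this stand-in, the observed classes fall into two families: OSError (missing file) and ValueError (json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses). A real manifest check may raise additional classes, such as the JobsCorruptedError mentioned earlier in this thread.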

Member Author

@bdice bdice Nov 30, 2020

I would be fine with testing against those cases explicitly. However, I don't think the behavior of the function is changed by catching Exception here. From the previous behavior, I would say that the "finishing condition" of the initialization is that _check_manifest can complete without raising. I'm just testing that condition at the beginning as a way to exit early. Any errors in _check_manifest would be raised in the original behavior, when _check_manifest was called at the end.
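
The argument above can be illustrated with a toy model. This is a sketch under stated assumptions, not signac's implementation: `ToyJob`, its `writes` counter, and the `manifest.json` file name are all hypothetical. The point it shows is that an error swallowed by the early check is not lost, because the same check runs again at the end of the full initialization path and raises there.

```python
import json
import os
import tempfile


class ToyJob:
    """Hypothetical stand-in for a job; not signac's implementation."""

    FN_MANIFEST = "manifest.json"  # assumed file name for this sketch

    def __init__(self, workspace, statepoint):
        self.workspace = workspace
        self.statepoint = statepoint
        self.writes = 0  # counts how often the manifest is written

    def _manifest_path(self):
        return os.path.join(self.workspace, self.FN_MANIFEST)

    def _check_manifest(self):
        """Raise unless the manifest on disk matches the state point."""
        with open(self._manifest_path(), encoding="utf-8") as file:
            if json.load(file) != self.statepoint:
                raise RuntimeError("manifest does not match state point")

    def init(self):
        try:
            self._check_manifest()
            return  # early exit: the job is already initialized
        except Exception:
            pass  # cannot exit early; fall through to the full path
        os.makedirs(self.workspace, exist_ok=True)
        try:
            # Mode "x" refuses to overwrite an existing manifest.
            with open(self._manifest_path(), "x", encoding="utf-8") as file:
                json.dump(self.statepoint, file)
            self.writes += 1
        except FileExistsError:
            pass
        # Finishing condition: any error here propagates to the caller.
        self._check_manifest()


with tempfile.TemporaryDirectory() as tmp:
    job = ToyJob(os.path.join(tmp, "job"), {"a": 1})
    job.init()  # full path: creates the directory and writes the manifest
    job.init()  # early exit: nothing is written
    writes_after_two_inits = job.writes

    # Corrupt the manifest, then initialize again.
    with open(job._manifest_path(), "w", encoding="utf-8") as file:
        file.write("{not json")
    try:
        job.init()
        raised = None
    except Exception as error:
        raised = type(error).__name__

print(writes_after_two_inits, raised)
```

In this toy model the second call returns without touching the file system beyond one read, and the corrupted manifest still produces an exception from the final check, just as it would without the early-exit attempt.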

@bdice bdice requested a review from csadorf November 30, 2020 16:05
@lyrivera lyrivera left a comment

LGTM!

@bdice bdice merged commit 9d23737 into master Dec 1, 2020
@bdice bdice deleted the feature/faster-job-init branch December 1, 2020 06:24
@bdice bdice added this to the v1.5.1 milestone Dec 19, 2020