Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT caching updates #886

Merged
merged 10 commits into from
Dec 15, 2020
Merged

JIT caching updates #886

merged 10 commits into from
Dec 15, 2020

Conversation

sk1p
Copy link
Member

@sk1p sk1p commented Oct 13, 2020

Fixes #798 once numba properly caches functions with w/ closures over other functions. Independent from this, we can decide if we want to add a work-around to make these functions cacheable.

Contributor Checklist:

@sk1p sk1p added this to the 0.6 milestone Oct 13, 2020
@sk1p
Copy link
Member Author

sk1p commented Oct 15, 2020

For some reason, with the cache warming enabled, the tests seem to fail more consistently. Great. I'll postpone merging this until I know why.

@sk1p sk1p added the WIP label Oct 15, 2020
@sk1p sk1p force-pushed the jit-caching-warmup branch 2 times, most recently from eecc69b to af96616 Compare November 30, 2020 15:03
@sk1p
Copy link
Member Author

sk1p commented Nov 30, 2020

For some reason, with the cache warming enabled, the tests seem to fail more consistently. Great. I'll postpone merging this until I know why.

This should now behave better, as we should only have a single jit compilation when re-using the dask worker and scheduler.

@codecov
Copy link

codecov bot commented Nov 30, 2020

Codecov Report

Merging #886 (b03e2fb) into master (d64013b) will increase coverage by 0.16%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #886      +/-   ##
==========================================
+ Coverage   68.78%   68.94%   +0.16%     
==========================================
  Files         267      267              
  Lines       11711    11749      +38     
  Branches     1592     1597       +5     
==========================================
+ Hits         8055     8100      +45     
+ Misses       3355     3351       -4     
+ Partials      301      298       -3     
Impacted Files Coverage Δ
src/libertem/corrections/detector.py 92.15% <100.00%> (ø)
src/libertem/executor/base.py 93.13% <100.00%> (+2.41%) ⬆️
src/libertem/executor/dask.py 76.58% <100.00%> (+0.66%) ⬆️
src/libertem/executor/inline.py 94.87% <100.00%> (+2.97%) ⬆️
src/libertem/io/dataset/base/backend.py 96.17% <100.00%> (ø)
src/libertem/io/dataset/base/decode.py 80.00% <100.00%> (ø)
src/libertem/io/dataset/base/tiling.py 90.22% <100.00%> (ø)
src/libertem/udf/holography.py 88.88% <100.00%> (+0.42%) ⬆️
src/libertem/udf/stddev.py 97.95% <100.00%> (ø)
src/libertem/web/dataset.py 80.18% <100.00%> (+4.89%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d64013b...b03e2fb. Read the comment docs.

@sk1p
Copy link
Member Author

sk1p commented Nov 30, 2020

/azp run libertem.libertem-data

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sk1p
Copy link
Member Author

sk1p commented Nov 30, 2020

Tests are still not stable, but this time I can reproduce the issue with Python 3.6 locally.

@sk1p
Copy link
Member Author

sk1p commented Dec 12, 2020

Tracked down to tornadoweb/tornado#2034 dask/distributed#4356, proposed fix at tornadoweb/tornado#2963

@uellue
Copy link
Member

uellue commented Dec 14, 2020

Wow, nice catch! 🚀

@uellue
Copy link
Member

uellue commented Dec 14, 2020

So, do we have to wait for upstream or can we circumvent this in LiberTEM?

@sk1p
Copy link
Member Author

sk1p commented Dec 14, 2020

So, do we have to wait for upstream or can we circumvent this in LiberTEM?

This one is hard to circumvent, as the problem is caused by a centrally used utility function. I think it could be possible to make the failure less likely, by just taking a different code path. I'll need to try a few things, and I'll come back to this PR.

@sk1p sk1p force-pushed the jit-caching-warmup branch from af96616 to 1e14208 Compare December 14, 2020 13:23
@sk1p sk1p removed the WIP label Dec 14, 2020
@sk1p
Copy link
Member Author

sk1p commented Dec 14, 2020

Looks like that worked, both Mac and Win Py36 build finished.

@sk1p
Copy link
Member Author

sk1p commented Dec 14, 2020

/azp run libertem.libertem-data

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@uellue
Copy link
Member

uellue commented Dec 15, 2020

Nice to see that it goes around that issue! 🎉

Is the low test coverage an artifact from not collecting coverage from Dask, or could we still add test cases that improve coverage?

sk1p added 8 commits December 15, 2020 10:43
This also adds a new `run_each_worker` method on `JobExecutor`
Also return the results of each function run
It's possible that this is less likely to trigger the problem with
distributed.utils.All / tornado.gen.WaitItertor
@sk1p sk1p force-pushed the jit-caching-warmup branch from 796557b to b03e2fb Compare December 15, 2020 09:51
@sk1p
Copy link
Member Author

sk1p commented Dec 15, 2020

/azp run libertem.libertem-data

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sk1p
Copy link
Member Author

sk1p commented Dec 15, 2020

Is the low test coverage an artifact from not collecting coverage from Dask, or could we still add test cases that improve coverage?

It was, but now I've added a test case using the InlineJobExecutor together with the web API, which allows us to collect coverage for this case. For some ideas on a full fix, see #545. This should now be ready to merge.

@uellue
Copy link
Member

uellue commented Dec 15, 2020

Awesome, merging!

@uellue uellue merged commit ae88a6c into LiberTEM:master Dec 15, 2020
@sk1p sk1p deleted the jit-caching-warmup branch January 25, 2021 19:11
@sk1p
Copy link
Member Author

sk1p commented Jan 25, 2021

Tracked down to tornadoweb/tornado#2034 dask/distributed#4356, proposed fix at tornadoweb/tornado#2963

Btw. this was recently merged, so hopefully will be fixed for good on the next release of tornado.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove Numba cache hack (was "Ensure jitted functions can be cached" before)
2 participants