Spill to constrained disk space #5364

Closed
crusaderky opened this issue Sep 28, 2021 · 5 comments · Fixed by #5805, #5543 or dask/zict#48
Labels
enhancement Improve existing functionality or make things work better

Comments

@crusaderky
Collaborator

crusaderky commented Sep 28, 2021

Use case

This has been raised offline by a power user.
Their workers have limited disk space, frequently less than the amount of RAM. At the moment, the user has completely disabled spilling, because otherwise the spill files eventually occupy all available disk space; once that happens, OSErrors are raised and data is lost.

Proposed behaviour

  • If the spill-to-disk limit is hit, stop spilling to disk and keep the data in memory.
  • The resulting buildup of memory will then pause the worker.

Proposed design

  1. Add a line to the dask config that sets an upper limit on the total size of the spill files.
  2. Keep track of the current size of the spill files on disk. This may not be exactly the same as the size of the keys in memory, due to discrepancies between sizeof() and the pickle output. Note that this measurement can be done without extra I/O by intercepting the calls to zict.File.__setitem__.
  3. Before spilling, add the sizeof() of the key to be spilled to the current size of the spill files on disk. If the sum would exceed the maximum size, log a warning and don't spill. If memory pressure keeps building up, this will in turn cause the worker to eventually reach the pause threshold.
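
A minimal sketch of the three steps above, using only the standard library. Everything here is hypothetical and for illustration: `CappedSpillStore` stands in for a `zict.File`-like mapping, `max_bytes` stands in for the proposed config value (no config key name is implied), and `sys.getsizeof` stands in for dask's `sizeof()`. It is not the implementation that eventually closed this issue.

```python
import logging
import os
import pickle
import sys
import tempfile
from collections.abc import MutableMapping

logger = logging.getLogger("spill-sketch")


class CappedSpillStore(MutableMapping):
    """Hypothetical file-backed mapping that tracks how many bytes it holds."""

    def __init__(self, directory, max_bytes):
        self.directory = directory
        self.max_bytes = max_bytes  # step 1: would be read from the dask config
        self.total_bytes = 0        # step 2: bytes currently written to disk
        self._sizes = {}            # key -> bytes written for that key

    def _path(self, key):
        # Simplification: assumes keys are safe to use as file names.
        return os.path.join(self.directory, str(key))

    def __setitem__(self, key, value):
        payload = pickle.dumps(value)
        with open(self._path(key), "wb") as f:
            f.write(payload)
        # The payload length is already known here, so no extra I/O is needed.
        self.total_bytes += len(payload) - self._sizes.get(key, 0)
        self._sizes[key] = len(payload)

    def __getitem__(self, key):
        if key not in self._sizes:
            raise KeyError(key)
        with open(self._path(key), "rb") as f:
            return pickle.load(f)

    def __delitem__(self, key):
        self.total_bytes -= self._sizes.pop(key)
        os.remove(self._path(key))

    def __iter__(self):
        return iter(self._sizes)

    def __len__(self):
        return len(self._sizes)


def try_spill(store, key, value, estimate=sys.getsizeof):
    """Step 3: spill only if the estimated size still fits under the cap."""
    if store.total_bytes + estimate(value) > store.max_bytes:
        logger.warning("Spill limit reached; keeping %r in memory", key)
        return False
    store[key] = value
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = CappedSpillStore(d, max_bytes=1000)
        print(try_spill(store, "a", b"x" * 500))
        print(try_spill(store, "b", b"y" * 500))
```

Note that the check compares an in-memory estimate against a total measured in pickled bytes, so the cap is approximate, which matches the discrepancy called out in step 2. A caller that gets `False` back would simply leave the value in memory and let the existing pause threshold take over.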
@crusaderky added the enhancement and good second issue labels Sep 28, 2021
@jrbourbeau
Member

cc @madsbk who has thought a lot about spilling

@jrbourbeau removed the good second issue label Nov 11, 2021
@gjoseph92

This comment has been minimized.

@crusaderky
Collaborator Author

If the buffer is currently using 2GiB of memory, and you change the limit such that the spill-to-disk limit is now 1.5GiB

I don't see a benefit in letting the user alter the spill-to-disk limit after the worker has started?

@gjoseph92
Collaborator

Sorry, I meant to put that comment on #5367. Moving to there.

@ncclementi
Member

ncclementi commented Nov 29, 2021
