Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LRU raises #48

Merged
merged 28 commits into from
Feb 16, 2022
Merged

LRU raises #48

merged 28 commits into from
Feb 16, 2022

Conversation

ncclementi
Copy link
Member

@ncclementi ncclementi commented Nov 23, 2021

Based on #47
Closes dask/distributed#5364
This PR adds LRU to catch exceptions raised during a callback.

  • Test added/passes

This is ready for review.
cc: @jrbourbeau @crusaderky

@ncclementi ncclementi changed the title Slow raises and buffer exception LRU raises and buffer exception Nov 23, 2021
@ncclementi ncclementi changed the title LRU raises and buffer exception LRU raises Nov 29, 2021
@ncclementi ncclementi marked this pull request as ready for review November 29, 2021 22:19
assert L == [("x", 1)] # tried to evict and raised exception
assert lru.total_weight == 3
assert lru.weights == {"x": 1, "y": 1, "z": 1}

Copy link
Collaborator

@crusaderky crusaderky Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please add check (black box preferably) that y and z are safely stored in self.d (suggestion: you can do it in the test above by setting weight=lambda k, v: v)
  • Please add check (black box preferably) that the heap hasn't been corrupted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add check (black box preferably) that y and z are safely stored in self.d (suggestion: you can do it in the test above by setting weight=lambda k, v: v)

I added a separate test that uses wight since in LRU n will mean two different things depending on whether we provide weights or not.

n: int
        Number of elements to keep, or total weight if weight= is used

I'm not sure if I capture what you meant by "check that y and z are stored safely. " let me know if I misunderstood you.

Please add check (black box preferably) that the heap hasn't been corrupted

I wasn't quite sure what you meant by "corrupted" here, I'm checking now that the keys are in the heap with the right priority.

Please test use case where individual weight >= n

I added a case to check weight >= n

All changes are in cab1041

assert lru.total_weight == 3
assert lru.weights == {"x": 1, "y": 1, "z": 1}


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please test use case where individual weight >= n
  • Missing tests for Buffer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, I'll be adding buffer tests soon.

# fail raise we might want to avoid), not getting to the callbacks.
# If I do try/except and pass in slow_to_fast things are handled well, but not sure the
# pass is the right call here.
assert s2f == [
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are broken because there is one small thing I was not quite sure how to fix. It's explained on the commented code. I wonder what's the best approach for this,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer your question you should first have a legitimate use case where the buffer callback raises, and I cannot think of any. So I don't think we should try doing the right thing when we don't even know what "right" looks like.
There's also a wealth of potential issues when there is a chain of multiple callbacks and the first one fails, leaving the rest unexecuted. What does it mean? We don't know, it entirely depends on the callbacks. Again, we should not worry about it.

This whole PR is specifically to deal with Buffer.slow.__setitem__ raising - let's not get off track.

# fail raise we might want to avoid), not getting to the callbacks.
# If I do try/except and pass in slow_to_fast things are handled well, but not sure the
# pass is the right call here.
assert s2f == [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer your question you should first have a legitimate use case where the buffer callback raises, and I cannot think of any. So I don't think we should try doing the right thing when we don't even know what "right" looks like.
There's also a wealth of potential issues when there is a chain of multiple callbacks and the first one fails, leaving the rest unexecuted. What does it mean? We don't know, it entirely depends on the callbacks. Again, we should not worry about it.

This whole PR is specifically to deal with Buffer.slow.__setitem__ raising - let's not get off track.

@crusaderky
Copy link
Collaborator

@jrbourbeau please merge

try:
for cb in self.fast_to_slow_callbacks:
cb(key, value)
# LRU catches exception, raises and makes sure keys are not lost and located in fast.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @ncclementi and reviewing @crusaderky. @ncclementi can you add a short description of this behavior change to the Buffer and LRU docstrings? For both classes, a description along the lines of "If an exception occurs during X then Y happens." in the appropriate parameter description (e.g. on_evict for LRU) should be sufficient

Copy link
Member Author

@ncclementi ncclementi Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the documentation to the appropriate parameters: on_evict (LRU) and fast_to_slow_callback (Buffer).

# e.g. if a callback tried storing to disk and raised a disk full error
self.heap[k] = priority
self.d[k] = v
raise
Copy link
Member Author

@ncclementi ncclementi Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a problem with this raise. I was trying some things around where I ended up having on fast multiple keys as a result of disk_limit being reached, {'a': 400, 'd': 100, 'e': 500} . In slow after deleting a key I have only {'c': 200}, let's say the disk_limit = 500 . If I add a new key ('f': 100) that is small to go on fast, I'd expect 'd' that is on fast to be moved to slow. But this doesn't happen since the first key that it tries to move due to low priority is 'e' and it's too big that it raises and therefore it stops going over the rest of the dictionary, even though there is clearly room for 'd' on slow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. The LRU does not and should not know that the reason for the exception is that the data is too large. It would be expensive and complicated to try writing the other keys just in case they do not fail. Also, I don't think the use case is very interesting in real life in distributed.

@crusaderky
Copy link
Collaborator

What is outstanding on this PR?

@crusaderky crusaderky merged commit e1886a4 into dask:main Feb 16, 2022
@crusaderky
Copy link
Collaborator

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spill to constrained disk space
3 participants