-
-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Client.get_dataset to always create Futures attached to itself #3729
Merged
Merged
Changes from 17 commits
Commits
Show all changes
24 commits
Select commit
Hold shift + click to select a range
207c27b
Python 3.6+ syntax
crusaderky 02b1079
Code polish
crusaderky 026a602
Merge remote-tracking branch 'upstream/master' into get_dataset_async
crusaderky e07b9f8
Revert
crusaderky 5d2566c
Polish
crusaderky a5dc1be
Revert "Polish"
crusaderky f2b6e1f
Merge remote-tracking branch 'upstream/master' into get_dataset_async
crusaderky f86126a
tests
crusaderky 7ef5832
Merge branch 'master' into get_dataset_async
crusaderky 77a0d8a
revert
crusaderky de6b457
Merge remote-tracking branch 'upstream/master' into get_dataset_async
crusaderky 9d6c8c0
revert
crusaderky 6515c1b
Merge remote-tracking branch 'upstream/master' into get_dataset_async
crusaderky d8fcdc9
Merge branch 'master' into get_dataset_async
crusaderky 21522df
xfail
crusaderky 4b55b31
Better async functions
crusaderky fb8f777
Use contextvars to deserialize Future
crusaderky 804b537
Merge branch 'master' into get_dataset_async
crusaderky 2d594f8
Merge remote-tracking branch 'upstream/master' into get_dataset_async
crusaderky 042eddf
Redesign
crusaderky 621573d
Tweaks
crusaderky 77fa36b
Merge remote-tracking branch 'upstream/master' into get_dataset_async
crusaderky e70303a
docstrings
crusaderky f0f8c6e
Merge branch 'master' into get_dataset_async
crusaderky File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -213,6 +213,23 @@ def test_datasets_iter(client): | |
client.publish_dataset(**{str(key): key for key in keys}) | ||
for n, key in enumerate(client.datasets): | ||
assert key == str(n) | ||
with pytest.raises(TypeError): | ||
client.datasets.__aiter__() | ||
|
||
|
||
@gen_cluster(client=True) | ||
async def test_datasets_async(c, s, a, b): | ||
await c.publish_dataset(foo=1, bar=2) | ||
assert await c.datasets["foo"] == 1 | ||
assert {k async for k in c.datasets} == {"foo", "bar"} | ||
with pytest.raises(TypeError): | ||
c.datasets["baz"] = 3 | ||
with pytest.raises(TypeError): | ||
del c.datasets["foo"] | ||
with pytest.raises(TypeError): | ||
next(iter(c.datasets)) | ||
with pytest.raises(TypeError): | ||
len(c.datasets) | ||
|
||
|
||
@gen_cluster(client=True) | ||
|
@@ -229,3 +246,34 @@ async def test_pickle_safe(c, s, a, b): | |
|
||
with pytest.raises(TypeError): | ||
await c2.get_dataset("z") | ||
|
||
|
||
@gen_cluster(client=True) | ||
async def test_deserialize_client(c, s, a, b): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice test case. |
||
"""Test that the client attached to Futures returned by Client.get_dataset is always | ||
the instance of the client that invoked the method. | ||
Specifically: | ||
|
||
- when the client is defined by hostname, test that it is not accidentally | ||
reinitialised by IP; | ||
- when multiple clients are connected to the same scheduler, test that they don't | ||
interfere with each other. | ||
|
||
See: https://github.com/dask/distributed/issues/3227 | ||
""" | ||
future = await c.scatter("123") | ||
await c.publish_dataset(foo=future) | ||
future = await c.get_dataset("foo") | ||
assert future.client is c | ||
|
||
for addr in (s.address, "localhost:" + s.address.split(":")[-1]): | ||
async with Client(addr, asynchronous=True) as c2: | ||
future = await c.get_dataset("foo") | ||
assert future.client is c | ||
future = await c2.get_dataset("foo") | ||
assert future.client is c2 | ||
|
||
# Ensure cleanup | ||
from distributed.client import _deserialize_client | ||
|
||
assert _deserialize_client.get() is None |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
click >= 6.6 | ||
cloudpickle >= 0.2.2 | ||
contextvars;python_version<'3.7' | ||
dask >= 2.9.0 | ||
msgpack >= 0.6.0 | ||
psutil >= 5.0 | ||
|
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to use contextvars here? We currently use a thread local for storing these kind of things, but that doesn't always play nicely with async. Is there a way we could refactor part of distributed to avoid passing around this implicit state?
In general I think
contextvars
can be quite useful where required, but want to ensure it's needed here before bringing it in. We also already have a fair bit of implicit state, I'm a bit hesitant to add more if we can avoid it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid using contextvars you need to explicitly propagate the client until you stop calling async functions and just call sync functions. In this case, this means that
client.Client._get_dataset
needs to passself
tocore.send_recv
, which in turn needs to pass it tocomm.tcp.TCP.read
(plus its variantscomm.inproc.InProc.read
and the 100% untestedcomm.ucx.UCX.read
), which in turn needs to pass it tocomm.utils.from_frames
, which can finally put a context manager (setting a thread-local variable) arounddistributed/distributed/comm/utils.py
Lines 60 to 62 in 8376f22
client.Future.__getstate__
and it should be really self-evident why I don't recommend doing it.
For this one you were using a global,
client._global_clients
.contextvars exactly fits the use case of thread locals when you use coroutines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. Should this logic replace how we access implicit clients everywhere? If so, perhaps we should rename the context var key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other cases where a method of Client loses
self
along the way and re-acquires it later from a global variable?Also note the very big fat caveat of the race conditions on Python 3.6. Not sure how much of the dask.distributed user base still uses Python 3.6 AND has more than one client running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, let me clarify.
It feels a bit weird to handle setting the contextvar in the
_get_dataset
method. It might be cleaner to have a parent context over all coroutines started by a client that includes a reference back to the client. As it stands this is baking in functionality toFuture
that is only used by one method, when we likely want this everywhere.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking to some sort of blanket decorator applied around all methods of Client? If so, it feels a bit overkill to me, as I do not know of another method that needs it.
You definitely want to reuse this same variable every time you have a
self
reference to a Client instance, invoke a method, lose the reference (because it would be too cumbersome to propagated by hand), and then you need to reacquire it deeper down the line before the end of the method. If you can point me to any other function that does this, I'll be happy to look into it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcrist ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm redesigning the change to make it more generic, hold on