-
-
Notifications
You must be signed in to change notification settings - Fork 727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework some tests related to gather_dep #6472
Conversation
for name, color in zip(lists["name"], lists["color"]): | ||
if name == "transfer": | ||
assert color == "red" | ||
assert (name == "transfer-sum") == (color == "red") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tightened tested conditions
assert ev[0] == "request-dep" | ||
assert len(ev[2]) == 5 | ||
for ev in story[20:]: | ||
assert ev[0] == "receive-dep" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tighten tested condition + clarifications
distributed/tests/test_worker.py
Outdated
@gen_cluster(client=True, Worker=Nanny) | ||
async def test_acquire_replicas_already_in_flight(c, s, *nannies): | ||
@gen_cluster(client=True, nthreads=[("", 1)]) | ||
async def test_acquire_replicas_already_in_flight(c, s, a): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test is now deterministic and much faster
distributed/tests/test_worker.py
Outdated
@gen_cluster(client=True, nthreads=[("", 1)]) | ||
async def test_gather_dep_cancelled_rescheduled(c, s, a): | ||
"""A task transitions flight->cancelled->fetch->flight, all while gather_dep is | ||
waiting for the data of the initial flight. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test can now be drastically simplified after #6371.
I don't think we should simplify tests just because we know that internals changed.
Remove assumption that both dependencies of a task will be gathered in a single fetch
Where do we assume this in the previous test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should simplify tests just because we know that internals changed.
The previous test was so complicated because it had to have two checkpoints:
- after you enter gather_dep, but before you run the preamble
- while you're in the comms
I think that being able to say "it doesn't really matter in which point of gather_dep you pinch it; nothing fancy will happen before comms" is a very reasonable thing to say?
Where do we assume this in the previous test?
fut4 = c.submit(sum, fut1, fut2, workers=[b.address], key="f4")
This sends a compute-task event to b, which in turn sends f1 and f2 into fetch at the same time.
As they're dependencies of the same task, they have the same exact priority.
After #6462, there are two separate fetches. You'll get that f1 is fetched first 50% of the times, at random. The two keys are most likely go through a set at some point during the process, and the ordering of sets changes every time you restart the python interpreter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that being able to say "it doesn't really matter in which point of gather_dep you pinch it; nothing fancy will happen before comms" is a very reasonable thing to say?
Will this be true forever? How do we ensure that this stays true? This test asks the question
"Is it harmful to cancel a task after a fetch is scheduled (gather-dep
) but before we're actually stepping in this coroutine".
The only way to answer this question with certainty is to have this test or ensure there is not a single await before the get_data message is sent which is hard/impossible.
While this is a seemingly artificial question to ask, this was causing a deadlock in the past #5525 and I don't think the pre-filtering optimization was entirely unreasonable.
Test are also there to protect us from reintroducing regressions and this test is written at a high enough abstraction level (we only rely on task state names (e.g. flight, cancelled) and the fact that there are the gather_dep/get_data methods) that I have confidence it won't bother us a lot. The test is not known to be flaky, it runs very fast (0.1s) and still works on your proposed change of #6462 (at least for me locally)
I don't see a reason why we should change anything about this test
distributed/tests/test_worker.py
Outdated
"""A task transitions from flight to cancelled while gather_dep is waiting for the | ||
data. | ||
|
||
See also test_gather_dep_cancelled_rescheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove assumption that both dependencies of a task will be gathered in a single fetch (this will no longer be the case in #6462).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we assume this in the previous test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original test still passes for me w/ your changes in #6462
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not.
50% of the times you get:
> assert b.tasks[fut2.key].state == "flight"
E AssertionError: assert 'fetch' == 'flight'
Unit Test Results 15 files + 12 15 suites +12 6h 16m 7s ⏱️ + 5h 28m 21s For more details on these failures and errors, see this check. Results for commit 8d5d50e. ± Comparison against base commit 69b798d. ♻️ This comment has been updated with latest results. |
d0d03ee
to
e59a138
Compare
e59a138
to
efb2dd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like us to use stories in tests only if necessary and would always prefer to use something more high level.
Most of these tests were written in a way that stories were not necessary before and I do not see added value in having a redundant, low level assert. I left comments everywhere so we can discuss individually if this is necessary
distributed/tests/test_worker.py
Outdated
""" | ||
a.total_out_connections = 2 | ||
futures = await c.scatter( | ||
{f"x{i}": i for i in range(100)}, | ||
workers=[w.address for w in workers], | ||
) | ||
assert all(w.data for w in workers) | ||
assert all(len(w.data) == 5 for w in workers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I feel this is too granular for this test. Does this test require the distribution to be homogeneous? I think we should not test a "decide_worker" logic in this test.
- How is
5
even a correct number when we're scattering 100 tasks to 21 workers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this is too granular for this test.
Ok, relaxing it.
How is 5 even a correct number when we're scattering 100 tasks to 21 workers?
No, we're scattering 100 tasks to 20 workers.
See function signature: c, s, a, *workers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed workers to snd_workers for clarification
distributed/tests/test_worker.py
Outdated
while a.log[-1][:5] != ("x", "flight", "fetch", "flight", {}): | ||
await asyncio.sleep(0.01) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there an event we can listen to? I consider events much more robust than the transition logs and much easier to read.
I would even prefer having a plain sleep in here than listening to the transition log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to trigger the event directly on the worker so that no wait is involved
distributed/tests/test_worker.py
Outdated
@gen_cluster(client=True, nthreads=[("", 1)]) | ||
async def test_gather_dep_cancelled_rescheduled(c, s, a): | ||
"""A task transitions flight->cancelled->fetch->flight, all while gather_dep is | ||
waiting for the data of the initial flight. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test can now be drastically simplified after #6371.
I don't think we should simplify tests just because we know that internals changed.
Remove assumption that both dependencies of a task will be gathered in a single fetch
Where do we assume this in the previous test?
distributed/tests/test_worker.py
Outdated
assert_story( | ||
a.story(fut1.key), | ||
[ | ||
(fut1.key, "fetch", "flight", "flight", {}), | ||
(fut1.key, "flight", "released", "cancelled", {}), | ||
(fut1.key, "cancelled", "fetch", "flight", {}), | ||
(fut1.key, "flight", "memory", "memory", {"f2": "ready"}), | ||
], | ||
) | ||
# Test that the data transfer only happens once | ||
assert_story( | ||
a.story("request-dep"), | ||
[ | ||
("request-dep", b.address, {fut1.key}), | ||
], | ||
strict=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these stories add a lot of value to the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you propose? I think it's important to test that we sent out only one transfer request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is outgoing_transfer_log
and incoming_transfer_log
or even more coarse, there are counters for both.
e.g.
assert_story( | |
a.story(fut1.key), | |
[ | |
(fut1.key, "fetch", "flight", "flight", {}), | |
(fut1.key, "flight", "released", "cancelled", {}), | |
(fut1.key, "cancelled", "fetch", "flight", {}), | |
(fut1.key, "flight", "memory", "memory", {"f2": "ready"}), | |
], | |
) | |
# Test that the data transfer only happens once | |
assert_story( | |
a.story("request-dep"), | |
[ | |
("request-dep", b.address, {fut1.key}), | |
], | |
strict=True, | |
) | |
assert a.incoming_count == 1 |
distributed/tests/test_worker.py
Outdated
assert_story( | ||
b.story(fut1.key), | ||
[ | ||
(fut1.key, "flight", "released", "cancelled", {}), | ||
(fut1.key, "cancelled", "memory", "released", {fut1.key: "forgotten"}), | ||
(fut1.key, "released", "forgotten", "forgotten", {}), | ||
], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this story is redundant. We're asserting all of this above already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
distributed/tests/test_worker.py
Outdated
"""A task transitions from flight to cancelled while gather_dep is waiting for the | ||
data. | ||
|
||
See also test_gather_dep_cancelled_rescheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we assume this in the previous test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling to reproduce issues that are supposedly introduced with #6462
If there are any issues that connect to that PR specifically, I suggest to discuss it over there. I appreciate the effort of breaking up larger changes but if the changes are directly connected, I'm actually having a harder time reviewing if they are split up
distributed/tests/test_worker.py
Outdated
"""A task transitions from flight to cancelled while gather_dep is waiting for the | ||
data. | ||
|
||
See also test_gather_dep_cancelled_rescheduled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original test still passes for me w/ your changes in #6462
distributed/tests/test_worker.py
Outdated
@gen_cluster(client=True, nthreads=[("", 1)]) | ||
async def test_gather_dep_cancelled_rescheduled(c, s, a): | ||
"""A task transitions flight->cancelled->fetch->flight, all while gather_dep is | ||
waiting for the data of the initial flight. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that being able to say "it doesn't really matter in which point of gather_dep you pinch it; nothing fancy will happen before comms" is a very reasonable thing to say?
Will this be true forever? How do we ensure that this stays true? This test asks the question
"Is it harmful to cancel a task after a fetch is scheduled (gather-dep
) but before we're actually stepping in this coroutine".
The only way to answer this question with certainty is to have this test or ensure there is not a single await before the get_data message is sent which is hard/impossible.
While this is a seemingly artificial question to ask, this was causing a deadlock in the past #5525 and I don't think the pre-filtering optimization was entirely unreasonable.
Test are also there to protect us from reintroducing regressions and this test is written at a high enough abstraction level (we only rely on task state names (e.g. flight, cancelled) and the fact that there are the gather_dep/get_data methods) that I have confidence it won't bother us a lot. The test is not known to be flaky, it runs very fast (0.1s) and still works on your proposed change of #6462 (at least for me locally)
I don't see a reason why we should change anything about this test
Salvaged PR. This is again ready for review and merge. |
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 6h 28m 20s ⏱️ - 21m 18s For more details on these failures, see this check. Results for commit 51a3ba9. ± Comparison against base commit 344868a. ♻️ This comment has been updated with latest results. |
Cosmetic tweaks to a few tests.
This has been salvaged from #6462 following controversy.