Use tempfile directory in cluster fixture #5825
Conversation
Unit Test Results: 12 files ±0, 12 suites ±0, 7h 17m 33s ⏱️ +1h 6m 3s. For more details on these failures, see this check. Results for commit cf62bba. ± Comparison against base commit 1da5199. ♻️ This comment has been updated with latest results.
Interesting, this causes some permission errors on Windows. I don't have time to look into this right now, but anyone who wants to pick this up is welcome to.
Force-pushed from 6fa676d to 42ca0f0
```diff
-async def test_gen_test_pytest_fixture(tmp_path):
+async def test_gen_test_pytest_fixture(tmp_path, c):
     assert isinstance(tmp_path, pathlib.Path)
+    assert isinstance(c, Client)
```
The `c` / `client` fixture creates the client in a synchronous context. This will actually block the event loop, causing this test to block and its teardown to take 2 x the connect timeout.
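A minimal sketch of the alternative being hinted at here, assuming a pytest-asyncio style async fixture and a hypothetical `cluster_address` fixture (neither name is from this PR): awaiting the client with `asynchronous=True` keeps the connect on the running loop instead of blocking it with a synchronous connect.

```python
# Sketch only; the fixture names are assumptions, not utils_test.py code.
import pytest
from distributed import Client

@pytest.fixture
async def c(cluster_address):  # hypothetical fixture yielding a scheduler address
    # asynchronous=True defers connecting until awaited on the running
    # event loop, instead of blocking the loop with a synchronous connect
    client = await Client(cluster_address, asynchronous=True)
    yield client
    await client.close()
```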
```python
scheduler.terminate()
scheduler_q.close()
scheduler_q._reader.close()
scheduler_q._writer.close()

for w in workers:
    w["proc"].terminate()
    w["queue"].close()
    w["queue"]._reader.close()
    w["queue"]._writer.close()

scheduler.join(2)
del scheduler
for proc in [w["proc"] for w in workers]:
    proc.join(timeout=30)
```
If the disconnects above time out, these processes are never closed and leak, not just in the case where there is an xfail.
I am still receiving permission errors on Windows when closing, even with ExitStack + tempdir. I assume this was never an issue before since we suppressed the OSError on delete.
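To illustrate the ExitStack approach mentioned above, here is a self-contained sketch with assumed names (not the PR's exact code): callbacks registered on the stack run unconditionally and in LIFO order, so every process is terminated, joined, and closed even if an earlier cleanup step raises or times out.

```python
import multiprocessing
from contextlib import ExitStack

def _terminate_join(proc):
    # terminate, wait, then close so the OS-level handle is released too
    proc.terminate()
    proc.join(timeout=30)
    proc.close()

def _sleep_forever():
    import time
    time.sleep(3600)

if __name__ == "__main__":
    with ExitStack() as stack:
        procs = [multiprocessing.Process(target=_sleep_forever) for _ in range(3)]
        for p in procs:
            p.start()
            stack.callback(_terminate_join, p)
        # ... test body runs here; on exit (normal or exceptional) every
        # registered callback fires, so no process leaks ...
```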
distributed/utils_test.py (Outdated)
```python
def _terminate_join(proc):
    proc.terminate()
    proc.join(timeout=30)
```
How long does it typically take for a process to terminate? Is it worth splitting this up into "terminate all" then "join all" to speed things up?
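For illustration, a sketch of the suggested split (a hypothetical helper, not necessarily what the PR adopted): terminating everything first lets the shutdowns proceed concurrently, so the join timeouts overlap and the worst case is roughly one timeout rather than one per process.

```python
def _terminate_all(procs, timeout=30):
    # signal every process first ...
    for proc in procs:
        proc.terminate()
    # ... then wait; the processes shut down concurrently, so the
    # timeouts overlap instead of accumulating sequentially
    for proc in procs:
        proc.join(timeout=timeout)
```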
All tests green 🎉
@graingert care to provide a final review?
edit: ah, it's just rmtree hitting a PermissionError.
Adding the process.close seems to have done the trick, for now. Hard to tell since I'm hitting
Force-pushed from 3806413 to 4e16eb6
I tried reproducing the permission error on a Windows machine, but it appears to be another of these situations where it only triggers once in a few hundred or thousand invocations, or possibly only in combination with other test runs. I don't think tracking this down is worth it right now, and I would like to get this in since it includes several valuable fixes. I went ahead and ignored the PermissionError for now. If nothing else pops up, I'll go ahead and merge.
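A sketch of what "ignoring the PermissionError" could look like (the helper name is an assumption; the PR's actual handling may differ): on Windows, rmtree raises PermissionError while another process still holds a handle inside the directory, so cleanup swallows just that error.

```python
import shutil
from contextlib import suppress

def _rm_tmpdir(path):
    # On Windows, rmtree can raise PermissionError if a worker process
    # still holds a file open inside the directory; ignore only that case
    with suppress(PermissionError):
        shutil.rmtree(path)
```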
```diff
@@ -766,12 +783,6 @@ def cluster(
     else:
         client.close()

-    start = time()
-    while any(proc.is_alive() for proc in ws):
```
The procs are closed, i.e. `is_alive` is not possible anymore. In fact, it even raises an exception, since you can't interact with the proc object at all once it has been closed.
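A quick, self-contained demonstration of that behavior:

```python
import multiprocessing

def _noop():
    pass

if __name__ == "__main__":
    p = multiprocessing.Process(target=_noop)
    p.start()
    p.join()
    p.close()       # releases the underlying OS resources
    p.is_alive()    # raises ValueError: process object is closed
```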
Diff is huge, but in the end I'm using a tempfile directory instead of letting the worker create the dir. This is mostly cosmetic; I see a lot of directories being created in the repo when running tests.
Edit: This escalated into a larger refactoring of this function to ensure all processes are properly cleaned up, etc.
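A sketch of the core idea (names and structure are assumptions, not the PR's exact code): the fixture owns a TemporaryDirectory and points the workers at it, so the directories land outside the repo checkout and are removed when the fixture exits.

```python
import tempfile
from contextlib import ExitStack

def cluster_tmpdir_sketch():
    with ExitStack() as stack:
        tmpdirname = stack.enter_context(
            tempfile.TemporaryDirectory(prefix="test_cluster")
        )
        # start scheduler/workers with local_directory=tmpdirname so
        # nothing gets written into the repository while tests run
        ...
```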