Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCX: use nbytes instead of len #4621

Merged
merged 1 commit into from
Mar 24, 2021
Merged

UCX: use nbytes instead of len #4621

merged 1 commit into from
Mar 24, 2021

Conversation

madsbk
Copy link
Contributor

@madsbk madsbk commented Mar 23, 2021

Use nbytes() to check for empty buffer objects since nbytes() != len().
This fixes the first part of rapidsai/dask-cuda#549 where the send and recv got out of sync because they didn't agree on which empty frames to skip.

10:58:40   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/ucp/core.py", line 628, in recv
10:58:40     ret = await comm.tag_recv(self._ep, buffer, nbytes, tag, name=log)
10:58:40 ucp.exceptions.UCXMsgTruncated: <[Recv #112] ep: 0x7f0e25cde0d8, tag: 0xfa85496d273cdec2, nbytes: 260, type: <class 'numpy.ndarray'>>: length mismatch: 16 (got) != 260 (expected)
  • Tests added / passed
  • Passes black distributed / flake8 distributed

cc. @jakirkham

@jakirkham
Copy link
Member

Thanks Mads! 😀

So I thought frames would be flattened by this point. Do you know why that is no longer the case? Where does this come up?

@madsbk
Copy link
Contributor Author

madsbk commented Mar 23, 2021

So I thought frames would be flattened by this point. Do you know why that is no longer the case? Where does this come up?

The explicit-comms tests in Dask-CUDA triggers the bug. I think it is because of empty NumPy arrays:

len(a = np.empty((2,0)) == 2

@quasiben
Copy link
Member

I believe the failure here test_steal_resource_restrictions is unrelated to this PR. Merging in. Thanks @madsbk for the work and @jakirkham for the review

@quasiben quasiben merged commit df8f892 into dask:main Mar 24, 2021
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Mar 24, 2021
Fixes #549 by converting received tuples to lists.

Depend on dask/distributed#4621, which fixes an unrelated bug also triggered by our explicit-comms tests.

Authors:
  - Mads R. B. Kristensen (@madsbk)

Approvers:
  - Peter Andreas Entschev (@pentschev)

URL: #555
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants