
Fix doc style #575

Merged
jakirkham merged 3 commits into rapidsai:branch-0.16 from fix_doc_sty on Aug 14, 2020

Conversation

jakirkham (Member)

Fixes a linting error in the docs that was observed in PR ( #574 ).
@jakirkham jakirkham changed the base branch from branch-0.15 to branch-0.16 August 11, 2020 22:56
@jakirkham (Member Author)

Seeing the test failure from Distributed below (and some variants of it) on CI. This is a consequence of the change in PR ( rapidsai/rmm#477 ), which causes RMM's DeviceBuffer to have strides set to None in the header when serialized. Always setting RMM's DeviceBuffer's strides in the header, as is done in PR ( dask/distributed#4039 ), should fix the issue.

_______________________ test_serialize_numba_from_rmm[0] _______________________

size = 0

    @pytest.mark.parametrize("size", [0, 3, 10])
    def test_serialize_numba_from_rmm(size):
        np = pytest.importorskip("numpy")
        rmm = pytest.importorskip("rmm")
    
        if not cuda.is_available():
            pytest.skip("CUDA is not available")
    
        x_np = np.arange(size, dtype="u1")
    
        x_np_desc = x_np.__array_interface__
        (x_np_ptr, _) = x_np_desc["data"]
        (x_np_size,) = x_np_desc["shape"]
        x = rmm.DeviceBuffer(ptr=x_np_ptr, size=x_np_size)
    
        header, frames = serialize(x, serializers=("cuda", "dask", "pickle"))
        header["type-serialized"] = pickle.dumps(cuda.devicearray.DeviceNDArray)
    
>       y = deserialize(header, frames, deserializers=("cuda", "dask", "pickle", "error"))

/opt/conda/envs/gdf/lib/python3.7/site-packages/distributed/protocol/tests/test_numba.py:53: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/serialize.py:335: in deserialize
    return loads(header, frames)
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/cuda.py:28: in cuda_loads
    return loads(header, frames)
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/protocol/numba.py:46: in cuda_deserialize_numba_ndarray
    gpu_data=numba.cuda.as_cuda_array(frame).gpu_data,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <numba.cuda.cudadrv.devicearray.DeviceNDArray object at 0x7fc3376ace50>
shape = (0,), strides = None, dtype = dtype('uint8'), stream = 0
writeback = None
gpu_data = <numba.cuda.cudadrv.driver.MemoryPointer object at 0x7fc3376acd90>

    def __init__(self, shape, strides, dtype, stream=0, writeback=None,
                 gpu_data=None):
        """
        Args
        ----
    
        shape
            array shape.
        strides
            array strides.
        dtype
            data type as np.dtype coercible object.
        stream
            cuda stream.
        writeback
            Deprecated.
        gpu_data
            user provided device memory for the ndarray data buffer
        """
        if isinstance(shape, int):
            shape = (shape,)
        if isinstance(strides, int):
            strides = (strides,)
        dtype = np.dtype(dtype)
        self.ndim = len(shape)
>       if len(strides) != self.ndim:
E       TypeError: object of type 'NoneType' has no len()

/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/cudadrv/devicearray.py:90: TypeError
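
For reference, a minimal sketch of the header-side fix, i.e. always filling in strides when they would otherwise be None. This is only an illustration assuming a NumPy-style C-contiguous default; it is not the actual code from dask/distributed#4039 or RMM, and default_strides/build_header are hypothetical helpers:

    import numpy as np

    def default_strides(shape, dtype):
        # C-contiguous strides, e.g. shape (2, 3) with uint8 -> (3, 1).
        itemsize = np.dtype(dtype).itemsize
        strides = []
        acc = itemsize
        for dim in reversed(shape):
            strides.append(acc)
            acc *= dim
        return tuple(reversed(strides))

    def build_header(shape, dtype, strides=None):
        # Always populate strides so a consumer that calls len(strides),
        # like the DeviceNDArray constructor above, never sees None.
        if strides is None:
            strides = default_strides(shape, dtype)
        return {"shape": shape, "typestr": np.dtype(dtype).str, "strides": strides}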

@jakirkham (Member Author)

Interestingly, CuPy does not run into this error; it simply interprets strides=None as C-contiguous. This seems like desirable behavior to have in Numba as well, so I submitted PR ( numba/numba#6122 ) to make that change in Numba.

@jakirkham (Member Author)

rerun tests

@raydouglass (Member) left a comment

@jakirkham (Member Author) commented Aug 12, 2020

Why is that? We still have version defined. It's just extracted from release instead.

Edit: Oh, do you mean we should delete those lines?
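
For context, this is roughly what "extracted from release" looks like, as a sketch following the common Sphinx conf.py pattern; the release string below is a placeholder, not necessarily what this repo sets:

    # docs/source/conf.py (sketch; the release string is a placeholder)
    release = "0.16.0"
    # Short X.Y version derived from the full release string, so release
    # automation only needs to touch `release`.
    version = ".".join(release.split(".")[:2])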

@raydouglass (Member)

That script is run when a new branch and release are cut, so your change in this PR will be reverted when branch-0.17 is created unless you update the script to no longer overwrite the value.

These get autogenerated based on the git tag. So we don't need to use
`sed` to update them. Hence we drop these version update lines.
@jakirkham jakirkham requested a review from a team as a code owner August 12, 2020 01:42
@jakirkham (Member Author)

Yep, that makes sense. Thanks for the explanation, Ray! 😄

Does this look right?

@jakirkham (Member Author)

rerun tests

@jakirkham (Member Author) commented Aug 12, 2020

Looks like we are seeing a new CI failure. Wasn't able to reproduce the issue locally until upgrading rmm. Running the cuDF benchmark shown below (same as on CI) reproduces the issue for me. The CuPy benchmark runs without issues.

python benchmarks/cudf-merge.py --chunks-per-dev 4 --chunk-size 10000 --rmm-init-pool-size 100

My guess is it relates to PR ( rapidsai/rmm#466 ). What I'm less sure about is whether other things (like cudf) need to be rebuilt to include that RMM change. That said, this particular benchmark errors inside a CuPy allocation (routed through the RMM allocator), which suggests the issue is more fundamental than cuDF itself.

Process SpawnProcess-1:
Traceback (most recent call last):
  File "/opt/conda/envs/gdf/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/gdf/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/jenkins/workspace/rapidsai/gpuci/ucx-py/prb/ucx-py-gpu-build_2/ucp/utils.py", line 163, in _worker_process
    ret = loop.run_until_complete(run())
  File "/opt/conda/envs/gdf/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/var/lib/jenkins/workspace/rapidsai/gpuci/ucx-py/prb/ucx-py-gpu-build_2/ucp/utils.py", line 158, in run
    return await func(rank, eps, args)
  File "/var/lib/jenkins/workspace/rapidsai/gpuci/ucx-py/prb/ucx-py-gpu-build_2/benchmarks/cudf-merge.py", line 169, in worker
    df1 = generate_chunk(rank, args.chunk_size, args.n_chunks, "build", args.frac_match)
  File "/var/lib/jenkins/workspace/rapidsai/gpuci/ucx-py/prb/ucx-py-gpu-build_2/benchmarks/cudf-merge.py", line 114, in generate_chunk
    "key": cupy.arange(start, stop=stop, dtype="int64"),
  File "/opt/conda/envs/gdf/lib/python3.7/site-packages/cupy/creation/ranges.py", line 55, in arange
    ret = cupy.empty((size,), dtype=dtype)
  File "/opt/conda/envs/gdf/lib/python3.7/site-packages/cupy/creation/basic.py", line 22, in empty
    return cupy.ndarray(shape, dtype, order=order)
  File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 544, in cupy.cuda.memory.alloc
  File "/opt/conda/envs/gdf/lib/python3.7/site-packages/rmm/rmm.py", line 270, in rmm_cupy_allocator
    buf = librmm.device_buffer.DeviceBuffer(size=nbytes)
  File "rmm/_lib/device_buffer.pyx", line 70, in rmm._lib.device_buffer.DeviceBuffer.__cinit__
MemoryError: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp68: cudaErrorMemoryAllocation out of memory

@jakirkham (Member Author) commented Aug 12, 2020

Am also able to reproduce this with the array benchmark. Just need to configure it to use RMM like so (on CI we use -o cupy instead of -o rmm). This means it is just an issue with RMM and UCX (not cuDF). Also the fact that the CuPy case (without RMM) does work rules out the possibility that something has changed on the UCX side.

python benchmarks/local-send-recv.py -o rmm --server-dev 0 --client-dev 0 --reuse-alloc
Server Running at 10.33.225.165:45633
Client connecting to server at 10.33.225.165:45633
Process SpawnProcess-2:
Traceback (most recent call last):
  File "/datasets/jkirkham/miniconda/envs/rapids15dev/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/datasets/jkirkham/miniconda/envs/rapids15dev/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/datasets/jkirkham/devel/ucx-py/benchmarks/local-send-recv.py", line 147, in client
    loop.run_until_complete(run())
  File "/datasets/jkirkham/miniconda/envs/rapids15dev/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/datasets/jkirkham/devel/ucx-py/benchmarks/local-send-recv.py", line 126, in run
    t1 = np.arange(args.n_bytes, dtype="u1")
  File "/datasets/jkirkham/miniconda/envs/rapids15dev/lib/python3.8/site-packages/cupy/creation/ranges.py", line 55, in arange
    ret = cupy.empty((size,), dtype=dtype)
  File "/datasets/jkirkham/miniconda/envs/rapids15dev/lib/python3.8/site-packages/cupy/creation/basic.py", line 22, in empty
    return cupy.ndarray(shape, dtype, order=order)
  File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 544, in cupy.cuda.memory.alloc
  File "/datasets/jkirkham/miniconda/envs/rapids15dev/lib/python3.8/site-packages/rmm/rmm.py", line 270, in rmm_cupy_allocator
    buf = librmm.device_buffer.DeviceBuffer(size=nbytes)
  File "rmm/_lib/device_buffer.pyx", line 70, in rmm._lib.device_buffer.DeviceBuffer.__cinit__
MemoryError: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp68: cudaErrorMemoryAllocation out of memory

@jakirkham (Member Author)

Trying to address these issues in PR ( #577 ) in combination with upstream changes to RMM ( rapidsai/rmm#490 ).

@jakirkham (Member Author)

rerun tests

Trying to pick up the new nightlies from PR ( rapidsai/rmm#493 ).

@jakirkham (Member Author) commented Aug 13, 2020

@jakirkham (Member Author)

Am beginning to think we are just choosing a problematic default for pool size.
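
For example, a more conservative setup could pin the pool to an explicit, modest size rather than relying on the default. The sketch below uses RMM's reinitialize and the CuPy allocator hook; the 1 GiB figure is an assumed value for illustration, not what CI or the benchmarks actually use:

    import cupy
    import rmm

    # Use a fixed, modest pool so two processes sharing one GPU (as in the
    # --server-dev 0 --client-dev 0 benchmark above) don't both try to grab
    # most of the device memory up front.
    rmm.reinitialize(
        pool_allocator=True,
        initial_pool_size=2**30,  # 1 GiB; assumed value for illustration
    )

    # Route CuPy allocations through RMM, matching the rmm_cupy_allocator
    # frames in the tracebacks above.
    cupy.cuda.set_allocator(rmm.rmm_cupy_allocator)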

@jakirkham (Member Author)

rerun tests

@quasiben (Member)

rerun tests

1 similar comment
@jakirkham (Member Author)

rerun tests

@jakirkham jakirkham merged commit 51c84fd into rapidsai:branch-0.16 Aug 14, 2020
@jakirkham jakirkham deleted the fix_doc_sty branch August 14, 2020 17:01