Dask LocalCudaCluster compute error when threads_per_worker not equal to 1 #1262
Using more than one thread per worker (the default is 1) is not something we recommend or officially support for LocalCUDACluster. If you're doing that to ensure you have multiple threads for CPU compute resources, you could consider launching a hybrid cluster with proper resource and code annotations.
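A hedged sketch of what such a hybrid setup could look like, assuming a manually started scheduler; the scheduler address, worker flags, and resource labels below are illustrative assumptions, not an official recipe:

```python
# Hypothetical hybrid cluster: single-threaded GPU workers for CUDA compute,
# multi-threaded CPU workers for compressed-Zarr I/O. Started from a shell,
# for example (addresses, thread counts and resource labels are made up):
#   dask-scheduler --port 8786
#   dask-cuda-worker tcp://127.0.0.1:8786 --resources "GPU=1"
#   dask-worker tcp://127.0.0.1:8786 --nthreads 8 --resources "CPU=1"
import dask
import dask.array as da
from dask.distributed import Client

# Annotations live on graph layers and can be dropped by low-level fusion,
# so it is safer to disable fusion for this pattern.
dask.config.set({"optimization.fuse.active": False})

client = Client("tcp://127.0.0.1:8786")  # illustrative scheduler address

# I/O-bound / decompression-heavy tasks are pinned to the CPU workers.
with dask.annotate(resources={"CPU": 1}):
    x = da.random.random((8192, 8192), chunks=(1024, 1024))  # stand-in for a Zarr read

# GPU-bound tasks are pinned to the single-threaded CUDA workers.
with dask.annotate(resources={"GPU": 1}):
    total = x.sum()

print(total.compute())
```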
Thanks for letting me know @pentschev. The reason I want to use multiple threads is that I think they can accelerate the I/O to disk, since Zarr is compressed, chunked storage (I have tested this). I don't need multiple threads for the computation. Do you have any suggestions for me? I have looked at KvikIO, but Dask doesn't support it at the moment.
You should be able to use:

```python
import cupy
import dask.array
from dask.distributed import Client
from dask_cuda import LocalCUDACluster


def main():
    import kvikio.zarr

    filepath = "./a.zarr"

    # prepare the data
    z = kvikio.zarr.open_cupy_array(store=filepath, mode="w", shape=(10,), chunks=(2,))
    z[:] = cupy.arange(10)

    # load the zarr array into a dask array
    a = dask.array.from_array(z, chunks=z.chunks)

    # at this point, it works as a regular dask array (backed by cupy.ndarray)
    assert a.sum().compute() == 45


if __name__ == "__main__":
    with LocalCUDACluster(n_workers=1) as cluster:
        with Client(cluster):
            main()
```

We should properly mention this in the KvikIO docs :)
Please note that the sample above writes the data first (mode="w"); if you already have an existing Zarr array, you only need:

```python
z = kvikio.zarr.open_cupy_array(store=filepath, mode="r")
a = dask.array.from_array(z, chunks=z.chunks)
```

In that case you'll map each existing Zarr chunk onto a Dask chunk.
Thanks. I will try it later.
I found a weird error with LocalCUDACluster. My workflow is: use Dask to load data from Zarr, transfer it to GPU memory, do some computation on multiple GPUs, transfer the result back to CPU memory, and finally save it to Zarr.

A minimal example to reproduce:

It sometimes generates:

However, sometimes it just works perfectly.

I find that when I set threads_per_worker=1, this error always disappears. I also asked a similar question on the Dask forum: https://dask.discourse.group/t/dask-localcudacluster-compute-error-when-threads-per-worker-not-equal-to-1/2284/1