I tried setting up a LocalCUDACluster using UCX, but I get this error: ModuleNotFoundError: No module named 'ucp'. I'm running this in the NVTabular merlin-pytorch-training:0.5.3 NGC container, which I thought had everything we need for UCX.
Here's my code:
protocol = "ucx"
# Select GPUs to place workers. Here 1st and 2nd GPU
# If you want the first 4 GPUs it would be 0,1,2,3 and so on
visible_devices = "0,1"
# Get the IP Address
cmd = "hostname --all-ip-addresses"
process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
IPADDR = str(output.decode()).split()[0]
cluster = LocalCUDACluster(
ip=IPADDR,
protocol=protocol,
enable_nvlink=True,
CUDA_VISIBLE_DEVICES=visible_devices,
local_directory=dask_workdir,
device_memory_limit=0.8 # This can be changed depending on your workflow
)
# Create the distributed client
client = Client(cluster)
client
Here's the error traceback:
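As a sanity check, something like the following can confirm whether ucx-py (whose import name is ucp) is actually present inside the container; this is a minimal sketch, and if the import fails, ucx-py would need to be installed (e.g. via conda from the rapidsai channel) before protocol="ucx" can work:

# Minimal check for ucx-py availability inside the container.
# "ucx-py" is the package name; "ucp" is the module it provides.
try:
    import ucp
    print("ucx-py version:", ucp.__version__)
except ModuleNotFoundError:
    print("ucx-py is not installed; protocol='ucx' will fail")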