The function used by flux-core to "discover" a local Rv1 via hwloc indexes GPU devices via the following:
char *rhwloc_gpu_idset_string (hwloc_topology_t topo)
{
    int index;
    char *result = NULL;
    hwloc_obj_t obj = NULL;
    struct idset *ids = idset_create (0, IDSET_FLAG_AUTOGROW);

    if (!ids)
        return NULL;

    /* Manually index GPUs -- os_index does not seem to be valid for
     * these devices in some cases, and logical index also seems
     * incorrect (?)
     */
    index = 0;
    while ((obj = hwloc_get_next_osdev (topo, obj))) {
        /* Only count cudaX and openclX devices for now */
        const char *s = hwloc_obj_get_info_by_name (obj, "Backend");
        if (s && ((strcmp (s, "CUDA") == 0) || (strcmp (s, "OpenCL") == 0)))
            idset_set (ids, index++);
    }
    if (idset_count (ids) > 0)
        result = idset_encode (ids, IDSET_FLAG_RANGE);
    idset_destroy (ids);
    return result;
}
However, this will number GPU devices logically when CUDA_VISIBLE_DEVICES is set. For example, with CUDA_VISIBLE_DEVICES=1, "gpu": "0" will be set in the resulting R object, since hwloc will discover only one GPU:
$ src/cmd/flux R encode --local | src/cmd/flux R decode --short
rank0/core[0-35],gpu[0-1]
$ CUDA_VISIBLE_DEVICES=1 src/cmd/flux R encode --local | src/cmd/flux R decode --short
rank0/core[0-35],gpu0
$ CUDA_VISIBLE_DEVICES=0 src/cmd/flux R encode --local | src/cmd/flux R decode --short
rank0/core[0-35],gpu0
A fix would be to check for CUDA_VISIBLE_DEVICES in the environment and, if set, take the GPU device indices from that variable (treated as an idset) rather than using an arbitrary logical index. This would instead produce:
$ CUDA_VISIBLE_DEVICES=1 src/cmd/flux R encode --local | src/cmd/flux R decode --short
rank0/core[0-35],gpu1
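A minimal sketch of that approach is below. It assumes libidset's idset_decode () can parse the common comma/range form of CUDA_VISIBLE_DEVICES (e.g. "1" or "0,2-3"); a real patch would also need to handle UUID-style values and OpenCL-only devices:

/* Sketch only: prefer physical ids from CUDA_VISIBLE_DEVICES when set,
 * assuming idset_decode () accepts its comma/range syntax; otherwise
 * fall back to counting CUDA/OpenCL osdev objects as before.
 * (getenv () requires stdlib.h.)
 */
char *rhwloc_gpu_idset_string (hwloc_topology_t topo)
{
    char *result = NULL;
    const char *visible = getenv ("CUDA_VISIBLE_DEVICES");
    struct idset *ids = NULL;

    if (visible) {
        /* Trust the physical device ids given in the environment */
        if (!(ids = idset_decode (visible)))
            return NULL;
    }
    else {
        int index = 0;
        hwloc_obj_t obj = NULL;
        if (!(ids = idset_create (0, IDSET_FLAG_AUTOGROW)))
            return NULL;
        while ((obj = hwloc_get_next_osdev (topo, obj))) {
            const char *s = hwloc_obj_get_info_by_name (obj, "Backend");
            if (s && (strcmp (s, "CUDA") == 0 || strcmp (s, "OpenCL") == 0))
                idset_set (ids, index++);
        }
    }
    if (idset_count (ids) > 0)
        result = idset_encode (ids, IDSET_FLAG_RANGE);
    idset_destroy (ids);
    return result;
}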
This issue is perhaps academic, since flux-core currently has no way to schedule GPUs, and Fluxion will discover these resources separately on its own. However, it would be nice for testing if discovery worked correctly here.
Yeah, I think it might be even more of an issue now. If Flux is launched by a foreign RM (so that the instance is doing resource discovery via hwloc) with CUDA_VISIBLE_DEVICES set, then even Fluxion will see the logical GPU device ids instead of the physical ones, which could conceivably result in multiple jobs being scheduled to the same GPU.
I'm struggling to decide if this is a practical problem, though. I think there would only be an issue if multiple Flux instances were scheduled to the same node, each with a subset of GPUs assigned (which seems unlikely). In any other case, Flux components will be using R from the parent, so each will have different GPU ids assigned...
For the foreign RM issue, I wonder if this should be solved by documentation: "To ensure correctness, the foreign RM must use a deterministic GPU numbering, such as CUDA_DEVICE_ORDER=PCI_BUS_ID."