
librlist: rlist_from_hwloc() encodes logical instead of physical GPUs #3352

Open
grondo opened this issue Nov 19, 2020 · 3 comments


@grondo
Contributor

grondo commented Nov 19, 2020

The function used by flux-core to "discover" a local Rv1 via hwloc indexes GPU devices via the following:

char * rhwloc_gpu_idset_string (hwloc_topology_t topo)
{
    int index;
    char *result = NULL;
    hwloc_obj_t obj = NULL;
    struct idset *ids = idset_create (0, IDSET_FLAG_AUTOGROW);

    if (!ids)
        return NULL;

    /*  Manually index GPUs -- os_index does not seem to be valid for
     *  these devices in some cases, and logical index also seems
     *  incorrect (?)
     */
    index = 0;
    while ((obj = hwloc_get_next_osdev (topo, obj))) {
        /* Only count cudaX and openclX devices for now */
        const char *s = hwloc_obj_get_info_by_name (obj, "Backend");
        if (s && ((strcmp (s, "CUDA") == 0) || (strcmp (s, "OpenCL") == 0)))
            idset_set (ids, index++);
    }
    if (idset_count (ids) > 0)
        result = idset_encode (ids, IDSET_FLAG_RANGE);
    idset_destroy (ids);
    return result;
}

However, this numbers GPU devices logically when CUDA_VISIBLE_DEVICES is set. For example, with CUDA_VISIBLE_DEVICES=1, hwloc discovers only one GPU, so "gpu": "0" appears in the resulting R object:

$ src/cmd/flux R encode --local | src/cmd/flux R decode --short
rank0/core[0-35],gpu[0-1]
$ CUDA_VISIBLE_DEVICES=1 src/cmd/flux R encode --local | src/cmd/flux R decode --short
rank0/core[0-35],gpu0
$ CUDA_VISIBLE_DEVICES=0 src/cmd/flux R encode --local | src/cmd/flux R decode --short
rank0/core[0-35],gpu0

A fix would be to check for CUDA_VISIBLE_DEVICES in the environment, and if set, take the GPU device indices from that variable (treated as an idset) rather than assigning arbitrary logical indices. This would instead produce:

$ CUDA_VISIBLE_DEVICES=1  src/cmd/flux R encode --local | src/cmd/flux R decode --short
rank0/core[0-35],gpu1

This issue is perhaps academic, since flux-core doesn't have any way currently to schedule gpus, and fluxion will discover these resources separately on its own. However, it would be nice for testing if the discovery worked correctly here.

@garlick
Member

garlick commented Apr 5, 2022

Is this still an issue? Since fluxion now uses Rv1 rather than hwloc, the "academic" comment at the end might no longer apply...

@grondo
Contributor Author

grondo commented Apr 5, 2022

Yeah, I think it might be even more of an issue now. If Flux is launched by a foreign RM (so that the instance does resource discovery via hwloc) with CUDA_VISIBLE_DEVICES set, then even Fluxion will see the logical GPU device ids instead of the physical ones, which could conceivably result in multiple jobs being scheduled to the same GPU.

I'm struggling to decide whether this is a practical problem, though. I think there would only be an issue if multiple Flux instances were scheduled to the same node, each with a different subset of GPUs assigned (which seems an unlikely case). In any other case, Flux components will be using R from the parent, so each will have different GPU ids assigned...

@dongahn
Member

dongahn commented Apr 6, 2022

At some point, flux-framework/flux-sched#926 (comment) needs to be ported to rv1exec reader so that we get some coverage in this space.

For the foreign RM issue, I wonder if this should be solved by documentation: "To ensure correctness, the foreign RM must use a more deterministic GPU numbering, such as CUDA_DEVICE_ORDER=PCI_BUS_ID."
