
Support Windows Process Groups #723

Open
tdaede opened this issue Feb 7, 2020 · 4 comments

Comments

@tdaede

tdaede commented Feb 7, 2020

To use more than 64 threads on Windows, you have to use Process Groups:

https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups?redirectedfrom=MSDN

Each spawned thread must be assigned to a process group. I don't know if it makes sense to handle this in rayon or libstd or what.

@cuviper
Member

cuviper commented Feb 7, 2020

Do you know whether this is reflected in num_cpus?

If that will be limited to the 64-thread ceiling already, then I think we're in okay shape by default. We wouldn't want to oversubscribe too many rayon threads into a 64-cpu process group, but I think it's fine if we're limited to that and just don't use additional cpus.

Beyond that, I think it's in the realm of advanced tweaking that the user could deal with in ThreadPoolBuilder::spawn_handler.

See also #319 for general NUMA awareness.

@tdaede
Author

tdaede commented Feb 7, 2020

Looking at the source code of num_cpus, I believe it will max out at 64. You'd need to use GetActiveProcessorCount to get a higher number. So it will work fine as-is; it will just underutilize the machine.

Note that this is technically unrelated to NUMA - in particular the just-released Threadripper 3990X has 128 threads but only one NUMA node.

@shuffle2

the numa-related parts of chapter 4 in https://developer.amd.com/wp-content/resources/56782_1.0.pdf are relevant:

Since all the processors in a single-socket/128-logical-processor NUMA node cannot fit completely within a single Windows Processor Group, Windows creates a (virtual) secondary node to hold the additional processors.

Regardless of NPS settings, applications will need to be multi-group aware to take advantage of all the processors (otherwise their affinity will be to a single processor group).

i.e. the "NUMA node"/"processor group" Windows terminology is becoming blurred, as it is imposing limitations which don't reflect the hardware or the configuration of the hardware...

@shuffle2

shuffle2 commented Feb 13, 2020

...this means your code needs to manually move threads to other groups via something like

```cpp
#include <windows.h>

// Pin the current thread to the processor group and CPU implied by `id`.
// GROUP_AFFINITY is zero-initialized so its Reserved fields stay zero.
GROUP_AFFINITY affinity{}, affinity_prev{};
affinity.Group = id / MAXIMUM_PROC_PER_GROUP;
affinity.Mask = 1ull << (id % MAXIMUM_PROC_PER_GROUP);
SetThreadGroupAffinity(GetCurrentThread(), &affinity, &affinity_prev);
```

where the important part is .Group. AFAIK there is no "group mask" field to tell the scheduler to schedule a thread across a set of processor groups(?)
Meaning, just creating 128 threads would not be enough.
