restoring more hwloc --cpu-set behavior
I made an older checkin bdd92a7 that partially restored --cpu-set behavior using OMPI 3.x code, and I was about to restore more code from 3.x, but I'm taking a different approach here and reverting the previous addition of 3.x code too.

For the (partial) revert:
    commit: bdd92a7
    title: "-cpu-set as a constraint rather than as a binding"
I'm reverting the functional changes to how the hwloc tree is iterated over, but keeping the error-detection changes modeled after 3.x.

For the new approach to --cpu-set:
The --cpu-set option is logically similar to running under a cgroup, just without the OS-level enforcement that comes with a cgroup. For cgroups the new code loads the topology without the WHOLE_SYSTEM flag, so the tree only contains what's in the cgroup. We can do the same thing with a hwloc_restrict_topology() call to constrain the topology to whatever --cpu-set the user enters. Then all the 3.x code that skips over unavailable tree entries becomes unnecessary.

Here are the cases that were fixed in bdd92a7, and which are confirmed to still work with the new approach:

hardware: [..../..../..../....] numbered sequentially 0-15

% mpirun -np 2 --report-bindings --bind-to hwthread \
    --map-by hwthread --cpu-set 6,7 hostname
> MCW rank 0 [..../..B./..../....]
> MCW rank 1 [..../...B/..../....]

% mpirun -np 2 --report-bindings --bind-to hwthread \
    --map-by ppr:2:node:pe=2 --cpu-set 6,7,12,13 hostname
> MCW rank 0 [..../..BB/..../....]
> MCW rank 1 [..../..../..../BB..]

The additional case I was in the process of fixing here, and which is also handled by the new approach, is:

% mpirun -np 2 --report-bindings --bind-to hwthread \
    --map-by ppr:2:node:pe=3 --cpu-set 4,5,9,11,14,15 ./x
> MCW rank 0 [..../BB../.B../....]
> MCW rank 1 [..../..../...B/..BB]

Signed-off-by: Mark Allen <[email protected]>

[ This checkin also includes a squashed commit from Ralph Castain to fix code style violations. ]
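
The "restrict first, then map" idea can be illustrated with a small toy model. This is only a sketch and not the OMPI mapper or the hwloc API: the function names (restrict, map_ranks, render, report) are hypothetical, the topology is reduced to a flat list of 16 hwthreads, and each rank is assumed to take the next pe hwthreads from whatever survives the restriction. Under those assumptions it reproduces the three binding reports listed above.

```python
def restrict(all_pus, cpu_set):
    """Keep only the hwthreads named in cpu_set, in ascending order.

    Stands in for constraining the topology up front, so later steps
    never see (and never have to skip) unavailable entries.
    """
    allowed = set(cpu_set)
    return [pu for pu in all_pus if pu in allowed]


def map_ranks(available, nprocs, pe):
    """Give each rank the next `pe` hwthreads from the restricted list."""
    if nprocs * pe > len(available):
        raise ValueError("not enough hwthreads in the cpu-set")
    return [available[r * pe:(r + 1) * pe] for r in range(nprocs)]


def render(bound, total=16, group=4):
    """Format one binding in the [..../..../..../....] style used above."""
    marks = ["B" if pu in bound else "." for pu in range(total)]
    groups = ["".join(marks[i:i + group]) for i in range(0, total, group)]
    return "[" + "/".join(groups) + "]"


def report(cpu_set, nprocs, pe, total=16):
    """Restrict the toy topology, map the ranks, return one line per rank."""
    available = restrict(range(total), cpu_set)
    return [render(b, total) for b in map_ranks(available, nprocs, pe)]


if __name__ == "__main__":
    # Third case from the message: pe=3 over a non-contiguous cpu-set.
    for rank, line in enumerate(report([4, 5, 9, 11, 14, 15], nprocs=2, pe=3)):
        print("MCW rank", rank, line)
```

Because the mapping step only ever iterates over the already-restricted list, it needs no logic for skipping unavailable hwthreads, which mirrors why the 3.x skip-over code becomes unnecessary.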