mpiCheckPhaseHook: add parameters to bypass errors in sandbox #350112

Merged: 4 commits, Oct 21, 2024


qbisi
Contributor

@qbisi qbisi commented Oct 20, 2024

Motivation

OpenMPI gets confused by the sandbox environment and spews errors like these (to both stdout and stderr):

      [hwloc/linux] failed to find sysfs cpu topology directory, aborting linux discovery.
      [1729458724.473282] [localhost:78   :0]       tcp_iface.c:893  UCX  ERROR scandir(/sys/class/net) failed: No such file or directory

These messages contaminate the test output, which causes the diff-based tests to fail. See nixpkgs#petsc.
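
As a rough illustration of the problem (the "expected" line is made up and is not real petsc output), the sketch below shows how extra diagnostic lines break an exact output comparison, and how filtering known patterns with sed, the kind of workaround described next, hides them. The sed patterns are assumptions chosen to match the two messages quoted above.

```shell
# Hypothetical expected output of a test; not taken from petsc.
expected='Norm of error < 1.e-10'

# The two diagnostics quoted above, as they would appear in the sandbox.
noise_hwloc='[hwloc/linux] failed to find sysfs cpu topology directory, aborting linux discovery.'
noise_ucx='[1729458724.473282] [localhost:78   :0]       tcp_iface.c:893  UCX  ERROR scandir(/sys/class/net) failed: No such file or directory'

# Actual output: the real result surrounded by noise.
actual=$(printf '%s\n' "$noise_hwloc" "$expected" "$noise_ucx")

# An exact comparison (standing in for the diff-based test) fails.
if [ "$actual" = "$expected" ]; then plain=pass; else plain=fail; fi

# Filtering the known noise lines first makes the comparison pass again,
# but every new message format needs another hardcoded pattern.
filtered=$(printf '%s\n' "$actual" | sed -e '/^\[hwloc\/linux\]/d' -e '/UCX *ERROR/d')
if [ "$filtered" = "$expected" ]; then filtered_result=pass; else filtered_result=fail; fi

echo "plain=$plain filtered=$filtered_result"
```

Filtering after the fact works only for messages someone has already seen, which is why suppressing the diagnostics at the source is the more robust option.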

The old petsc package uses a hardcoded patch that runs sed over the test output to strip these messages.
The new solution is to use a preset CPU topology file and to select the ob1 PML instead of UCX.
(Note: I tried setting UCX_TLS=sm or UCX_TLS=self, but OpenUCX still insists on scanning the sysfs directory "/sys/class/net".)

      # Skip discovery of the sysfs CPU topology directory by loading a preset topology file.
      export PRTE_MCA_hwloc_use_topo_file="@topology@"
      # Use the ob1 PML instead of ucx.
      export OMPI_MCA_pml=ob1
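
The `@topology@` token looks like a build-time placeholder. Assuming it is replaced with the path of a topology file when the hook is generated (an assumption, not something stated in this description), the effect can be mimicked as in the sketch below. The path `/tmp/example-topology.xml` and the use of sed for the substitution are purely illustrative.

```shell
# Hypothetical path; the real location would be decided at build time.
topology_path=/tmp/example-topology.xml

# The hook line as written in this description, with its placeholder intact.
template='export PRTE_MCA_hwloc_use_topo_file="@topology@"'

# Stand-in for the build-time substitution: replace the placeholder with the path.
rendered=$(printf '%s\n' "$template" | sed "s|@topology@|$topology_path|")

# Apply the rendered line and the PML selection to the current shell.
eval "$rendered"
export OMPI_MCA_pml=ob1

echo "PRTE_MCA_hwloc_use_topo_file=$PRTE_MCA_hwloc_use_topo_file"
echo "OMPI_MCA_pml=$OMPI_MCA_pml"
```

Any MPI test launched from this shell afterwards would inherit both variables.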

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@nix-owners nix-owners bot requested a review from Ericson2314 October 20, 2024 23:18
@qbisi qbisi changed the title from "Openmpi related checkPhaseHook" to "mpiCheckPhaseHook: add parameters to bypass errors in sandbox" Oct 20, 2024
@ofborg ofborg bot added the 8.has: package (new) This PR adds a new package label Oct 21, 2024
Member

@markuskowa markuskowa left a comment

LGTM.

@markuskowa markuskowa merged commit c4875a4 into NixOS:master Oct 21, 2024
35 of 36 checks passed
@qbisi qbisi deleted the dev-hpc branch October 27, 2024 14:50