Handling of multi-threaded computations on multiple nodes #7

Closed
felixcremer opened this issue Nov 2, 2021 · 3 comments

@felixcremer

In my use case, I would like to mix multithreaded and distributed computing across multiple nodes with the following Slurm allocation:

#SBATCH --nodes=6         
#SBATCH --ntasks=6        
#SBATCH --cpus-per-task=20

Currently, I add the worker processes as follows to enable multithreading on each node:

using Distributed, SlurmClusterManager
addprocs(SlurmManager(), env=["JULIA_NUM_THREADS"=>ENV["SLURM_CPUS_PER_TASK"]])

This works, and I am wondering whether it is a general enough use case to be incorporated into SlurmClusterManager.
We could start the Julia worker processes with the number of threads set to SLURM_CPUS_PER_TASK,
so that the user doesn't have to specify multithreading both in the Slurm file and in the Julia script.
If you think this would be a useful contribution to the package, I could prepare a pull request.
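
A minimal sketch of what such a default could look like, written as a standalone helper. The name `slurm_thread_env` is hypothetical and not part of SlurmClusterManager; the sketch only assumes that Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is requested.

```julia
# Hypothetical helper (not part of SlurmClusterManager): build the `env`
# keyword argument for `addprocs` from the Slurm allocation.
function slurm_thread_env(env=ENV)
    cpus = get(env, "SLURM_CPUS_PER_TASK", nothing)
    # Slurm only sets this variable when --cpus-per-task is given, so fall
    # back to an empty list and leave the workers at Julia's default.
    cpus === nothing && return Pair{String,String}[]
    return ["JULIA_NUM_THREADS" => String(cpus)]
end
```

It could then be passed along as `addprocs(SlurmManager(), env=slurm_thread_env())`, which behaves like the call above when the variable is set and changes nothing otherwise.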

@kleinhenz
Collaborator

That looks useful! Pull requests are definitely welcome.

@jonas-schulze

jonas-schulze commented Nov 14, 2021

I have a similar use case and considered opening an issue as well, but eventually didn't, because Julia threads are independent of BLAS threads. Depending on which ones you need, you have to set them up manually. Setting the number of Julia threads automatically risks oversubscription. Consider

# threads.jl
using Distributed, SlurmClusterManager

addprocs(SlurmManager())

@everywhere using Base.Threads, LinearAlgebra
@everywhere println((id=myid(), nt=nthreads(), bt=BLAS.get_num_threads()))

on an allocation with --nodes=3 and --cpus-per-task=2. By default, all processes use 1 Julia thread, and only the first process uses the "proper" number of BLAS threads:

$ julia threads.jl
(id = 1, nt = 1, bt = 2)
(id = 4, nt = 1, bt = 1)
(id = 3, nt = 1, bt = 1)
(id = 2, nt = 1, bt = 1)

Propagating the number of Julia threads is easy:

$ JULIA_NUM_THREADS=$SLURM_CPUS_PER_TASK julia threads.jl
(id = 1, nt = 2, bt = 2)
(id = 4, nt = 2, bt = 1)
(id = 3, nt = 2, bt = 1)
(id = 2, nt = 2, bt = 1)

Propagating the number of BLAS threads, not so much:

$ OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK julia threads.jl
(id = 1, nt = 1, bt = 2)
(id = 4, nt = 1, bt = 1)
(id = 3, nt = 1, bt = 1)
(id = 2, nt = 1, bt = 1)

To do that, you have to call LinearAlgebra.BLAS.set_num_threads(n) with the proper number n::Int, which for my use case is SLURM_CPUS_PER_TASK. Using n = nothing does not work: it would utilize the whole machine instead of the specified allocation.
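
As a rough sketch of that manual setup, the snippet below sets the BLAS thread count on every process. It assumes the workers were already added (for example with addprocs(SlurmManager()) as in threads.jl); the fallback to 1 outside a Slurm job is an assumption made here so the snippet also runs locally.

```julia
using Distributed

# Assumes the workers were already added, e.g. with addprocs(SlurmManager()).
@everywhere using LinearAlgebra

# Read the allocation size once on the main process. The fallback of "1"
# outside a Slurm job is an assumption made for this sketch.
n = parse(Int, get(ENV, "SLURM_CPUS_PER_TASK", "1"))

# `$n` interpolates the local value into the expression sent to each process,
# so the workers do not need to see the environment variable themselves.
@everywhere BLAS.set_num_threads($n)

@everywhere println((id=myid(), nt=Threads.nthreads(), bt=BLAS.get_num_threads()))
```

If the Julia threads are also set to SLURM_CPUS_PER_TASK, both thread pools can be busy at the same time, which is the oversubscription risk mentioned above.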

Given that it's so easy to specify the number of Julia threads for all the workers in the job file (export JULIA_NUM_THREADS=$SLURM_CPUS_PER_TASK), I would not add this feature, or would at least make it off by default.

@kleinhenz
Collaborator

That makes sense to me. I don't want to try to be too smart here. Going to close for now.
