Add a SLURMRunner #1

Closed
jacobtomlinson opened this issue Sep 21, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@jacobtomlinson
Owner

This runner could use the environment variables set in each process for self-identification and a shared filesystem for communication.
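A minimal sketch of the shared-filesystem handshake, assuming the scheduler publishes its connection info as a small JSON file. The file layout and the helper names `write_scheduler_file` and `wait_for_scheduler_file` are hypothetical illustrations, not part of any existing runner. The writer uses a temp-file-plus-rename so that readers polling the path never see a partially written file.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def write_scheduler_file(path, info):
    """Publish scheduler connection info (hypothetical JSON layout) atomically.

    The data is written to a temporary file in the same directory and then
    renamed into place, so readers never observe a half-written file.
    """
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".tmp-", suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(info, f)
    # Rename is atomic on POSIX filesystems; network filesystems may still
    # delay visibility to other nodes because of attribute caching.
    os.replace(tmp, path)


def wait_for_scheduler_file(path, timeout=60.0, poll_interval=0.5):
    """Poll until the scheduler file exists, then return its parsed contents."""
    path = Path(path)
    deadline = time.monotonic() + timeout
    while not path.exists():
        if time.monotonic() > deadline:
            raise TimeoutError(f"scheduler file {path} did not appear within {timeout}s")
        time.sleep(poll_interval)
    with open(path) as f:
        return json.load(f)
```

The timeout matters in practice: if the scheduler process dies before writing the file, the waiting processes fail loudly instead of hanging until the Slurm job's wall-clock limit.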

Each process can use SLURM_NODEID as its ID (a unique, monotonically increasing index assigned to each process) and SLURM_JOB_NUM_NODES to know the total number of processes.

  • The process with rank 0 assumes it is the scheduler and writes a scheduler file to the shared filesystem.
  • The process with rank 1 assumes it should run the client code. It waits for the scheduler file to exist and then continues running the contents of the context manager.
  • All processes with rank 2 and above assume they are workers. They wait for the scheduler file to exist and then start worker processes that connect to the scheduler.
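The rank-to-role rule in the list above can be sketched as a pure function. `Role`, `role_for_rank` and `count_roles` are hypothetical names chosen for illustration; a real runner would branch on the returned role to start the scheduler, run the client code, or start a worker.

```python
from enum import Enum


class Role(Enum):
    SCHEDULER = "scheduler"
    CLIENT = "client"
    WORKER = "worker"


def role_for_rank(rank: int) -> Role:
    """Rank 0 runs the scheduler, rank 1 runs the client code,
    and every remaining rank runs a worker."""
    if rank < 0:
        raise ValueError(f"rank must be non-negative, got {rank}")
    if rank == 0:
        return Role.SCHEDULER
    if rank == 1:
        return Role.CLIENT
    return Role.WORKER


def count_roles(size: int) -> dict:
    """Tally how many processes take each role in a job of `size` processes."""
    counts = {role: 0 for role in Role}
    for rank in range(size):
        counts[role_for_rank(rank)] += 1
    return counts
```

For example, a four-process job yields one scheduler, one client and two workers. A job needs at least three processes before any worker exists, which a runner may want to check up front.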
jacobtomlinson added the enhancement label Sep 21, 2023
@lgarrison
Contributor

I'd be interested in this functionality. Could I help out? Is the project ready for contributions? I'm no Dask expert, but I think I understand the relevant pieces here.

(As a minor point, I think the relevant Slurm variables to index processes are actually SLURM_PROCID and SLURM_NTASKS. Ref.)
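A minimal sketch of reading the rank and process count from SLURM_PROCID and SLURM_NTASKS. The helper name `slurm_rank_and_size` is hypothetical. SLURM_PROCID is the per-task rank set for each task launched with srun, whereas SLURM_NODEID is shared by every task on the same node, so the task-level variables stay unique when several tasks run per node. Passing `env` explicitly makes the function testable outside a Slurm job.

```python
import os
from typing import Mapping, Optional, Tuple


def slurm_rank_and_size(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Return (rank, size) for the current Slurm task.

    SLURM_PROCID is the task's rank (0 .. SLURM_NTASKS - 1) and SLURM_NTASKS
    is the total number of tasks; unlike SLURM_NODEID / SLURM_JOB_NUM_NODES,
    these count tasks rather than nodes.
    """
    env = os.environ if env is None else env
    try:
        rank = int(env["SLURM_PROCID"])
        size = int(env["SLURM_NTASKS"])
    except KeyError as exc:
        raise RuntimeError(f"{exc.args[0]} is not set; is this running under srun?") from exc
    if not 0 <= rank < size:
        raise RuntimeError(f"inconsistent Slurm rank {rank} for {size} tasks")
    return rank, size
```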

@jacobtomlinson
Owner Author

Closed in #3
