This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Default docker run --shm-size=64MB inadequate for some Intel MPI jobs #8

Closed
jhale opened this issue Dec 13, 2016 · 2 comments

jhale commented Dec 13, 2016

When using the combined shm:dapl Intel MPI fabrics, the /dev/shm device is exposed through to the Docker container from the host and is then used for intra-node MPI communication. Unfortunately, the default size of /dev/shm is restricted to 64 MB, which is inadequate for some jobs. The result is MPI applications that crash at random points.
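A quick way to confirm how much shared memory a container actually has is to check the capacity of /dev/shm from inside it before launching the MPI job. The following is a minimal sketch, not part of Batch Shipyard: the function names and the 256m threshold are assumptions chosen for illustration, and the size parser only handles the single-letter suffixes (b, k, m, g) that `docker run --shm-size` accepts.

```python
import shutil

# Multipliers for the single-letter unit suffixes used by --shm-size.
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a Docker-style size such as '64m' or '1g' to bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a bare number is taken as bytes


def shm_total_bytes(path="/dev/shm"):
    """Return the total capacity of the shared-memory filesystem in bytes."""
    return shutil.disk_usage(path).total


def shm_is_adequate(total_bytes, required="256m"):
    """True if total_bytes meets the (hypothetical) required size."""
    return total_bytes >= parse_size(required)


if __name__ == "__main__":
    total = shm_total_bytes()
    print(f"/dev/shm capacity: {total / 1024**2:.0f} MiB")
    if not shm_is_adequate(total):
        print("warning: /dev/shm may be too small for shm-based MPI fabrics")
```

Run inside the container, this prints the capacity and a warning when it is below the chosen threshold. The right threshold depends on the application and the number of ranks per node.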

Fix:

In jobs.json, set additional_docker_run_options to include --shm-size=256m (or another size as appropriate).

https://github.com/Azure/batch-shipyard/blob/master/config_templates/jobs.json
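The fix above could be applied programmatically along these lines. This is a sketch under stated assumptions: it assumes jobs.json has a top-level `job_specifications` list whose entries contain a `tasks` list, and that `additional_docker_run_options` is a list of strings on each task. Check the linked template for the actual layout. The helper name `add_shm_size` and the example image name are hypothetical.

```python
import json


def add_shm_size(jobs_config, size="256m"):
    """Add --shm-size=<size> to every task's Docker run options.

    Assumes (not verified against the template) that options live in a
    per-task list named 'additional_docker_run_options'.
    """
    option = f"--shm-size={size}"
    for job in jobs_config.get("job_specifications", []):
        for task in job.get("tasks", []):
            opts = task.setdefault("additional_docker_run_options", [])
            # Replace any earlier --shm-size setting rather than duplicating it.
            opts[:] = [o for o in opts if not o.startswith("--shm-size")]
            opts.append(option)
    return jobs_config


if __name__ == "__main__":
    # Hypothetical minimal configuration for illustration only.
    config = {
        "job_specifications": [
            {"id": "mpijob", "tasks": [{"image": "example/mpi-image"}]}
        ]
    }
    print(json.dumps(add_shm_size(config), indent=4))
```

Replacing an existing `--shm-size` entry instead of appending a second one keeps the resulting `docker run` command line unambiguous.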

This is more of a suggestion/warning than a bug report.

Joint work with @chrisrichardson.

Collaborator

alfpark commented Dec 13, 2016

@jhale Thanks for the information. I'll add this as a note in the documentation. Please let me know if you feel this is insufficient and a proper configuration option is a better alternative.

Author

jhale commented Dec 15, 2016

64m is unreasonably low in the context of a physical host that only runs one container. I think this should be made a parameter and upped to something more reasonable by default, as you've done in your commit.

Feel free to close the issue.
