-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SmartSim Singularity Integration #204
Conversation
--- Signed-off-by: Ben Albrecht <[email protected]>
--- Signed-off-by: Ben Albrecht <[email protected]>
--- Signed-off-by: Ben Albrecht <[email protected]>
--- Signed-off-by: Ben Albrecht <[email protected]>
Codecov Report
@@ Coverage Diff @@
## develop #204 +/- ##
========================================
Coverage 84.18% 84.18%
========================================
Files 58 59 +1
Lines 3244 3301 +57
========================================
+ Hits 2731 2779 +48
- Misses 513 522 +9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice PR @ben-albrecht . Super excited for this functionality to be in play. There's a few cosmetic changes with the only comment of substance being to limit the WLM to slurm for now OR test to make sure it works on PBS.
I'll submit approval for now so that you can merge in as you want after you add the tests. Please do make sure that the CI tests before merging in.
--- Signed-off-by: Ben Albrecht <[email protected]>
--- Signed-off-by: Ben Albrecht <[email protected]>
--- Signed-off-by: Ben Albrecht <[email protected]>
--- Signed-off-by: Ben Albrecht <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ben-albrecht for the changes and for the thoroughness of the tests and implementation. The PR description is also very helpful for us going forward to understand how and why some of the implementation deviated from the original design decisions. They are all fully justified and make sense to me.
This PR adds the ability for SmartSim to launch workloads in Singularity (Apptainer) as described in #101. It also lays the groundwork for supporting other container runtimes such as docker, shifter, and podman in the future, as well as launching the orchestrator in a container.
Design Variations
During development, it became clear that a few design changes from the original proposal were required. I have documented them below with their rationale:
1. Argument name:
bind_paths
->mount
The terms bind path and mount are mostly used interchangeably across different container runtimes. When writing tests, I kept forgetting if it was
bind_path
orbind_paths
, which hinted to me that it was not a great arg name, so I swapped it out for the more succinct and easy to remembermount
.2.
create_run_settings(..., container: str)
->create_run_settings(..., container: Container)
We originally thought it would be easier for users to get started with containers by allowing them to pass a string into
create_run_settings(container='singularity')
instead of having to create a Container object. While writing tests, I realized that this was potentially very confusing for users because 1) thecontainer
arg types change betweencreate_run_settings
andRunSettings
, and 2) if you need to add other metadata such as container args or container mount paths, you have to switch from usingcreate_run_settings
toRunSettings
in your code, which is very annoying. Because creating Containers is so lightweight, I think it is best to keep the container interface consistent across all functions that accept them.3. dropped getter/setter methods
Because command generation and validation happens upon execution, users can freely modify
Container.args
andContainer.mount
without getter/setter methods to update any other state. Therefore, I dropped these methods from the interface.Testing
Added 2 test suites for containers: One for WLM testing and one for local testing. The local testing suite runs in GitHub actions. Due to the added time from building Singularity and pulling the
container-testing
image, the singularity testing only happens on one configuration of the build matrix: python 3.9 + redisai 1.2.5 on linux