Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Launch Arg Builders Format Input for Launchers #620

Merged
merged 38 commits into from
Jul 27, 2024

Conversation

MattToast
Copy link
Member

@MattToast MattToast commented Jun 24, 2024

Add a LaunchArgBuilder.finalize method to the builder interface. The method takes something that can be formatted into an executable and an environment mapping and returns something that can be used as input for a launcher to start. VERY simple implementations have been added for all settings created so far.

Create a bare minimum ShellLauncher to launch settings formatted as Sequence[str] that can be sent to a Popen process. This can be used as a template to create a LocalLauncher, SlurmLauncher, etc.

Create a Dispatcher class and a global DEFAULT_DISPATCHER instance. This can be used by the experiment class to get which type of launcher are to be used for a specific LaunchSettingsBuilder by maintaining an internal mapping of settings type -> launcher type. This mapping can be built in a type safe way by calling Dispatcher.disptach. Experiments will likely be able to change how they launch each settings type by taking in a Dispatcher as a dependency, with a default being the DEFAULT_DISPATCHER. e.g.

class Experiment:
    ...
    def start(
        self,
        job: Job,
        *jobs: Job,
        dispatcher: Dispatcher = DEFAULT_DISPATCHER
    ) -> LaunchedJobIDs: ...
    ...

@MattToast MattToast added type: design Issues related to architecture and code design area: settings Issues related to Batch or Run settings area: launcher Issues related to any of the launchers within SmartSim area: api Issues related to API changes ignore-for-release labels Jun 24, 2024
@MattToast MattToast self-assigned this Jun 24, 2024
@MattToast MattToast force-pushed the builders-fmt branch 2 times, most recently from 396306f to cabc2a0 Compare June 26, 2024 22:39
@MattToast MattToast force-pushed the smartsim-refactor branch 2 times, most recently from 8a3889e to d15e951 Compare June 26, 2024 22:46
Copy link

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 62.08333% with 91 lines in your changes missing coverage. Please review.

Please upload report for BASE (smartsim-refactor@9b511f1). Learn more about missing BASE report.

Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                  @@
##             smartsim-refactor     #620   +/-   ##
====================================================
  Coverage                     ?   34.24%           
====================================================
  Files                        ?      108           
  Lines                        ?     6460           
  Branches                     ?        0           
====================================================
  Hits                         ?     2212           
  Misses                       ?     4248           
  Partials                     ?        0           
Files Coverage Δ
smartsim/_core/config/config.py 62.85% <100.00%> (ø)
smartsim/error/errors.py 59.18% <100.00%> (ø)
smartsim/settings/arguments/__init__.py 100.00% <100.00%> (ø)
smartsim/settings/arguments/batch/__init__.py 100.00% <100.00%> (ø)
smartsim/settings/arguments/batchArguments.py 76.92% <100.00%> (ø)
smartsim/settings/arguments/launch/__init__.py 100.00% <100.00%> (ø)
smartsim/settings/arguments/launch/local.py 48.38% <100.00%> (ø)
smartsim/types.py 100.00% <100.00%> (ø)
smartsim/_core/utils/helpers.py 33.00% <75.00%> (ø)
smartsim/settings/arguments/launch/dragon.py 61.11% <75.00%> (ø)
... and 8 more

@MattToast MattToast force-pushed the smartsim-refactor branch from d15e951 to e9d8eca Compare June 26, 2024 22:57
Copy link
Contributor

@mellis13 mellis13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just submitting some comments and questions. We discussed offline about the refactor the arg builder, and that feedback was not included in my comments.

Copy link
Contributor

@amandarichardsonn amandarichardsonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions on Questions on Questions, but awesome work! I am learning a lot

from smartsim.experiment import Experiment
from smartsim.settings.arguments import LaunchArguments

_Ts = TypeVarTuple("_Ts")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these TypeVars, are they just existing 'for now'?

Copy link
Member Author

@MattToast MattToast Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! We need them to keep our classes/methods/functions generic.

Old school python typing requires that generic types be declared at (iirc) the top level of a python module. You can think of this python function

class SupportsAdd(Protocol):
    def __add__(self, other: Self) -> Self: ...

T = TypeVar("T", bound=SupportsAdd)

def my_add(a: T, b: T) -> T:
    return a + b

as roughly equivalent to this in C++

#template<typename T>
T my_add(T a, T b) {
    return a + b
}

This is syntax is quite unintuitive, imo, but luckily was changed in 3.12 to this!

def my_add[T: SupportsAdd](a: T, b: T) -> T:
    return a + b

which, again imo, is much more readable and simple to understand! We just cannot use it yet for backward compatibility with python <3.12 😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, apparently mypy doesn't fully support the syntax yet, lol

return None
return register

def get_dispatch(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why accept two different types for args when it will always be of type LaunchArguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly just convenience! that way if I have

args: LaunchArguments = ...

I can call

dispatcher.get_dispatch(args)

instead of

dispatcher.get_dispatch(type(args))

If you think this is more confusing than it is helpful, I'd be willing to remove it!

# ---------------------------------------------------------------------
exe = t.cast("ExecutableProtocol", job.entity)
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
dispatch = dispatcher.get_dispatch(args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so I am trying to understand this, start_jobs accepts a Dispatcher and defaults to a Dispatcher you created in dispatch.py where the Dispatch init accepts the type of LaunchArgument child class. But you are not passing in the LaunchArgument type when initializing the Dispatcher so how can you run .get_dispatch when the self._dispatch_registry seems like it would be empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you are not passing in the LaunchArgument type when initializing the Dispatcher so how can you run .get_dispatch when the self._dispatch_registry seems like it would be empty?

Absolutely correct BUT the dispatch function in smartsim.settings.dispatch is actually a pointer to the DEFAULT_DISPATHCER.dispatch method. That way when a user creates a new settings and python loads the definition of the LaunchArguments subclass it will automatically be added in the DEFAULT_DISPATCHER._dispatch_registry

from smartsim.settings.dispatch import dispatch
from smartsim.settings.arguments.launchArguments import LaunchArguments

@dispatch(...)  # <-- Added into the registry here
                #     before the `__init__` method is ever even called
class MyLaunchArguments(LaunchArguments):
     ...

# it easier to monitor job statuses
# pylint: disable-next=protected-access
self._active_launchers.add(launch_config._adapted_launcher)
return launch_config.start(exe, env)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My brain might explode, but that is okay. So here, you're calling .start but where is make_shell_format_fn actually being ran? I see it set in each launch file? I know its being run?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_shell_format_fn is being called when smartsim.settings.arguments.launch.alps, smartsim.settings.launch.mpi, etc. are loaded by the interpreter and the function that is returned from that call is added into the default dispatcher's dispatch registry.

The dispatch registration then passes it (the make_shell_format_fn returned function) into a launcher adapter where it is finally called when the LauncherAdapter.start method is finally called (at that point it is at the self._adapt instance var)!

Copy link
Contributor

@amandarichardsonn amandarichardsonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hark, fellow coder! Behold this wondrous creation, forged in the fires of logic and threaded with the silken strands of syntax. With a flourish of my digital wand, I hereby grant my blessing upon this noble pull request!

exp_path: t.Optional[str] = None,
launcher: str = "local",
):
def __init__(self, name: str, exp_path: str | None = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a significant diff between types exp_path: t.Optional[str] = None and exp_path: str | None = None? Or is this just personal preference

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are 100% equivalent! I just like the latter because:

  1. It's slightly less to type, lol
  2. I think T | None does a better job of communicating that this parameter is nullable than Optional[T]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! One last caveat!!

The latter syntax is only available with Python >= 3.10 or with the from __future__ import annotations flag for the interpreter. But I'm also on a bit of a mission to start moving more of our modules over to using from __future__ import annotations anyways!

@@ -71,6 +85,15 @@ def format_comma_sep_env_vars(
Slurm takes exports in comma separated lists
the list starts with all as to not disturb the rest of the environment
for more information on this, see the slurm documentation for srun

.. warning::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance that we will forget about this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry! I will make a ticket for this/other methods!

from smartsim.settings.arguments import LaunchArguments

_Ts = TypeVarTuple("_Ts")
_T_contra = t.TypeVar("_T_contra", contravariant=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you by chance miss this one? _T_contra?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhh, I can try, but it truly is "a type variable that can bind to anything and is contravariant"... There's not much more context that I can add :(

@MattToast MattToast merged commit bd21957 into CrayLabs:smartsim-refactor Jul 27, 2024
32 of 33 checks passed
@MattToast MattToast deleted the builders-fmt branch August 9, 2024 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: api Issues related to API changes area: launcher Issues related to any of the launchers within SmartSim area: settings Issues related to Batch or Run settings ignore-for-release type: design Issues related to architecture and code design
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants