Simple Status Checking #655

Merged
9 commits merged into CrayLabs:smartsim-refactor on Aug 9, 2024

Conversation

MattToast
Member

Add the ability for the Experiment to fetch the status of a launched job that it started, given a LaunchedJobID. Teach the ShellLauncher and DragonLauncher to get the statuses of jobs they have launched.

@MattToast added the labels type: feature, area: launcher, area: api, and ignore-for-release on Aug 2, 2024
@MattToast self-assigned this on Aug 2, 2024
@@ -144,6 +145,10 @@ def start(
        res = _assert_schema_type(self._connector.send_request(req), DragonRunResponse)
        return LaunchedJobID(res.step_id)

    def get_status(self, *launched_ids: LaunchedJobID) -> tuple[SmartSimStatus, ...]:
        infos = self._get_managed_step_update(list(launched_ids))
Contributor

infos plural intentional?

Member Author

infos, plural intentional! The return type of _get_managed_step_update is list[StepInfo], so I pluralized as a reminder that this is a "collection of many StepInfo instances".

If it's unclear or confusing to read, I'm more than willing to rename it to info_list or similar!

Comment on lines 391 to 394
    @abc.abstractmethod
    def get_status(
        self, *launched_ids: LaunchedJobID
    ) -> tuple[SmartSimStatus, ...]: ...
Member Author

@MattToast MattToast Aug 2, 2024

Currently, this assumes that for a call like:

stat_1, stat_2, stat_3, ... = launcher.get_status(id_1, id_2, id_3, ...)

stat_1 is the status of the job with id id_1, stat_2 is the status of the job with id id_2, and so on. The Experiment relies on this behavior.

Do we like this or is this too much of an "implied constraint" from the protocol?

Should the return type here actually be a Mapping[LaunchedJobID, SmartSimStatus] or maybe an Iterable[tuple[LaunchedJobID, SmartSimStatus]]? That way it's more obvious to users looking to write and register their own launchers what this method is intended to return.
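
To make the trade-off concrete, here is a minimal sketch (not code from this PR) of the positional contract. `positional_get_status`, `misordered_get_status`, and `pair_by_position` are hypothetical names, and plain strings stand in for `LaunchedJobID` and `SmartSimStatus`. Under the tuple contract a caller can only pair results with IDs by position, so a launcher that answers in a different order produces a wrong pairing without raising.

```python
# Hypothetical stand-ins: plain strings replace LaunchedJobID and SmartSimStatus.
_STATUSES = {"id_1": "Running", "id_2": "Completed", "id_3": "Failed"}


def positional_get_status(*launched_ids: str) -> tuple[str, ...]:
    """Well-behaved launcher: statuses come back in argument order."""
    return tuple(_STATUSES[id_] for id_ in launched_ids)


def misordered_get_status(*launched_ids: str) -> tuple[str, ...]:
    """Launcher that (incorrectly) answers in sorted-ID order."""
    return tuple(_STATUSES[id_] for id_ in sorted(launched_ids))


def pair_by_position(get_status, *launched_ids: str) -> dict[str, str]:
    """What a caller must do under the tuple contract: zip IDs with results."""
    return dict(zip(launched_ids, get_status(*launched_ids)))


# Same request, two launchers: only the first yields the correct pairing.
good = pair_by_position(positional_get_status, "id_3", "id_1")
bad = pair_by_position(misordered_get_status, "id_3", "id_1")
```

Nothing in the type `tuple[SmartSimStatus, ...]` distinguishes the two launchers, which is the "implied constraint" in question.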

Contributor

I feel like it might not be obvious to all SS users that stat_1 and stat_2 will map to id_1 and id_2, but I might be wrong. I would rather get back a Mapping or an Iterable. I'm wondering which will be easier for the user to inspect: maybe an Iterable, since you can use a for loop? But then again, I can search by key in a Mapping. Am I on to something here, or is this way off?

Member Author

Agreed offline to make this protocol method return a Mapping.
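
A minimal sketch of what the agreed shape could look like, under stated assumptions: `LauncherSketch`, `InMemoryLauncher`, and the `Status` enum are hypothetical names, and a `str` alias stands in for `LaunchedJobID`. The actual protocol and status type in the PR may differ.

```python
import abc
import enum
from collections.abc import Mapping

LaunchedJobID = str  # assumption: stand-in alias for the real ID type


class Status(enum.Enum):
    """Hypothetical stand-in for SmartSimStatus."""

    RUNNING = "Running"
    COMPLETED = "Completed"


class LauncherSketch(abc.ABC):
    """Hypothetical launcher interface with a Mapping-returning get_status."""

    @abc.abstractmethod
    def get_status(
        self, *launched_ids: LaunchedJobID
    ) -> Mapping[LaunchedJobID, Status]:
        """Return the status of each requested job, keyed by its ID."""


class InMemoryLauncher(LauncherSketch):
    """Toy launcher that looks statuses up in a dict it was given."""

    def __init__(self, known: dict[LaunchedJobID, Status]) -> None:
        self._known = dict(known)

    def get_status(
        self, *launched_ids: LaunchedJobID
    ) -> Mapping[LaunchedJobID, Status]:
        # Keying by ID makes the association explicit; result order is irrelevant.
        return {id_: self._known[id_] for id_ in launched_ids}


launcher = InMemoryLauncher({"a": Status.RUNNING, "b": Status.COMPLETED})
statuses = launcher.get_status("b", "a")
```

A caller can then look up `statuses[id_]` directly or loop over `statuses.items()`, which covers both access patterns raised in this thread.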

Contributor

@amandarichardsonn amandarichardsonn left a comment

Looks awesome, just a couple of comments

        :returns: A tuple of statuses with order respective of the order of the
            calling arguments.
        """
        ids_ = set(ids)
Contributor

Would you mind commenting this code? Sorry I always ask that MERP

Member Author

Absolutely!

Sorry I always ask that MERP

Don't be sorry! If it's not immediately obvious what's happening, that's a good sign comments are needed. I don't want to over-comment, so I tend to comment only the sections that reviewers ask for more details on!!

Member Author

Basically I'm just offloading the work of figuring out what needs comments onto you, the reviewer, lol.
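
For context on what such comments would need to explain, here is a hedged sketch of one way to return statuses in calling order when the same ID may be passed more than once. It is not the PR's implementation: `statuses_in_call_order`, `lookup`, `fake_lookup`, and the `"NeverStarted"` fallback are all hypothetical, and strings stand in for the real ID and status types. The IDs are deduplicated with `set` before querying, and the result is then rebuilt in the original argument order.

```python
from collections.abc import Callable, Mapping


def statuses_in_call_order(
    lookup: Callable[[set[str]], Mapping[str, str]],
    *ids: str,
    default: str = "NeverStarted",
) -> tuple[str, ...]:
    """Return one status per argument, in argument order."""
    # Deduplicate so each job is queried at most once.
    ids_ = set(ids)
    # `lookup` (hypothetical) maps each ID it knows about to a status.
    found = lookup(ids_)
    # Rebuild the answer in calling order; unknown IDs get the fallback.
    return tuple(found.get(id_, default) for id_ in ids)


def fake_lookup(ids: set[str]) -> Mapping[str, str]:
    """Toy status source that only knows about jobs "a" and "b"."""
    known = {"a": "Running", "b": "Completed"}
    return {id_: known[id_] for id_ in ids if id_ in known}


result = statuses_in_call_order(fake_lookup, "b", "a", "b", "zzz")
```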

Contributor

@mellis13 mellis13 left a comment

Very small comments otherwise LGTM

codecov bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 39.02439% with 50 lines in your changes missing coverage. Please review.

Please upload report for BASE (smartsim-refactor@a2c1251). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| smartsim/_core/control/launch_history.py | 43.47% | 13 Missing ⚠️ |
| smartsim/settings/dispatch.py | 38.88% | 11 Missing ⚠️ |
| smartsim/experiment.py | 33.33% | 10 Missing ⚠️ |
| smartsim/_core/utils/helpers.py | 27.27% | 8 Missing ⚠️ |
| smartsim/_core/launcher/dragon/dragonLauncher.py | 22.22% | 7 Missing ⚠️ |
| smartsim/_core/control/jobmanager.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files

@@                 Coverage Diff                  @@
##             smartsim-refactor     #655   +/-   ##
====================================================
  Coverage                     ?   34.15%           
====================================================
  Files                        ?      109           
  Lines                        ?     6611           
  Branches                     ?        0           
====================================================
  Hits                         ?     2258           
  Misses                       ?     4353           
  Partials                     ?        0           
| Files with missing lines | Coverage Δ |
|---|---|
| smartsim/_core/launcher/dragon/dragonBackend.py | 0.00% <ø> (ø) |
| smartsim/error/errors.py | 60.00% <100.00%> (ø) |
| smartsim/status.py | 100.00% <100.00%> (ø) |
| smartsim/_core/control/jobmanager.py | 23.22% <50.00%> (ø) |
| smartsim/_core/launcher/dragon/dragonLauncher.py | 25.78% <22.22%> (ø) |
| smartsim/_core/utils/helpers.py | 32.71% <27.27%> (ø) |
| smartsim/experiment.py | 38.67% <33.33%> (ø) |
| smartsim/settings/dispatch.py | 64.06% <38.88%> (ø) |
| smartsim/_core/control/launch_history.py | 43.47% <43.47%> (ø) |

@MattToast MattToast merged commit 77eaf4d into CrayLabs:smartsim-refactor Aug 9, 2024
22 of 35 checks passed
@MattToast MattToast deleted the simple-status branch August 9, 2024 23:52