Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Pass worker_spec_command to mpi plugin to support horovod #341

Merged
merged 4 commits into from
Apr 19, 2023

Conversation

ByronHsu
Copy link
Contributor

@ByronHsu ByronHsu commented Apr 3, 2023

TL;DR

Pass worker_spec_command to mpi plugin to support horovod task

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

In this pr in flytekit, we pass worker_spec_command to backend to be run on worker pod.

Tracking Issue

flyteorg/flyte#3567

@codecov
Copy link

codecov bot commented Apr 3, 2023

Codecov Report

Merging #341 (3c7a135) into master (38e8a07) will increase coverage by 1.33%.
The diff coverage is 65.45%.

❗ Current head 3c7a135 differs from pull request most recent head 17f241e. Consider uploading reports for the commit 17f241e to get more accurate results

@@            Coverage Diff             @@
##           master     #341      +/-   ##
==========================================
+ Coverage   62.66%   63.99%   +1.33%     
==========================================
  Files         146      146              
  Lines       12226     9927    -2299     
==========================================
- Hits         7661     6353    -1308     
+ Misses       3983     2988     -995     
- Partials      582      586       +4     
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
go/tasks/pluginmachinery/core/transition.go 100.00% <ø> (ø)
go/tasks/pluginmachinery/internal/webapi/core.go 25.00% <0.00%> (+2.27%) ⬆️
go/tasks/plugins/hive/executor.go 10.12% <0.00%> (+1.87%) ⬆️
.../plugins/k8s/kfoperators/common/common_operator.go 80.20% <0.00%> (+8.90%) ⬆️
go/tasks/plugins/presto/executor.go 11.26% <0.00%> (+1.96%) ⬆️
...tasks/pluginmachinery/flytek8s/container_helper.go 87.41% <40.00%> (+0.23%) ⬆️
go/tasks/pluginmachinery/flytek8s/pod_helper.go 69.81% <50.00%> (-3.21%) ⬇️
go/tasks/plugins/array/catalog.go 46.98% <68.18%> (+1.93%) ⬆️
go/tasks/plugins/k8s/kfoperators/mpi/mpi.go 73.95% <71.42%> (-1.05%) ⬇️
go/tasks/plugins/array/k8s/executor.go 40.67% <100.00%> (+1.09%) ⬆️
... and 3 more

... and 118 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Comment on lines +81 to 85
workersPodSpec.Containers[k].Args = workerSpecCommand
workersPodSpec.Containers[k].Command = []string{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use workerSpecCommand for container's command? If not, I think we could rename it to workerSpecArgs

Suggested change
workersPodSpec.Containers[k].Args = workerSpecCommand
workersPodSpec.Containers[k].Command = []string{}
workersPodSpec.Containers[k].Args = []string{}
workersPodSpec.Containers[k].Command = workerSpecCommand

Copy link
Contributor Author

@ByronHsu ByronHsu Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the same pattern as _get_container of PythonAutoContainer, where we put get_command in args of the container.

I think on the user-facing side (flytekit), we can call them commands, but on the backend side (flytepropeller), we can put the command in args.

Putting either in command or args of the container both work I guess. I just want to make it consistent with other cases

Copy link
Contributor

@hamersaw hamersaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ByronHsu I'm not completely following this. Hoping you can clarify faster than I can look through the code. In the current implementation there is a comment WorkerPodSpec doesn't need any Argument & command. It will be trigger from launcher pod implying that the kubeflow operator (or launcher Pod) will override the command set in the worker PodSpec. Is that not actually the case? In this PR it seems we're setting the command which means it is maintained through the launcher. Does the kf operator use the command if it's provided and set it if it's not? Perhaps this should be documented here?

@hamersaw
Copy link
Contributor

Also, need DCO signoff, thanks!

byhsu added 2 commits April 17, 2023 12:36
wip
Signed-off-by: byhsu <[email protected]>
Signed-off-by: byhsu <[email protected]>
@ByronHsu
Copy link
Contributor Author

ByronHsu commented Apr 17, 2023

@hamersaw no. in some cases, the mpi jobs need worker commands. In the case of horovod tasks, each worker needs to run a ssh daemon. (#341)

Signed-off-by: byhsu <[email protected]>
Signed-off-by: byhsu <[email protected]>
@hamersaw hamersaw merged commit 46796b6 into flyteorg:master Apr 19, 2023
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* wip

Signed-off-by: byhsu <[email protected]>

* add comment

Signed-off-by: byhsu <[email protected]>

* more comment

Signed-off-by: byhsu <[email protected]>

* fix style

Signed-off-by: byhsu <[email protected]>

---------

Signed-off-by: byhsu <[email protected]>
Co-authored-by: byhsu <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants