Pass `worker_spec_command` to mpi plugin to support horovod #341
Codecov Report
@@            Coverage Diff             @@
##           master     #341      +/-   ##
==========================================
+ Coverage   62.66%   63.99%   +1.33%
==========================================
  Files         146      146
  Lines       12226     9927    -2299
==========================================
- Hits         7661     6353    -1308
+ Misses       3983     2988     -995
- Partials      582      586       +4
... and 118 files with indirect coverage changes
workersPodSpec.Containers[k].Args = workerSpecCommand
workersPodSpec.Containers[k].Command = []string{}
Should we use `workerSpecCommand` for the container's command? If not, I think we could rename it to `workerSpecArgs`.
- workersPodSpec.Containers[k].Args = workerSpecCommand
- workersPodSpec.Containers[k].Command = []string{}
+ workersPodSpec.Containers[k].Args = []string{}
+ workersPodSpec.Containers[k].Command = workerSpecCommand
I followed the same pattern as `_get_container` of `PythonAutoContainer`, where we put `get_command` in the `args` of the container.
I think on the user-facing side (flytekit) we can call them commands, but on the backend side (flytepropeller) we can put the command in `args`.
Putting it in either the `command` or the `args` of the container works, I guess. I just want to keep it consistent with the other cases.
@ByronHsu I'm not completely following this, and I'm hoping you can clarify faster than I can look through the code. The current implementation has the comment "WorkerPodSpec doesn't need any Argument & command. It will be trigger from launcher pod", implying that the kubeflow operator (or launcher Pod) will override the command set in the worker PodSpec. Is that not actually the case? In this PR it seems we're setting the command, which means it is maintained through the launcher. Does the kf operator use the command if it's provided and set it only if it's not? Perhaps this should be documented here?
Also, this needs a DCO signoff, thanks!
Signed-off-by: byhsu <[email protected]>
535e0c6 to b96d699
* wip
Signed-off-by: byhsu <[email protected]>
* add comment
Signed-off-by: byhsu <[email protected]>
* more comment
Signed-off-by: byhsu <[email protected]>
* fix style
Signed-off-by: byhsu <[email protected]>
---------
Signed-off-by: byhsu <[email protected]>
Co-authored-by: byhsu <[email protected]>
TL;DR
Pass `worker_spec_command` to mpi plugin to support horovod taskType
Are all requirements met?
Complete description
In this PR in flytekit, we pass `worker_spec_command` to the backend to be run on the worker pod.
Tracking Issue
flyteorg/flyte#3567