Motivation: Why do you think this is important?
Many ML workflows and training systems take a large number of hyperparameters when configuring training jobs, sometimes on the order of hundreds of separate variables to define a single training run.
Today there does not seem to be a way to provide file-based input for workflow execution. Users must either pass a URL pointing to a hosted data file as input, or manually break out each variable in their configuration and supply it through the CLI or the web UI. Neither option is realistic or tenable for modern ML training projects.
Goal: What should the final outcome look like, ideally?
With pyflyte run, you should be able to provide a structured YAML or JSON input that defines the configuration of your workflow.
Ideally, all of those configuration settings would be matched against a user-defined hyperparameter class, rather than passed in piecemeal as individual parameters, as with a conventional Python function.
A user should also be able to register their workflow and then provide the YAML file to execute it, either via the UI or via a launch plan.
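To make the request concrete, here is a minimal sketch of what "matching a config file against a user-defined hyperparameter class" could look like in plain Python. It does not use any Flyte API; `TrainingConfig`, `OptimizerConfig`, `from_dict`, and `load_config` are hypothetical names chosen for illustration, and JSON is used only because it is in the standard library (a YAML loader could be swapped in).

```python
import json
import tempfile
from dataclasses import dataclass, field, fields, is_dataclass
from pathlib import Path


@dataclass
class OptimizerConfig:
    # Hypothetical nested group of hyperparameters.
    name: str = "adam"
    learning_rate: float = 1e-3


@dataclass
class TrainingConfig:
    # Hypothetical top-level hyperparameter class for one training run.
    epochs: int = 10
    batch_size: int = 32
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)


def from_dict(cls, data):
    """Build the dataclass `cls` from a dict, rejecting unknown keys."""
    known = {f.name: f for f in fields(cls)}
    unknown = set(data) - set(known)
    if unknown:
        raise ValueError(f"unknown keys for {cls.__name__}: {sorted(unknown)}")
    kwargs = {}
    for key, value in data.items():
        field_type = known[key].type
        # Recurse into nested dataclasses; pass plain values through unchanged.
        if is_dataclass(field_type) and isinstance(value, dict):
            value = from_dict(field_type, value)
        kwargs[key] = value
    return cls(**kwargs)


def load_config(path):
    """Read a JSON file and match its contents against TrainingConfig."""
    with open(path) as f:
        return from_dict(TrainingConfig, json.load(f))


if __name__ == "__main__":
    # Write a small config file, then load it back as a typed object.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "config.json"
        path.write_text(json.dumps({"epochs": 3, "optimizer": {"learning_rate": 0.01}}))
        print(load_config(path))
```

Keys omitted from the file fall back to the defaults declared on the class, and misspelled keys fail loudly instead of being silently ignored, which matters when a run is defined by hundreds of values.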
Describe alternatives you've considered
We do support setting some parameters as fixed in launch plans. However, that takes the decision about which elements are fixed and which are variable out of the user's hands, and it forces users to generate potentially hundreds of separate launch plans, one for each individual training event.
Even when setting up launch plans, the user must manually hardcode every variable that is static between executions, which defeats much of the purpose here.
Propose: Link/Inline OR Additional context
https://pypi.org/project/yamldataclassconfig/
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?