Utilities to set ML models and scripts from driver scripts #185
Conversation
Codecov Report
@@            Coverage Diff             @@
##           develop     #185     +/-   ##
==========================================
- Coverage    81.80%   78.57%    -3.24%
==========================================
  Files           57       59       +2
  Lines         2974     3743     +769
==========================================
+ Hits          2433     2941     +508
- Misses         541      802     +261
Great stuff! A couple of comments and one bigger design thought.
Comments:
- What's the workflow for `Ensemble`? Is the user required to write that loop themselves to call the `Model` functions?
- Tests for `Ensembles`?

Should we allow the user to simply pass a `DBModel` to `Experiment.start()` in addition to the methods on the `Model` and `Ensemble` classes? `DBObjects` would be more user facing, but
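To make the design question concrete, here is a hypothetical sketch of the two options being weighed; the method signatures and the `DBModel` constructor arguments below are assumptions for illustration, not the PR's actual API:

```python
# Option 1: attach the model through the entity's own method.
model = exp.create_model("sim", run_settings)
model.add_ml_model("cnn", "TORCH", model_path="model.pt", device="CPU")
exp.start(model)

# Option 2 (proposed): hand the DBModel object to start() directly,
# alongside the entity it should apply to.
db_model = DBModel("cnn", "TORCH", model_file="model.pt", device="CPU")
exp.start(model, db_model)
```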
@Spartee re: passing the model to
Very close; a couple of comments we should chat about today.
Approving with minor comments. Looks great. Works on my Mac.
if not ensembles:
    return False

has_db_objects |= any([has_db_models(ensemble) | has_db_scripts(ensemble) for ensemble in ensembles])
Add comments here to explain what's being done. Love the operator usage, but it's not readable.
Yep, done.
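For illustration, a commented version of this check might read like the sketch below; the wrapping function name is hypothetical, and `has_db_models`/`has_db_scripts` are assumed to be helpers reporting whether an entity carries DB models or DB scripts:

```python
def _ensembles_have_db_objects(ensembles):
    """Hypothetical helper: True if any ensemble carries DB models or scripts."""
    if not ensembles:
        return False
    # `|` applied to two booleans acts as a logical OR, so an ensemble
    # counts if it has DB models attached, DB scripts attached, or both.
    return any(
        has_db_models(ensemble) | has_db_scripts(ensemble)
        for ensemble in ensembles
    )
```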
        args.device+f":{device_num}")
elif args.file:
    if args.devices_per_node == 1:
        client.set_script_from_file(args.name,
What happens if these fail? Have we tried setting a bad model?
The functions are wrapped in a try/except block, so I think it should catch such an exception. Do you think we should add a test for that case?
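For context, a minimal sketch of the kind of guard being discussed, assuming the SmartRedis Python client and its `RedisReplyError`; the address, model name, and file path are placeholders:

```python
from smartredis import Client
from smartredis.error import RedisReplyError

client = Client(address="127.0.0.1:6379", cluster=False)

try:
    # A malformed model file should come back as an error reply from
    # the database rather than crashing the driver process.
    client.set_model_from_file("bad-model", "model.pt", "TORCH", device="CPU")
except RedisReplyError as err:
    print(f"Setting the model failed: {err}")
```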
Adds utilities to set ML models and ML scripts directly from driver scripts, as opposed to only from application code. [ committed by @al-rigazzi ] [ reviewed by @Spartee ]
This PR adds the functionality needed to add ML models and TorchScript functions to the orchestrators (converged and not, but see the caveat below) from the driver script launching SmartSim.
The full set of available new functionality:
Basically, what cannot be done is setting a Model or a function from memory on converged orchestrators. These are the reasons for such API gaps:
I decided to use the SmartRedis client to connect to the orchestrators and set models and scripts, as spawning `redis-cli` processes was more convoluted, in my opinion.
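As an illustration of that choice, a driver script can talk to a launched orchestrator directly through the SmartRedis client; this is a minimal sketch, with the address, names, and file paths as placeholders:

```python
from smartredis import Client

# Connect to the launched orchestrator (single-shard, non-clustered here).
client = Client(address="127.0.0.1:6379", cluster=False)

# A TorchScript-compatible function defined in the driver script itself;
# setting functions from memory works on standard orchestrators.
def normalize(tensor):
    return tensor / tensor.max()

client.set_function("normalize", normalize)

# Loading a serialized model from file, which avoids the from-memory
# limitation mentioned above for converged orchestrators.
client.set_model_from_file("cnn", "model.pt", "TORCH", device="CPU")
```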