-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connectors Refactoring Discussion #4510
Comments
Hey @justusschock, Great summary! Best regards, |
Separation of Concern with controlled dependency is great! For example, CheckpointConnector change trainer state. |
@justusschock any updates? |
@edenafek I have refactored some of the accelerators and plugins as proof of concept. currently working on trainer integration. |
link to draft PR? |
There is no so far. I'm doing this in my own fork and I'll open a draft PR, as soon as I got a running version. |
Quick update since there was some interest: current state is here in Slides: |
mind open it for comments? :] |
This is a brief summary of our refactoring discussion with @tchaton @awaelchli .
The easy part
Refactor all connectors to be completely self-contained and establish call hierarchy.
This means:
A potential call hierarchy could be like this:
Trainer -> Loops -> Connectors -> Accelerators -> Plugins
That also means, that every class can only call types that are below it in the hierarchy and nothing that is on the same level or higher. I.e. Accelerators cannot call other accelerators, trainer, loops or connectors but only plugins.
The more difficult part
Refactor all Accelerators and Plugins to further separate them.
Accelerators should only contain hardware specific stuff, whereas plugins include changes in the training routine.
So we would end up with something about 3-5 accelerators and everything else (like DDP, DDP2, DDP-Spawn, DP, AMP etc.) would become a Plugin. This has the advantages that
1.) it is cleaner to implement once the things are more separated
2.) It is easier to look at specific parts since currently the DDP stuff for example is still scattered across the plugin and the accelerators
3.) Every new plugin should apply to all/most accelerators with no change.
How to do this
The first part can be changed slowly while fixing bugs and implementing features.
The second part however needs to be done more carefully since this requires breaking changes in our internals.
So I'll start there with a draft and see, how hard it would be and how the result would look like before we apply this to all the accelerators and plugins.
cc @edenlightning
The text was updated successfully, but these errors were encountered: