Connectors Refactoring Discussion #4510

justusschock · 2020-11-04T09:22:04Z

This is a brief summary of our refactoring discussion with @tchaton and @awaelchli.

The easy part

Refactor all connectors to be completely self-contained and establish a call hierarchy.
This means:

Connectors should not call other connectors
Connectors should not have a trainer reference or change anything on trainer
Trainer properties will just gather information from the respective connector.
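
As a rough illustration of these three rules, here is a minimal sketch. The names `ExampleConnector`, `ExampleTrainer` and `num_workers` are hypothetical and only stand in for whatever a real connector would manage; this is not the actual Lightning code.

```python
class ExampleConnector:
    """Hypothetical connector: keeps its own state and holds no trainer reference."""

    def __init__(self, num_workers: int = 0) -> None:
        self.num_workers = num_workers


class ExampleTrainer:
    """Hypothetical trainer that owns the connector and only reads from it."""

    def __init__(self, num_workers: int = 0) -> None:
        self._connector = ExampleConnector(num_workers=num_workers)

    @property
    def num_workers(self) -> int:
        # The trainer gathers the value; the connector never writes to the trainer.
        return self._connector.num_workers


trainer = ExampleTrainer(num_workers=4)
```

With this shape, the connector can be constructed and tested without a trainer instance.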

A potential call hierarchy could be like this:
Trainer -> Loops -> Connectors -> Accelerators -> Plugins

That also means that every class can only call types that are below it in the hierarchy, and nothing on the same level or higher. For example, accelerators cannot call other accelerators, the trainer, loops or connectors, but only plugins.
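
The rule can be written down as a small check. This is only a sketch: `LEVELS` and `may_call` are hypothetical names, and it assumes that "below" means any lower level, not just the one directly underneath.

```python
# Levels from the proposed hierarchy, highest first.
LEVELS = ["Trainer", "Loops", "Connectors", "Accelerators", "Plugins"]


def may_call(caller: str, callee: str) -> bool:
    """Return True if `caller` sits strictly above `callee` in the hierarchy."""
    return LEVELS.index(caller) < LEVELS.index(callee)


# An accelerator may call a plugin, but not another accelerator or the trainer.
accelerator_to_plugin = may_call("Accelerators", "Plugins")
accelerator_to_accelerator = may_call("Accelerators", "Accelerators")
accelerator_to_trainer = may_call("Accelerators", "Trainer")
```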

The more difficult part

Refactor all Accelerators and Plugins to further separate them.
Accelerators should only contain hardware-specific stuff, whereas plugins contain the changes to the training routine.

So we would end up with about 3-5 accelerators, and everything else (like DDP, DDP2, DDP-Spawn, DP, AMP etc.) would become a plugin. This has the following advantages:
1.) It is cleaner to implement once things are more separated.
2.) It is easier to look at specific parts, since currently the DDP stuff, for example, is still scattered across the plugins and the accelerators.
3.) Every new plugin should apply to all/most accelerators with no change.
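
To make point 3 concrete, here is a minimal sketch of the split, assuming a plugin only wraps the training step and an accelerator only knows its device. All class and method names (`Plugin`, `ScalingPlugin`, `wrap_step`, `run_step`, ...) are hypothetical and not the final API.

```python
class Plugin:
    """Hypothetical base: changes the training routine, knows nothing about hardware."""

    def wrap_step(self, step):
        return step


class ScalingPlugin(Plugin):
    """Stand-in for an AMP-like plugin: scales the value returned by a step."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def wrap_step(self, step):
        def scaled(batch):
            return step(batch) * self.scale

        return scaled


class Accelerator:
    """Hypothetical base: hardware-specific only, routine changes come from plugins."""

    device = "cpu"

    def __init__(self, plugins) -> None:
        self.plugins = list(plugins)

    def run_step(self, step, batch):
        # Accelerators only call plugins, never other accelerators.
        for plugin in self.plugins:
            step = plugin.wrap_step(step)
        return self.device, step(batch)


class CPUAccelerator(Accelerator):
    device = "cpu"


class GPUAccelerator(Accelerator):
    device = "cuda"


def toy_step(batch: float) -> float:
    return batch + 1.0


plugin = ScalingPlugin(scale=2.0)
cpu_result = CPUAccelerator([plugin]).run_step(toy_step, 1.0)
gpu_result = GPUAccelerator([plugin]).run_step(toy_step, 1.0)
```

The same `ScalingPlugin` instance is reused unchanged on both accelerators, which is the property the refactor is aiming for.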

How to do this

The first part can be changed slowly while fixing bugs and implementing features.

The second part, however, needs to be done more carefully, since it requires breaking changes in our internals.
So I'll start there with a draft and see how hard it would be and what the result would look like before we apply this to all the accelerators and plugins.

cc @edenlightning

tchaton · 2020-11-05T08:21:17Z

Hey @justusschock,

Great summary!
Should plugins have access to the trainer?

Best regards,
T.C

tarepan · 2020-11-06T17:17:33Z

Separation of concerns with controlled dependencies is great!
Can we check the above idea against a concrete case?

For example, CheckpointConnector changes trainer state.
It breaks the rule "Connectors should not have a trainer reference or change anything on trainer",
but changing trainer state is exactly the concern of this connector.
How should we refactor it?

justusschock · 2020-11-09T09:47:14Z

@tchaton Ideally they wouldn't.

@tarepan This is not settled yet, but I'm currently trying to prototype it.

edenlightning · 2020-11-18T20:21:37Z

@justusschock any updates?

justusschock · 2020-11-19T11:21:02Z

@edenafek I have refactored some of the accelerators and plugins as a proof of concept. Currently working on the trainer integration.

edenlightning · 2020-11-19T17:10:56Z

Link to the draft PR?

justusschock · 2020-11-23T08:13:48Z

There is none so far. I'm doing this in my own fork and I'll open a draft PR as soon as I have a running version.

awaelchli · 2020-12-28T15:33:41Z

Quick update since there was some interest: the current state is in the accelerator-refactor branch of Justus's fork.
https://github.com/justusschock/pytorch-lightning/tree/accelerator-refactor

Slides:
https://docs.google.com/presentation/d/1v5EOO9yl3rxQ_uXGPPc4VURNOvx4odqPCiP5eRRPEy0/edit?usp=sharing

Borda · 2021-01-08T00:37:38Z

Slides: https://docs.google.com/presentation/d/1v5EOO9yl3rxQ_uXGPPc4VURNOvx4odqPCiP5eRRPEy0/edit?usp=sharing

Mind opening it for comments? :]

justusschock added the feature and design labels Nov 4, 2020

justusschock self-assigned this Nov 4, 2020

tchaton added this to the 1.1 milestone Nov 5, 2020

Borda modified the milestones: 1.1, 1.2 Nov 30, 2020

awaelchli mentioned this issue Jan 4, 2021

deprecate enable_pl_optimizer as it is not restored properly #5244

Merged

11 tasks

justusschock mentioned this issue Jan 6, 2021

WIP: Accelerator refactor #5385

Closed

9 tasks

awaelchli mentioned this issue Jan 8, 2021

Refactor setup_training and remove test_mode #5388

Merged

12 tasks

Borda added the discussion and Important labels Jan 8, 2021

edenlightning changed the title ~~Refactoring Discussion~~ Connectors Refactoring Discussion Jan 8, 2021

justusschock mentioned this issue Jan 22, 2021

PoC: Accelerator refactor [wip] [skip ci] #5616

Closed

16 tasks

awaelchli mentioned this issue Jan 23, 2021

fix error when logging to progress bar with reserved name #5620

Merged

justusschock mentioned this issue Feb 2, 2021

PoC: Accelerator refactor #5743

Merged

16 tasks

tchaton closed this as completed in #5743 Feb 12, 2021
