[RFC] Deprecate direct support for truncated backprop through time #8732

Closed

ananthsub opened this issue Aug 5, 2021 · 19 comments
Labels: deprecation (Includes a deprecation), design (Includes a design discussion), discussion (In a discussion stage)

@ananthsub
Contributor

ananthsub commented Aug 5, 2021

🚀 Feature

Deprecate direct framework support for truncated backprop through time

Motivation

We are auditing the Lightning components and APIs to assess opportunities for improvement.

Lightning today offers specific support for truncated backpropagation through time. This manifests in:

However, this is highly specific to one particular use case. Moreover, manual optimization already gives users the flexibility to control batch splitting, loss computation, and optimizer stepping themselves. Put another way, users can already train with tbptt techniques without using this property at all, as sketched below. This suggests we no longer need this direct support on the core interface.
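
Below is a minimal sketch of what that could look like: an RNN trained with truncated backprop through time via manual optimization in a plain LightningModule. The model, chunk size, and batch layout ([batch, time, features] inputs, [batch, time, 1] targets) are illustrative assumptions, not part of the hooks being deprecated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl


class ManualTBPTT(pl.LightningModule):
    def __init__(self, input_size=32, hidden_size=64, tbptt_steps=10):
        super().__init__()
        self.automatic_optimization = False  # take over backward/step ourselves
        self.tbptt_steps = tbptt_steps
        self.rnn = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch  # assumed shapes: [B, T, F] and [B, T, 1]
        hiddens = None
        # split the time dimension into chunks and step the optimizer once per chunk
        for x_t, y_t in zip(x.split(self.tbptt_steps, dim=1), y.split(self.tbptt_steps, dim=1)):
            out, hiddens = self.rnn(x_t, hiddens)
            loss = F.mse_loss(self.head(out), y_t)
            opt.zero_grad()
            self.manual_backward(loss)
            opt.step()
            # detach so gradients do not flow back across chunk boundaries
            hiddens = tuple(h.detach() for h in hiddens)
            self.log("train_loss", loss, prog_bar=True)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```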

@PyTorchLightning/core-contributors

Pitch

Benefits:

  • Philosophically, we recommend that users implement this logic through a more generic interface. Importantly, this sets a precedent that the framework is not, and should not be, on the critical path for every type of training technique. There is an endless supply of them, and the framework can easily buckle trying to support all of them directly. This also matters because training techniques come and go, and the framework can quickly accrue debt supporting features without clear value to users. For example, BPTT was critical for LSTMs, but as more users shift to transformers, this tool becomes less essential.
  • Tactically, deprecating these hooks simplifies the framework and the interfaces we offer, which means less maintenance and a lower risk of bugs

Approach:

  • Mark these hooks as deprecated in v1.5, and remove support in v1.7
  • In tutorials, we can show how to train these models with manual optimization, to demonstrate that this is not a strict loss of functionality
  • We can still offer batch-splitting as a utility (outside of the LightningModule) if users find this helpful; a rough sketch of such a utility follows below
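
As a rough illustration of that last point, a standalone splitting helper could look like the sketch below. The function name and the assumption that the time dimension sits on dim 1 are illustrative; this is not an existing Lightning API.

```python
from typing import List

import torch


def split_batch_along_time(batch, split_size: int, time_dim: int = 1) -> List:
    """Split every tensor in a (possibly nested) tuple/list batch into chunks of
    ``split_size`` along ``time_dim`` and return a list of per-chunk batches."""
    if isinstance(batch, torch.Tensor):
        return list(batch.split(split_size, dim=time_dim))
    per_tensor_chunks = [split_batch_along_time(t, split_size, time_dim) for t in batch]
    # transpose: list of per-tensor chunk lists -> list of per-chunk batches
    return [type(batch)(chunk) for chunk in zip(*per_tensor_chunks)]


# usage, for x of shape [B, T, F] and y of shape [B, T, 1]:
# for x_t, y_t in split_batch_along_time((x, y), split_size=10): ...
```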

Alternatives

Keep as is

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning

  • Bolts: Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

  • Lightning Transformers: Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

cc @Borda @tchaton @justusschock @awaelchli @rohitgr7 @akihironitta

@ananthsub ananthsub added the design Includes a design discussion label Aug 5, 2021
@carmocca
Contributor

carmocca commented Aug 5, 2021

In tutorials, we can show how to train these models with manual optimization, to demonstrate that this is not a strict loss of functionality

If the alternative is manual optimization, users would lose split-level fault tolerance (if and when we have it), because manual-optimization users would need to handle the progress-tracking updates themselves.

If this were instead extracted into a separate, optional loop, that would work.

@ananthsub
Contributor Author

from @awaelchli

Regardless, I believe it would be interesting to see a manual LightningModule RNN implementation without the built-in TBPTT. If we have that, we can examine the amount of boilerplate required and get a better picture of what impact this deprecation has. Perhaps I could help here and try to contribute an example.

+1, that sounds like a reasonable next step

@ananthsub
Contributor Author

@anhnht3 since you thumbs-downed the RFC, could you describe how you're relying on tbptt in Lightning today?

@anhnht3

anhnht3 commented Aug 7, 2021

As a user, I often need to employ tbptt for RNN models. If tbptt is deprecated, I am not sure it would be easy to implement on my own without understanding Lightning's internals.

@ananthsub
Contributor Author

ananthsub commented Aug 7, 2021

As a user, I often need to employ tbptt for RNN models. If tbptt is deprecated, I am not sure it would be easy to implement on my own without understanding Lightning's internals.

Could you share some example code for how you're currently training?

@williamFalcon
Contributor

williamFalcon commented Aug 7, 2021

I’m also curious why we want to deprecate tbptt. This would make PL useless for sequencing work; in fact, it would make it impossible.

And MAYBE (big maybe) it might be possible IF you understand PL at the depth of a core contributor, but that’s an unrealistic ask for 99.9% of users.

This also goes against the “no need to learn a new framework” core value we have.

@ananthsub
Contributor Author

ananthsub commented Aug 7, 2021

@williamFalcon thanks for taking a look. The broader theme and question I want to answer is how we can support more flavors of training steps in a consistent manner. Different flavors of training steps have:

  • different input arguments
  • different output return value expectations

These affect the contract between the Trainer and the LightningModule, because they result in slightly different loop structures (the differences in the training_step contract are sketched in code after the examples below).

Here are some examples:

  • using a single optimizer vs. multiple optimizers
    -- by default we loop over the optimizers one by one and call the training step once per optimizer
  • tbptt
    -- we call the training step once per batch split
  • manual optimization
    -- we call the training step once, and users are expected to handle the backward pass, optimizer step, and extensions like gradient accumulation & mixed precision
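
To make those contract differences concrete, here is roughly how the training_step signature and return value differ across these flavors in Lightning 1.x. Only one variant applies per LightningModule; compute_loss and forward_chunk are placeholder helpers for illustration, not Lightning APIs.

```python
import pytorch_lightning as pl


class SingleOptimizerStep(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        # called once per batch; the trainer runs backward and optimizer.step()
        return self.compute_loss(batch)


class MultiOptimizerStep(pl.LightningModule):
    def training_step(self, batch, batch_idx, optimizer_idx):
        # called once per (batch, optimizer) pair when several optimizers are configured
        return self.compute_loss(batch, optimizer_idx)


class TbpttStep(pl.LightningModule):
    def training_step(self, batch, batch_idx, hiddens):
        # called once per split of the batch; hidden state is threaded across splits
        loss, hiddens = self.forward_chunk(batch, hiddens)
        return {"loss": loss, "hiddens": hiddens}
```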

And incoming requests:

What I'd like to accomplish is a way to widen Lightning's overall scope by supporting these various flavors. We can make clear that tbptt is one possible extension of core Lightning, and lay out a consistent pattern for adding more extensions without bloating the core interfaces. Otherwise, users looking at core/lightning.py can quickly be overwhelmed trying to figure out what is supported versus what they need to implement.

I think it's perfectly reasonable for us to support these as extension loops inside the Trainer, in a way that abstracts this logic away from users, especially since these are quite targeted and aren't completely at odds with the rest of the loop structure.

I have some preliminary ideas here, but I need to do more exploration work on this: https://docs.google.com/document/d/1xHU7-iQSpp9KJTjI3As2EM0mfNHHr37WZYpDpwLkivA/edit#heading=h.7ie80o6nh4et

@yifuwang
Contributor

yifuwang commented Aug 9, 2021

Tactically, deprecating these hooks simplifies the framework and the interfaces we offer, which means less maintenance and a lower risk of bugs

I've been thinking about this as well. Currently, the logic for handling tbptt is on the critical path for all use cases even if they don't need it. In fact, (IIUC) it is also the sole reason that TrainingBatchLoop is a Loop (the support for multiple optimizers is not organized as a Loop). I definitely agree with you that there's opportunity for simplifying the framework's internal architecture.

However, we may be able to achieve the same architectural simplification by further decomposing TrainingBatchLoop, without dropping support for tbptt.

Currently, all use cases go through the same loop structure: FitLoop -> TrainingEpochLoop -> TrainingBatchLoop. The first two have rather clear responsibilities, while TrainingBatchLoop handles the combination of tbptt/non-tbptt, single/multiple optimizers, and automatic/manual optimization. I feel the framework can become much simpler, more composable, and more flexible if the loop structure is selected based on the requested features. For example (a rough sketch of such composition follows the list below):

for tbptt + multiple optimizers + automatic optimization, the loop structure can be
FitLoop -> TrainingEpochLoop -> TbpttLoop -> OptimizerLoop -> AutomaticOptimizationFlow

for non-tbptt + multiple optimizers + automatic optimization:
FitLoop -> TrainingEpochLoop -> OptimizerLoop -> AutomaticOptimizationFlow

for non-tbptt + single optimizer + automatic optimization:
FitLoop -> TrainingEpochLoop -> AutomaticOptimizationFlow

for non-tbptt + single optimizer + manual optimization:
FitLoop -> TrainingEpochLoop -> ManualOptimizationFlow
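
A hypothetical sketch of such composition (these classes are not the existing Lightning Loop API): each piece owns a child and delegates the innermost work to it, so tbptt becomes an optional outer layer instead of logic on every code path. The batch layout assumption (tensors with time on dim 1) is illustrative.

```python
from typing import Any, Optional


class TbpttLoop:
    """Split the batch along the time dimension and run the child once per split."""

    def __init__(self, child: Any, split_size: int):
        self.child = child
        self.split_size = split_size

    def run(self, batch, batch_idx: int) -> None:
        x, y = batch  # assumed layout: tensors with time on dim 1
        hiddens: Optional[Any] = None
        for x_t, y_t in zip(x.split(self.split_size, dim=1), y.split(self.split_size, dim=1)):
            out = self.child.run((x_t, y_t), batch_idx, hiddens=hiddens)
            # pull the updated hidden state out of the training_step result, if any
            hiddens = out.get("hiddens") if isinstance(out, dict) else None


class OptimizerLoop:
    """Run the child (an optimization flow) once per configured optimizer."""

    def __init__(self, child: Any, optimizers: list):
        self.child = child
        self.optimizers = optimizers

    def run(self, batch, batch_idx: int, hiddens=None):
        out = None
        for opt in self.optimizers:
            out = self.child.run(batch, batch_idx, optimizer=opt, hiddens=hiddens)
        return out
```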

On a side note, I feel it's worthwhile introducing an internal abstraction for optimization flows (encapsulating the logic that actually calls training_step) and always attaching it to the end of the loop structure (currently the innermost Loop invokes training_step). IMHO this would improve the maintainability of the different optimization flows because (1) we envision supporting more of them, and (2) the complexity of different flows varies greatly (e.g. the flow for automatic optimization is much more complicated than that for manual optimization).
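
A hypothetical sketch of that abstraction (names and signatures are illustrative only): one small class per optimization mode that owns the call into training_step, attached as the innermost piece of the loop structure.

```python
class AutomaticOptimizationFlow:
    """Owns the training_step call plus backward/step; precision, gradient accumulation,
    and closure handling are omitted for brevity."""

    def __init__(self, module):
        self.module = module

    def run(self, batch, batch_idx, optimizer=None, hiddens=None):
        out = self.module.training_step(batch, batch_idx)
        loss = out["loss"] if isinstance(out, dict) else out
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return out


class ManualOptimizationFlow:
    """Only calls training_step; the user drives backward/step/zero_grad themselves."""

    def __init__(self, module):
        self.module = module

    def run(self, batch, batch_idx, optimizer=None, hiddens=None):
        return self.module.training_step(batch, batch_idx)
```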

@awaelchli
Contributor

@yifuwang I've also thought about this. I have a WIP for extracting the optimizer loop for automatic optimization.
For selecting the loop structure, it remains an open question how best to do it, since it depends on the properties of the Trainer and the LightningModule, so Lightning needs to build the loop at the time fit, test, etc. is called, when the model becomes available.

@yifuwang
Contributor

yifuwang commented Aug 9, 2021

@awaelchli

it remains an open question how best to do it, since it depends on the properties of the Trainer and the LightningModule

Right. It's probably unavoidable to have some not-so-elegant code for interpreting the LightningModule/Trainer configuration and translating it into the loop structure. But IMHO having that contained in one place is a good first step (as opposed to checking automatic_optimization all over the place).
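
A hypothetical sketch of keeping that interpretation contained: a single factory reads the Trainer/LightningModule configuration and assembles the loop structure, reusing the TbpttLoop / OptimizerLoop / *OptimizationFlow classes sketched above (none of this is an existing Lightning API).

```python
def build_training_batch_loop(trainer, module):
    # innermost piece: who calls training_step and who drives backward/step
    flow = (
        AutomaticOptimizationFlow(module)
        if module.automatic_optimization
        else ManualOptimizationFlow(module)
    )
    loop = flow
    # wrap with an optimizer loop only when several optimizers are configured
    if module.automatic_optimization and len(trainer.optimizers) > 1:
        loop = OptimizerLoop(child=loop, optimizers=trainer.optimizers)
    # wrap with a tbptt loop only when the module asks for batch splitting
    if getattr(module, "truncated_bptt_steps", 0) > 0:
        loop = TbpttLoop(child=loop, split_size=module.truncated_bptt_steps)
    # e.g. tbptt + multiple optimizers + automatic optimization yields
    # TbpttLoop -> OptimizerLoop -> AutomaticOptimizationFlow
    return loop
```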

On a side note, I feel it's worthwhile introducing an internal abstraction for optimization flows (encapsulating the logic that actually calls training_step) and always attaching it to the end of the loop structure (currently the innermost Loop invokes training_step). IMHO this would improve the maintainability of the different optimization flows because (1) we envision supporting more of them, and (2) the complexity of different flows varies greatly (e.g. the flow for automatic optimization is much more complicated than that for manual optimization).

Curious what your thoughts are on this ^

@awaelchli
Contributor

awaelchli commented Aug 9, 2021

Curious what your thoughts are on this ^

Generally yes, I agree and I like it. It would need more details as to what responsibilities this abstraction has.
I have been working slightly in this direction with an abstraction for our internal closure handling, a LightningClosure, here: #8642. This can simplify our logic in TrainingBatchLoop for handling the code paths for manual vs. automatic optimization.

Perhaps the optimization flow abstraction you suggest could handle the creation of this closure as well.
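
A hedged sketch of the closure idea: wrap one training step's forward and backward in a callable that torch optimizers can invoke via optimizer.step(closure). The class name and structure here are illustrative, not the actual implementation in #8642.

```python
import torch


class TrainingStepClosure:
    """Bundles zero_grad + training_step + backward so it can be passed to optimizer.step()."""

    def __init__(self, step_fn, backward_fn, zero_grad_fn):
        self.step_fn = step_fn            # runs training_step and returns the loss
        self.backward_fn = backward_fn    # runs the backward pass on that loss
        self.zero_grad_fn = zero_grad_fn  # clears gradients before the new backward
        self.loss = None

    def __call__(self) -> torch.Tensor:
        self.zero_grad_fn()
        self.loss = self.step_fn()
        self.backward_fn(self.loss)
        return self.loss


# usage with a plain torch optimizer (LBFGS-style closures work the same way):
# optimizer.step(TrainingStepClosure(step_fn, backward_fn, optimizer.zero_grad))
```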

@stale

stale bot commented Sep 19, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix This will not be worked on label Sep 19, 2021
@awaelchli awaelchli removed the won't fix This will not be worked on label Sep 20, 2021
@stale stale bot added the won't fix This will not be worked on label Oct 21, 2021
@awaelchli awaelchli removed the won't fix This will not be worked on label Oct 21, 2021
@tchaton
Contributor

tchaton commented Oct 23, 2021

It seems the community didn't respond. Let's try again another time and measure engagement.

[screenshot attached: 2021-10-23 at 16:39]

@stale stale bot added the won't fix This will not be worked on label Nov 22, 2021
@awaelchli awaelchli removed the won't fix This will not be worked on label Nov 22, 2021
@stale stale bot added the won't fix This will not be worked on label Dec 27, 2021
@awaelchli awaelchli removed the won't fix This will not be worked on label Dec 30, 2021
@stale stale bot added the won't fix This will not be worked on label Feb 6, 2022
@awaelchli awaelchli added this to the future milestone Feb 7, 2022
@carmocca carmocca removed the won't fix This will not be worked on label Feb 7, 2022
@carmocca carmocca moved this to Waiting in Lightning RFCs Feb 28, 2022
@daniellepintz
Contributor

Related to this issue: I noticed that we show the tbptt split index on the progress bar, which seems oddly specific for the progress bar.

https://github.com/PyTorchLightning/pytorch-lightning/blob/5da065e287b44e2c1fe4f7951003813ed45365c9/pytorch_lightning/callbacks/progress/base.py#L214-L215

@pfeatherstone

With new models like:

which require some kind of backpropagation-through-time algorithm, this support would have been very useful for state-of-the-art transformer models...
