[RFC] Deprecate direct support for truncated backprop through time #8732
Comments
If the alternative is manual optimization, they would lose the functionality of fault-tolerance at the split level (if and when we have it). This is because manual optimization users would need to handle the progress tracking updates themselves. If this was instead extracted into a separate optional loop then that would work.
from @awaelchli
+1, that sounds like a reasonable next step
@anhnht3 since you thumbs-downed the RFC, could you describe how you're relying on tbptt in Lightning today?
As a user, I often need to employ tbptt for RNN models. If tbptt is going to be deprecated, I am not sure it's easy to implement it on my own without understanding Lightning's internals.
Could you share some example code for how you're currently training?
I’m also curious here why we want to deprecate tbptt? this would make PL useless for sequencing work. in fact, it would also make it impossible. and MAYBE (big maybe), it might be possible IF you understand PL at the depth of a core contributor. but that’s an unrealistic ask for 99.9% of users. this also goes against the “no need to learn a new framework” core value we have.
@williamFalcon thanks for taking a look. The broader theme and question I want to answer is how we can support more flavors of training steps in a consistent manner. For flavors of training steps we have:
These affect the contract between the trainer and the LightningModule because they result in slightly different loop structures. Here are some examples:
And incoming requests:
What I'd like to accomplish is a way we can widen Lightning's scope overall by supporting these various flavors. We can make clear that tbptt is one possible extension of core Lightning, as well as lay down a consistent pattern for how more extensions can be added in a way that doesn't bloat the core interfaces. I think it's perfectly reasonable for us to support these as extension loops inside of the Trainer in a way that abstracts this logic away from users, especially since these flavors are quite targeted and aren't completely at odds with the rest of the loop structure. I have some preliminary ideas here, but I need to do more exploration work on this: https://docs.google.com/document/d/1xHU7-iQSpp9KJTjI3As2EM0mfNHHr37WZYpDpwLkivA/edit#heading=h.7ie80o6nh4et
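To make the "extension loop" idea concrete, here is a minimal plain-Python sketch (not the actual Lightning Loop API) of how tbptt could live in an optional loop that wraps a core per-chunk step function. `TBPTTExtensionLoop`, `split_batch`, and `step_fn` are hypothetical names used only for illustration.

```python
import torch


def split_batch(batch: torch.Tensor, split_size: int):
    """Split a (time, batch, features) tensor into chunks along the time dimension."""
    return torch.split(batch, split_size, dim=0)


class TBPTTExtensionLoop:
    """Optional wrapper loop: only users who opt in pay for the batch-splitting logic."""

    def __init__(self, split_size: int, step_fn):
        self.split_size = split_size
        # step_fn(chunk, hiddens) -> (loss, hiddens); hiddens is a tuple of tensors.
        self.step_fn = step_fn

    def run(self, batch: torch.Tensor) -> float:
        hiddens = None
        total_loss = 0.0
        for chunk in split_batch(batch, self.split_size):
            loss, hiddens = self.step_fn(chunk, hiddens)
            # Detach so gradients do not flow across split boundaries.
            hiddens = tuple(h.detach() for h in hiddens)
            total_loss += float(loss)
        return total_loss
```

Users who never set a split size would simply never go through this loop, keeping it off the critical path for everyone else.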
I've been thinking about this as well. Currently, the logic for handling tbptt is on the critical path for all use cases even if they don't need it; in fact, (IIUC) it is also the sole reason for much of the current batch-loop structure. However, we may be able to achieve the same architectural simplification by further decomposing the batch loop. Currently, all use cases go through the same loop structure:
for tbptt + multiple optimizers + automatic optimization, the loop structure can be:
for non-tbptt + multiple optimizers + automatic optimization:
for non-tbptt + single optimizer + automatic optimization:
for non-tbptt + single optimizer + manual optimization:
On a side note, I feel it's worthwhile introducing an internal abstraction for optimization flows (for encapsulating the logic that actually calls the optimizer).
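The concrete loop shapes appear to have been stripped from this thread. As a rough plain-Python illustration (not Lightning internals) of the four combinations listed above, they might look like the following; `training_step`, `splits`, and the optimizers are placeholders supplied by the caller.

```python
def tbptt_multi_opt_automatic(batch, splits, optimizers, training_step):
    """tbptt + multiple optimizers + automatic optimization."""
    for split in splits(batch):
        for opt_idx, opt in enumerate(optimizers):
            loss = training_step(split, opt_idx)
            loss.backward()
            opt.step()
            opt.zero_grad()


def multi_opt_automatic(batch, optimizers, training_step):
    """non-tbptt + multiple optimizers + automatic optimization."""
    for opt_idx, opt in enumerate(optimizers):
        loss = training_step(batch, opt_idx)
        loss.backward()
        opt.step()
        opt.zero_grad()


def single_opt_automatic(batch, optimizer, training_step):
    """non-tbptt + single optimizer + automatic optimization."""
    loss = training_step(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()


def single_opt_manual(batch, training_step):
    """non-tbptt + single optimizer + manual optimization:
    the user performs backward() and step() inside training_step."""
    training_step(batch)
```

The point is that only the first shape needs the outer split loop, and only the first two need the optimizer-index loop; the rest collapse to something much simpler.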
@yifuwang I've also thought about this. I have a WIP for extracting the optimizer loop for automatic optimization.
Right. It's probably unavoidable to have some not-as-elegant code for interpreting the LightningModule/Trainer configuration and translating it into the loop structure. But IMHO having that contained in one place is a good first step (as opposed to checking the configuration all over the loop code).
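As an illustration of "having that contained", the configuration could be interpreted once, in a single place, rather than branching on the same flags throughout the loops. The names below (`LoopConfig`, `select_batch_loop`, and the loop labels) are hypothetical, not Lightning API.

```python
from dataclasses import dataclass


@dataclass
class LoopConfig:
    truncated_bptt_steps: int  # 0 means tbptt is disabled
    automatic_optimization: bool
    num_optimizers: int


def select_batch_loop(cfg: LoopConfig) -> str:
    """Translate the configuration into a loop structure in one place."""
    if not cfg.automatic_optimization:
        return "ManualOptimizationLoop"
    if cfg.truncated_bptt_steps > 0:
        return "TBPTTSplitLoop(OptimizerLoop)"
    if cfg.num_optimizers > 1:
        return "OptimizerLoop"
    return "SingleOptimizerLoop"


print(select_batch_loop(LoopConfig(truncated_bptt_steps=0,
                                   automatic_optimization=True,
                                   num_optimizers=1)))
# -> SingleOptimizerLoop
```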
Curious what your thoughts are on this ^
Generally yes, I agree and I like it. Would need more details as to what responsibilities this abstraction has. Perhaps the optimization flow abstraction you suggest could handle the creation of this closure as well.
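A minimal sketch of what such an optimization-flow abstraction might look like, with the flow owning the closure creation. `AutomaticOptimizationFlow` is a hypothetical name, not Lightning API; it relies only on the standard `torch.optim.Optimizer.step(closure)` signature.

```python
from typing import Callable

import torch


class AutomaticOptimizationFlow:
    """Encapsulates closure creation and the optimizer.step(closure) call."""

    def __init__(self, optimizer: torch.optim.Optimizer,
                 compute_loss: Callable[[], torch.Tensor]):
        self.optimizer = optimizer
        self.compute_loss = compute_loss

    def _make_closure(self) -> Callable[[], torch.Tensor]:
        def closure() -> torch.Tensor:
            # The closure owns zero_grad/forward/backward because some optimizers
            # (e.g. LBFGS) evaluate it multiple times per step.
            self.optimizer.zero_grad()
            loss = self.compute_loss()
            loss.backward()
            return loss

        return closure

    def step(self) -> torch.Tensor:
        return self.optimizer.step(self._make_closure())


# Usage sketch
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
flow = AutomaticOptimizationFlow(opt, lambda: torch.nn.functional.mse_loss(model(x), y))
flow.step()
```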
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
Related to this issue: I noticed that we show the BPTT split index on the progress bar, which struck me as oddly specific to surface there.
With new models like:
which require some kind of backpropagation-through-time algorithm, this would have been very useful for state-of-the-art transformer models...
🚀 Feature
Deprecate direct framework support for truncated backprop through time
Motivation
We are auditing the Lightning components and APIs to assess opportunities for improvements:
Lightning today offers specific support for truncated backpropagation through time. This manifests in:
However, this is highly specific to a particular use case. Moreover, with manual optimization, users have the flexibility to more finely control the batch splitting, loss computation, and optimizer step behavior. Put another way, users can already train with tbptt techniques without using this property at all. This suggests that we no longer need this direct support on the core interface.
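For example, here is a rough sketch (not an official recipe) of how a user could implement tbptt-style training themselves with manual optimization, splitting the batch along the time dimension and detaching the hidden state between splits, instead of relying on the built-in support (the `truncated_bptt_steps` attribute and the `hiddens` argument to `training_step`, as I understand it). The model, shapes, and split size are arbitrary choices for illustration.

```python
import torch
import pytorch_lightning as pl


class ManualTBPTTModule(pl.LightningModule):
    """Sketch: tbptt implemented with manual optimization."""

    def __init__(self, input_size: int = 8, hidden_size: int = 16, split_size: int = 20):
        super().__init__()
        self.automatic_optimization = False  # take control of backward/step
        self.rnn = torch.nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = torch.nn.Linear(hidden_size, 1)
        self.split_size = split_size

    def training_step(self, batch, batch_idx):
        x, y = batch  # x: (batch, time, features), y: (batch, time, 1)
        opt = self.optimizers()
        hiddens = None
        # Split the sequence along the time dimension and step once per chunk.
        for x_t, y_t in zip(x.split(self.split_size, dim=1),
                            y.split(self.split_size, dim=1)):
            out, hiddens = self.rnn(x_t, hiddens)
            loss = torch.nn.functional.mse_loss(self.head(out), y_t)
            opt.zero_grad()
            self.manual_backward(loss)
            opt.step()
            # Detach so gradients do not flow across split boundaries.
            hiddens = tuple(h.detach() for h in hiddens)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```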
@PyTorchLightning/core-contributors
Pitch
Benefits:
Approach:
Alternatives
Keep as is
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning
Bolts: Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch
Lightning Transformers: Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.
cc @Borda @tchaton @justusschock @awaelchli @rohitgr7 @akihironitta