
Deprecate and remove on_epoch_start/end and on_batch_start/end hooks #10807

Closed
rohitgr7 opened this issue Nov 29, 2021 · 7 comments
Labels
deprecation (Includes a deprecation), hooks (Related to the hooks API)
Milestone

Comments

@rohitgr7
Contributor

rohitgr7 commented Nov 29, 2021

Proposed refactor

I propose to deprecate and remove on_epoch_start/end and on_batch_start/end hooks.

Motivation

  • on_epoch_start/on_epoch_end: These two hooks currently run within each mode (train/val/test). We already have on_train_epoch_start/on_val_epoch_start/on_test_epoch_start. The reason we kept the generic versions was to provide a common hook called in every mode, so that users can configure operations that need to happen in each of them. But that is a fairly specific use case, and users can configure it easily without this hook. It can also be confusing for val/test, because an epoch doesn't mean anything during evaluation. Finally, many other mode-specific hooks don't have a generic version that runs for all modes. For example, there are no separate on_start/on_end hooks or an on_dataloader hook that run alongside on_{train/val/test}_start/end, so why the special treatment here?

  • on_batch_start/on_batch_end: These run alongside on_train_batch_start/on_train_batch_end and provide no additional value, so we should remove them as well. We could make them run within each mode, but then the same points as above apply: do we even need them?
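The per-mode replacement the proposal points to can be sketched as follows (a minimal illustration, not from the issue; the class and attribute names are hypothetical, and in real code the class would subclass pytorch_lightning.LightningModule):

```python
class MyModel:
    # Hypothetical sketch: in real code this would subclass
    # pytorch_lightning.LightningModule. Instead of relying on the
    # generic on_epoch_start, each mode-specific hook delegates to a
    # shared helper, so the common logic is written only once.

    def _reset_epoch_state(self):
        # Hypothetical per-epoch bookkeeping the user wants in every mode.
        self.epoch_outputs = []

    def on_train_epoch_start(self):
        self._reset_epoch_state()

    def on_validation_epoch_start(self):
        self._reset_epoch_state()

    def on_test_epoch_start(self):
        self._reset_epoch_state()
```

This keeps each per-mode hook explicit while sharing the common logic, which is also the pattern suggested later in the thread for the batch hooks.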

History:
All of these hooks seem to have been added at the project's inception; the best I could find is this PR, which is very old.
The behavior that enabled on_epoch_start/end across modes was, I think, discussed and approved over Slack, and I added it 😅 a while back: #6498

Pitch

Simply deprecate and remove.

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.

cc @tchaton @carmocca @awaelchli @Borda @ninginthecloud @justusschock @akihironitta

@rohitgr7 rohitgr7 added the refactor and hooks (Related to the hooks API) labels Nov 29, 2021
@rohitgr7 rohitgr7 added this to the 1.6 milestone Nov 29, 2021
@carmocca carmocca added the deprecation (Includes a deprecation) label and removed the refactor label Nov 29, 2021
@carmocca
Contributor

Can you explore the depths of our git history and link here the PRs/discussion that added them initially?

@rohitgr7
Contributor Author

@carmocca updated the description with details.

@awaelchli
Contributor

There was also the PR from @williamFalcon that added all the on_train_batch_start etc. hooks (I can't find it). That was the beginning of the overlap between these hooks.
Note that on_batch_start/end were initially training-only and ONLY ran during training. Later we decided to make them run in all modes. I am not a fan of constantly changing our minds and reverting decisions.

An additional argument that can be added to your issue is that there are other mode-specific hooks (actually many) that don't have a special version of a hook that runs for all modes (example: x_dataloader()).

@rohitgr7
Contributor Author

rohitgr7 commented Nov 30, 2021

Note that the on_batch_start/end initially were only for training and they ONLY ran for training. Later we made the decision to make them run generally in all modes.

These still run only in training mode.

An additional argument that can be added to your issue is that there are other mode-specific hooks (actually many) that don't have a special version of a hook that runs for all modes (example: x_dataloader()).

Yep, I mentioned one example above and have updated the description.

@carmocca
Contributor

Later we made the decision to make them run generally in all modes

I also remember this, but the problem is that nobody actually went through and implemented them for all modes.

@rohitgr7
Contributor Author

Yeah, but I still don't think this is necessary.

one can simply do:

def _common_batch_start(self, ...):
    ...

def on_train_batch_start(self, ...):
    self._common_batch_start(...)

def on_val_batch_start(self, ...):
    self._common_batch_start(...)

At least this makes sure the user implements exactly what they need. But if for some reason they miss the docs and implement on_batch_start thinking that it will run only for training, that can lead to issues. Also, if they log inside this hook, we would have to add extra checks to detect the default on_step/on_epoch values within self.log for each mode.
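The "extra checks" concern above can be illustrated with a hedged sketch. The function name and the default values here are assumptions for illustration only, not Lightning's actual implementation; the point is that a generic batch hook would force self.log to dispatch its defaults on the mode internally.

```python
def default_log_flags(hook_name: str) -> dict:
    # Hypothetical illustration of mode-aware self.log defaults.
    # Training batch hooks typically log per step, while evaluation
    # hooks typically aggregate per epoch, so a hook shared across
    # modes would need a mode check like this inside the logger.
    if hook_name.startswith("on_train"):
        return {"on_step": True, "on_epoch": False}
    return {"on_step": False, "on_epoch": True}
```

With mode-specific hooks such as on_train_batch_start, the defaults are unambiguous and no such dispatch is needed.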

@ruro
Contributor

ruro commented Apr 22, 2022

I am sorry, but in my opinion this was the wrong decision. "The user can just implement it themselves" is a bad argument when your library advertises itself as a tool for reducing boilerplate. I know that it's just 9 lines, but in my opinion, it's the thought that counts.

With pytorch-lightning, I often find myself writing code that looks like this:

    def some_method(self, ...):
         return self.some_other_method(...)

And if this isn't boilerplate, then I don't know what is.
