
Support for Scheduler taking a value to the step function #2849

Closed
Alicegaz opened this issue Aug 6, 2020 · 11 comments · Fixed by Alicegaz/pytorch-lightning#1
Labels: discussion (In a discussion stage), feature (Is an improvement or enhancement), won't fix (This will not be worked on)

Comments


Alicegaz commented Aug 6, 2020

Looking at the code in training_loop.py, it seems the only scheduler that can take a value in its step function is ReduceLROnPlateau. However, the CosineAnnealingWarmRestarts scheduler and custom schedulers can also take an epoch/step id or any other value in their step function.

https://github.com/PyTorchLightning/pytorch-lightning/blob/9ab071588bbe0e24441ae0f07a271bded0e4d9c3/pytorch_lightning/trainer/training_loop.py#L1148
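For context, here is a minimal plain-PyTorch sketch (not Lightning code) of the different step() signatures in play; the schedulers and values are only illustrative:

from torch import nn, optim

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# ReduceLROnPlateau expects a metric value in step() ...
plateau = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min')
plateau.step(0.42)  # e.g. a validation loss

# ... while CosineAnnealingWarmRestarts accepts a (fractional) epoch,
# and plain _LRScheduler subclasses such as StepLR take no value at all.
warm_restarts = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
warm_restarts.step(1.5)  # e.g. epoch + batch_idx / num_batches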


github-actions bot commented Aug 6, 2020

Hi! Thanks for your contribution, great first issue!

@awaelchli

@ananyahjha93 and I discussed this and we think that this change is not necessary, as it does not align with the PyTorch interface of the _LRScheduler base class. Passing the epoch to step is already deprecated, and _LRScheduler does not take metrics in step. The only exceptions are ReduceLROnPlateau and custom schedulers implemented by users. PL supports the former directly, and for the custom ones I recommend that the user simply override the optimizer_step hook in LightningModule and pass the metric into the scheduler there.
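A rough sketch of that workaround follows. The optimizer_step hook signature has changed across Lightning versions, so the extra arguments are absorbed with *args/**kwargs here, and the model and scheduler choices are illustrative only:

import torch
from torch import nn
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(4, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.net(x), y)

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        # Keep the scheduler on the module instead of returning it to the Trainer,
        # so the hook below fully controls what gets passed to step().
        self.warm_restarts = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
        return optimizer

    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx, *args, **kwargs):
        optimizer.step()
        # pass a fractional epoch, which CosineAnnealingWarmRestarts accepts
        self.warm_restarts.step(epoch + batch_idx / self.trainer.num_training_batches)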

@awaelchli

@PyTorchLightning/core-contributors


YannDubs commented Aug 9, 2020

I think it would be a mistake / lack of generality not to let the user decide when a metric should be given to the scheduler. You only need to change one line of code to let the user decide for themselves:

Currently: https://github.com/PyTorchLightning/pytorch-lightning/blob/071e09fe38d6170d8b019101a3c1871aeb1b2e99/pytorch_lightning/trainer/optimizers.py#L97

def configure_schedulers(self, schedulers: list):
        # Convert each scheduler into dict structure with relevant information
        lr_schedulers = []
        default_config = {'interval': 'epoch',  # default every epoch
                          'frequency': 1,  # default every epoch/batch
                          'reduce_on_plateau': False,  # most often not ReduceLROnPlateau scheduler
                          'monitor': 'val_loss'}  # default value to monitor for ReduceLROnPlateau
        for scheduler in schedulers:
            if isinstance(scheduler, dict):
                if 'scheduler' not in scheduler:
                    raise ValueError('Lr scheduler should have key `scheduler`',
                                     ' with item being a lr scheduler')
                scheduler['reduce_on_plateau'] = isinstance(
                    scheduler['scheduler'], optim.lr_scheduler.ReduceLROnPlateau)

A one-line change lets the user decide (don't override "reduce_on_plateau", so the user can set it themselves to ensure that a metric is given to step()):

[...]
            if isinstance(scheduler, dict):
                if 'scheduler' not in scheduler:
                    raise ValueError('...')
                if "reduce_on_plateau" not in scheduler:
                    scheduler['reduce_on_plateau'] = isinstance(
                        scheduler['scheduler'], optim.lr_scheduler.ReduceLROnPlateau)

Of course it would then make sense to rename 'reduce_on_plateau' to something like 'access_metric'.

Advantages:

  • The user would then be able to specify, from the scheduler dictionary, that the metric should be used (just like 'interval': 'epoch')
  • Everything is already there ('monitor': 'val_loss', dealing with step(metric)); it just lets the user use it for their own use cases
  • It's more general than assuming that ReduceLROnPlateau will be the only scheduler that depends on the validation metric (a hypothetical usage sketch follows below)
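As a purely hypothetical illustration of what this would look like on the user side, assuming the proposed key is named access_metric (MetricAwareScheduler is a made-up custom scheduler, not a real class):

def configure_optimizers(self):
    optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
    scheduler = {
        'scheduler': MetricAwareScheduler(optimizer),  # hypothetical metric-aware scheduler
        'interval': 'epoch',
        'monitor': 'val_loss',
        'access_metric': True,  # proposed key: ask Lightning to call scheduler.step(val_loss)
    }
    return [optimizer], [scheduler]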

@ananyahjha93 ananyahjha93 added the discussion and feature labels and removed the docs label Aug 9, 2020

ananyahjha93 commented Aug 9, 2020

@YannDubs this is actually a good solution. Would you submit a PR for it (renaming reduce_on_plateau to access_metric), making changes to the place you mentioned above and to https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/training_loop.py#L1252 where scheduler.step() is called?

Also, add proper documentation for the changes in the PR as they are going to be significant.

@ananyahjha93 ananyahjha93 self-assigned this Aug 9, 2020
@awaelchli

+1, good suggestion


Alicegaz commented Aug 10, 2020

@YannDubs, @ananyahjha93, @awaelchli regarding your concerns: the 'monitor' value in a scheduler dictionary already carries the meaning of passing a value to the step function, so there is no mistake. Why should we introduce another key, 'access_metric'? It would add redundancy, since 'monitor' would only be referenced inside the check for 'access_metric' and is not used anywhere else.

Also, YannDubs's solution does not resolve the problem discussed in this issue. Unlike the suggestion in my PR #2851, with this change we would still not be able to do the following:

iters = len(dataloader)
for epoch in range(20):
    for i, sample in enumerate(dataloader):
        inputs, labels = sample['inputs'], sample['labels']
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step(epoch + i / iters)

Check out the official PyTorch code for CosineAnnealingWarmRestarts.
I suggest reopening PR #2851! The only thing we can do to make that PR clearer is to add a check that registered schedulers only take supported types in step ("ReduceOnPlateau": "val_*", 'CosineAnnealingWithWarmRestarts': ['epoch', 'iter_frac'], ...) and to raise an exception for custom schedulers when they are given unsupported values.
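Roughly what such a check could look like; the mapping, names, and error type below are illustrative only, not part of any PR:

from torch.optim import lr_scheduler

# Map known scheduler types to the kinds of values their step() accepts.
SUPPORTED_STEP_ARGS = {
    lr_scheduler.ReduceLROnPlateau: {'monitor'},                       # step(metric)
    lr_scheduler.CosineAnnealingWarmRestarts: {'epoch', 'iter_frac'},  # step(epoch + i / iters)
}

def check_step_arg(scheduler, arg_kind):
    for cls, allowed in SUPPORTED_STEP_ARGS.items():
        if isinstance(scheduler, cls):
            if arg_kind not in allowed:
                raise ValueError(
                    f'{type(scheduler).__name__} does not support step({arg_kind})')
            return
    # Custom schedulers: nothing to verify against, leave the choice to the user.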

@williamFalcon

@ananyahjha93 let’s look at both solutions and aim for a general one.

I’d prefer to let the user have the flexibility on this


YannDubs commented Aug 10, 2020

@Alicegaz using 'access_metric' allows you to keep a meaningful default for 'monitor' (i.e. val_loss), but if the number of keys in the dict is a concern, I agree it can be dropped.

For your second point, I agree my method doesn't help (not sure how important that is given that PyTorch is removing those arguments, but it might be!). I was trying to say that even though PyTorch is deprecating arguments to the step function, PL should at least enable other schedulers to depend on the (validation) metric, which is easily done with essentially no change to the code.

@ananyahjha93

@Alicegaz yes, the idea was to use only access_metric rather than duplicating monitor, since access_metric is more general than the specific case of ReduceLROnPlateau. But for the second case, PyTorch is deprecating passing epochs to the step() function, so I am not sure we should keep supporting that; it just requires users to add the extra logic themselves if they are inheriting from the _LRScheduler class.

But I would also agree that custom schedulers should have support for passing arguments (not specifically epochs) to their step function.
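For illustration only, here is a toy example of such a custom scheduler (not from the PR or from PyTorch); the point is simply a step() that takes a value:

from torch.optim.optimizer import Optimizer

class FractionalWarmup:
    """Toy scheduler whose step() takes a fractional epoch."""

    def __init__(self, optimizer: Optimizer, warmup_epochs: float = 1.0):
        self.optimizer = optimizer
        self.warmup_epochs = warmup_epochs
        self.base_lrs = [group['lr'] for group in optimizer.param_groups]

    def step(self, epoch_frac: float) -> None:
        # Linearly scale the learning rate up to its base value during warmup.
        scale = min(1.0, epoch_frac / self.warmup_epochs)
        for group, base_lr in zip(self.optimizer.param_groups, self.base_lrs):
            group['lr'] = base_lr * scale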


stale bot commented Oct 22, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix label Oct 22, 2020
@stale stale bot closed this as completed Oct 29, 2020