Does SWA option reset scheduler's state? #9444

dazzle-me · 2021-09-10T19:20:05Z

dazzle-me
Sep 10, 2021

Situation :
I tried to use stochastic_weight_avg=True in my training but faced unexpected behavior of the scheduler - it gives different LR-curves when we train with SWA and without it.

Current behavior :
At the current setting I expect learning rate to decay every epoch up until the end (red line, you can check scheduler implementation yourself and ensure that LR is strictly decreasing after reaching maximum at lr_ramp_ep epochs)

Expected behavior :
LR-monitor should output same curves for both stochastic_weight_avg=True and stochastic_weight_avg=False flags.

The only reason I'm not opening an issue here is because I'm not sure if it is an intended behavior or not, but I'll provide self-contained example just in case :

import pytorch_lightning as pl
import torch

class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 2)

    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = torch.nn.functional.mse_loss(output, torch.ones_like(output))
        return {"loss": loss}

    @staticmethod
    def lrfn(epoch, 
             lr_start = 0.0001, 
             lr_max = 0.000015, 
             lr_min = 0.0000001, 
             lr_ramp_ep = 1, 
             lr_sus_ep  = 0, 
             lr_decay   = 0.7):
        if epoch < lr_ramp_ep:
            lr = (lr_max - lr_start) / lr_ramp_ep * epoch + lr_start   
        elif epoch < lr_ramp_ep + lr_sus_ep:
            lr = lr_max    
        else:
            lr = (lr_max - lr_min) * lr_decay**(epoch - lr_ramp_ep - lr_sus_ep) + lr_min    
        return lr
    
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1)
        return [optimizer], \
               [
                    torch.optim.lr_scheduler.LambdaLR(
                            optimizer, 
                            lr_lambda=lambda epoch : \
                            self.lrfn(epoch, lr_max=0.000015 * batch_size, lr_min=0.00001)
                    )
               ]  

if __name__ == '__main__':
    global batch_size
    batch_size = 64
    max_epochs = 10
    
    model = BoringModel()
    lr_monitor = pl.callbacks.LearningRateMonitor(logging_interval='epoch')
    for SWA in [False, True]:
        logger = pl.loggers.TensorBoardLogger("./lightning_logs", version=f'SWA_{SWA}')
        trainer = pl.Trainer(
            gpus=1,
            max_epochs=max_epochs,
            callbacks=[lr_monitor],
            logger=logger,
            stochastic_weight_avg=SWA
        )

        dummy_dataloader = torch.randn(64, 4)
        trainer.fit(model, dummy_dataloader)

Also, pytorch-lightning==1.3.8, torch==1.8.1+cu111

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does SWA option reset scheduler's state? #9444

{{title}}

Replies: 0 comments

Select a reply

Does SWA option reset scheduler's state? #9444

dazzle-me Sep 10, 2021

Replies: 0 comments

dazzle-me
Sep 10, 2021