
Embed LR schedule and initialization with the model #36

Open · Maratyszcza opened this issue Jan 20, 2017 · 11 comments

@Maratyszcza (Contributor) commented Jan 20, 2017

I tried to implement SqueezeNet as a torchvision model and train it with the ImageNet example, and found that it doesn't converge as is. The reference code differs in two aspects:

  • All but the last convolution are initialized with the Xavier (Glorot) initializer; the last one is initialized from a normal distribution with stddev 0.01.
  • The learning rate is linearly decreased (polynomial schedule with power=1).

In PyTorch, these aspects are hard-coded in the ImageNet example, but I think it makes sense to make them part of the model definitions in torch.vision. What's your position on this?
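For concreteness, a minimal sketch of those two behaviors, written with the torch.nn.init helpers that exist in today's PyTorch (they had not been added yet when this issue was filed); the function names are hypothetical:

import torch.nn as nn

def init_squeezenet_weights(model, final_conv):
    # Xavier/Glorot for every conv except the final classifier conv,
    # which is drawn from N(0, 0.01) as in the reference code.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            if m is final_conv:
                nn.init.normal_(m.weight, mean=0.0, std=0.01)
            else:
                nn.init.xavier_uniform_(m.weight)
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)

def poly_lr(base_lr, epoch, total_epochs, power=1.0):
    # Polynomial decay; power=1 gives the linear schedule described above.
    return base_lr * (1.0 - epoch / total_epochs) ** power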

@apaszke (Contributor) commented Jan 20, 2017

Weights can be initialized in the model's __init__, so that has nothing to do with the ImageNet example, right?

As for the LR schedule, I think we can just do something like:

if hasattr(model, 'lr_schedule'):
    lr = model.lr_schedule(epoch)
else:
    lr = args.lr * (0.1 ** (epoch // 30))
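The model-side half of that hook might look like this (a hypothetical sketch; lr_schedule is not an attribute any torchvision model actually defines):

import torch.nn as nn

class ToyNet(nn.Module):
    # Hypothetical model exposing the lr_schedule hook checked above.
    base_lr = 0.04
    num_epochs = 170

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def lr_schedule(self, epoch):
        # Linear (power=1 polynomial) decay from base_lr down to 0.
        return self.base_lr * (1.0 - epoch / self.num_epochs)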

@colesbury (Member)

I've been trying to put weight initialization in the model, since it often seems particular to the type of architecture. I added it to the ResNet definition and I'm going to add it to the VGG model def as well:
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py#L112
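For reference, the weight init in the linked resnet.py amounts to roughly this, rewritten here as a standalone helper with a hypothetical name:

import math
import torch.nn as nn

def init_resnet_weights(model):
    # He-style normal init for convs, unit gamma / zero beta for
    # batch norm, as in the linked model definition.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2.0 / n))
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)
            m.bias.data.zero_()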

I'm not sure about the learning rate schedule. It seems awkward to put it in the model definition, but as you point out, hard-coding it in the ImageNet example isn't ideal either.

@Maratyszcza (Contributor, Author)

@apaszke @colesbury Thanks, I will do the same for SqueezeNet. Are there pre-defined functions/classes for popular initialization schemes (e.g., as in Neon or Keras)?

@colesbury I think ideally we should provide a default learning schedule as part of the torch.vision models and let users override it via command-line arguments.

@apaszke (Contributor) commented Jan 20, 2017

Yes, we should add them in nn somewhere.

@eladhoffer (Contributor) commented Jan 21, 2017

What do you think about this kind of regime inside the model?
https://github.com/eladhoffer/convNet.pytorch/blob/master/models/alexnet.py
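For readers who can't open the link: the pattern there is a per-model "regime" list that the training loop consults each epoch. Roughly (the exact values here are illustrative, not copied from the repo):

# A list of hyper-parameter changes keyed by starting epoch.
regime = [
    {'epoch': 0, 'optimizer': 'SGD', 'lr': 1e-2,
     'weight_decay': 5e-4, 'momentum': 0.9},
    {'epoch': 10, 'lr': 5e-3},
    {'epoch': 15, 'lr': 1e-3, 'weight_decay': 0},
]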

@apaszke (Contributor) commented Jan 21, 2017

Can't open it, are you sure the link is correct and the repo is public?

@eladhoffer (Contributor)

@apaszke (Contributor) commented Jan 21, 2017

That's one way to approach it, but I'm not sure if it's the most convenient one. Having a function that returns an optimizer for a given epoch seems more powerful.
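A sketch of that idea (hypothetical factory; note that rebuilding the optimizer discards its internal state, which is what the next comment asks about):

import torch.optim as optim

def optimizer_for_epoch(model, epoch, base_lr=0.01, total_epochs=90):
    # Return a freshly configured optimizer with the scheduled LR.
    lr = base_lr * (1.0 - epoch / total_epochs)
    return optim.SGD(model.parameters(), lr=lr, momentum=0.9)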

@alykhantejani (Contributor)

Is there, or will there be, a nice way to adapt the learning rate or momentum while keeping the other state in the optimizer, e.g. for Adam?
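One idiom that does this: mutate the optimizer's param_groups in place, which leaves the per-parameter state (e.g. Adam's running moments) untouched:

def set_lr(optimizer, new_lr):
    # Only the hyper-parameter changes; optimizer.state is preserved.
    for group in optimizer.param_groups:
        group['lr'] = new_lr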

@alykhantejani (Contributor)

Should this issue be moved to the pytorch repo instead?

@vfdev-5 (Collaborator) commented Dec 7, 2017

Since you're also discussing weight initialization here: what about DenseNet, when we want to use a non-pretrained model?
According to the official Caffe implementation, convolutions are initialized with something like kaiming_normal.
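A minimal sketch of that, assuming the Caffe "msra" filler maps to Kaiming-normal convs plus the usual batch-norm and bias defaults (helper name hypothetical):

import torch.nn as nn

def init_densenet_weights(model):
    # Kaiming-normal convs, unit gamma / zero beta for batch norm,
    # zero bias for linear layers.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            nn.init.constant_(m.bias, 0)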
