
Fix LAMB optimizer regex parsing #1532

Merged: 7 commits into tensorflow:master on Apr 3, 2020

Conversation

jarednielsen (Contributor)

See my issue at #1530

The LAMB optimizer declares that its exclude_from_weight_decay argument should take in a comma-separated string of regex patterns. However, the code expects a list of regex patterns and instead iterates through each character in the string. Thus nearly every call to _do_use_weight_decay() returns False.

I attempted to pass in a list to circumvent this bug, but this leads to a typeguard error. So there's no easy way around this in the meantime. A similar bug exists for exclude_from_layer_adaption.
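To illustrate the failure mode, here is a minimal, hypothetical sketch of the logic described above (the real method lives on the optimizer class and may differ in detail; writing it as a free function is just for illustration):

```python
import re

def _do_use_weight_decay(param_name, exclude_from_weight_decay):
    # Skip weight decay for any parameter whose name matches an exclude pattern.
    if exclude_from_weight_decay:
        for r in exclude_from_weight_decay:
            # If exclude_from_weight_decay is the string "weight,bias", then `r`
            # is a single character ("w", "e", ...), and almost every parameter
            # name contains at least one of those characters.
            if re.search(r, param_name) is not None:
                return False
    return True

print(_do_use_weight_decay("dense/kernel", "weight,bias"))       # False (bug: "e" matches)
print(_do_use_weight_decay("dense/kernel", ["weight", "bias"]))  # True (intended behavior)
```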

Two proposed fixes, and I'm happy to contribute either:

  • Change the desired datatype from Optional[str] to List[str]. This would be preferred, and it matches the style of other implementations in the TensorFlow repo. See here for a list of examples. This is the current PR; a rough sketch of both options follows this list.
  • Add .split(',') to exclude_from_weight_decay and exclude_from_layer_adaption in the constructor.
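A rough sketch of both options, with made-up helper names purely for illustration (this is not the literal diff in this PR):

```python
from typing import List, Optional, Union

from typeguard import typechecked


# Option 1 (this PR): annotate the argument as Optional[List[str]] so that
# typeguard accepts a list of regex patterns and rejects the old string form.
@typechecked
def normalize_patterns_v1(exclude_from_weight_decay: Optional[List[str]] = None) -> List[str]:
    return exclude_from_weight_decay or []


# Option 2: keep accepting a comma-separated string, but split it once up front
# so that downstream code always iterates over whole patterns.
def normalize_patterns_v2(exclude_from_weight_decay: Optional[Union[str, List[str]]] = None) -> List[str]:
    if isinstance(exclude_from_weight_decay, str):
        return exclude_from_weight_decay.split(",")
    return exclude_from_weight_decay or []


normalize_patterns_v1(["weight", "bias"])   # OK: ["weight", "bias"]
normalize_patterns_v2("weight,bias")        # OK: ["weight", "bias"]
# normalize_patterns_v1("weight,bias")      # typeguard raises a type-check error
```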

@bot-of-gabrieldemarmiesse

@junjiek You are an owner of some files modified in this pull request. Would you kindly review the changes whenever you have the time? Thank you very much.

@gabrieldemarmiesse self-assigned this on Apr 1, 2020
@gabrieldemarmiesse (Member)

Thanks @jarednielsen for this pull request. I'll take a longer look tomorrow to get the big picture; for the moment, here are some quick thoughts written in two minutes:

  • We need to be backward compatible, at least for the near term. We can throw a deprecation warning if we don't want to keep the old behavior in the long term.
  • Being as close as possible to the TF API is very important for the long term.

If there is a way we can fit both in this pull request, that'd be perfect :)

@jarednielsen (Contributor, Author) commented Apr 1, 2020

Thanks for the quick response! I'm all for backwards compatibility. I'm just not seeing how any usage of this parameter could possibly succeed. For example, using

  • exclude_from_weight_decay=["weight", "bias"] would fail because of typeguard.
  • exclude_from_weight_decay="weight,bias" would be iterated over as ["w", "e", "i", "g", "h", "t", ",", "b", "i", "a", "s"]. If any single one of these characters existed in the parameter name, the parameter would be excluded, which is clearly wrong (the snippet after this list makes it concrete).
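A quick interpreter check of that second bullet (the parameter name "my_dense/gamma" is just an example):

```python
>>> import re
>>> list("weight,bias")
['w', 'e', 'i', 'g', 'h', 't', ',', 'b', 'i', 'a', 's']
>>> [c for c in "weight,bias" if re.search(c, "my_dense/gamma")]
['e', 'g', 'a', 's']   # single-character "patterns" match, so the parameter is excluded
```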

What backwards-compatible behavior would we want to keep? I suppose if your regex was literally just a single character and you only had one pattern to match, then that use case would break. So maybe exclude_from_weight_decay="*"? Although in that case they should just set weight_decay=0 and be done with it.

@seanpmorgan (Member) commented Apr 2, 2020

I'm not so sure backwards compatibility is what's needed here, since passing a string variable (as enforced by typeguard) would fail anyway. This isn't so much an improvement as a bug fix.

The model garden is one of the more popular repos depending on addons, so I think it's important we patch this onto 0.8.4.

@seanpmorgan added the "backport r0.8" label (Will backport any PR merged with this label to the r0.8 branch) on Apr 2, 2020
@seanpmorgan (Member)

@jarednielsen would you mind adding a test case for calling the optimizer with these parameters, so this type of thing would be caught in the future, please?

@jarednielsen (Contributor, Author)

@seanpmorgan Sure, added the test!

@@ -401,3 +401,11 @@ def test_get_config(self):
opt = lamb.LAMB(1e-4)
config = opt.get_config()
self.assertEqual(config["learning_rate"], 1e-4)

def test_exclude_weight_decay(self):
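For context, a hedged sketch of what such a test might look like (variable names and exact assertions are assumptions, not necessarily the code merged here):

```python
def test_exclude_weight_decay(self):
    # Patterns are now passed as a list of regexes rather than one string.
    opt = lamb.LAMB(1e-4, weight_decay_rate=0.01, exclude_from_weight_decay=["var1"])
    self.assertTrue(opt._do_use_weight_decay("var0"))          # no pattern matches
    self.assertFalse(opt._do_use_weight_decay("var1"))         # matches "var1"
    self.assertFalse(opt._do_use_weight_decay("var1_weight"))  # also matches "var1"
```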
Member:
Could you add a test for exclude_from_layer_adaption as well please?

Contributor (Author):
Done

@gabrieldemarmiesse (Member) left a comment:
Thanks a lot for the pull request! That's some great investigation work and a great solution!

@gabrieldemarmiesse merged commit ce16e62 into tensorflow:master on Apr 3, 2020
jrruijli pushed a commit to jrruijli/addons that referenced this pull request on Dec 23, 2020:
* Fix type for LAMB optimizer exclude_from_weight_decay

* Add import

* Add optional wrapper

* Add test

* Layer adaption test

* Typo
Labels: backport r0.8, cla: yes, optimizers
5 participants