🐛 Bug

Using transformers + the AdamW optimizer + the batch size finder results in ~2-3 GB of GPU memory not being freed after trainer.tune (for xlm-roberta-base). This causes OOM issues on a subsequent call to trainer.fit.

I suspect the state of the AdamW optimizer causes this: trainer._lightning_optimizers (from here) still contains the optimizer that was used for finding the batch size, including its exp_avg statistics on CUDA.

I also noticed that, in some cases, calling trainer.fit() together with trainer.tune() resulted in wrong fitting behavior when the tuning (batch size only) was done with a random target.
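For reference, a rough diagnostic sketch of how the retained optimizer state can be inspected after tuning. It assumes the trainer and model objects from the Colab linked below; trainer.optimizers and the AdamW state keys exp_avg / exp_avg_sq are standard PyTorch / Lightning names, but this is only an inspection sketch, not part of the repro.

```python
import torch

# Diagnostic sketch only: after trainer.tune(model) has run the batch size
# finder, check whether AdamW state from the tuning run is still resident on
# the GPU. `trainer` and `model` are assumed to be the objects from the Colab
# repro below.
trainer.tune(model)
print(f"allocated after tune: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")

for optimizer in trainer.optimizers:
    for state in optimizer.state.values():
        exp_avg = state.get("exp_avg")
        if exp_avg is not None and exp_avg.is_cuda:
            # AdamW keeps exp_avg / exp_avg_sq buffers as large as each
            # parameter; if they are still on CUDA here, that memory is not
            # given back before the subsequent trainer.fit call.
            print("retained AdamW state on", exp_avg.device, "-", exp_avg.numel(), "elements")
```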
Please reproduce using the BoringModel

https://colab.research.google.com/drive/1cugaUmLzNvk-38OyV8zyT9M9xQY4LkfH#scrollTo=j4w0wizx5XxJ
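The Colab above is the authoritative repro; a minimal inline sketch of the setup it exercises looks roughly like the following. The dataset, sequence length, and hyperparameters here are placeholders, and torch.optim.AdamW stands in for whichever AdamW variant the Colab actually uses.

```python
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl
from transformers import AutoModelForSequenceClassification


class RandomTokenDataset(Dataset):
    """Placeholder data: random token ids with random binary labels."""

    def __len__(self):
        return 512

    def __getitem__(self, idx):
        return {
            "input_ids": torch.randint(5, 1000, (128,)),
            "attention_mask": torch.ones(128, dtype=torch.long),
            "labels": torch.randint(0, 2, ()),
        }


class XlmRobertaModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base")
        self.batch_size = 2  # attribute scaled by the batch size finder

    def training_step(self, batch, batch_idx):
        return self.model(**batch).loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=2e-5)

    def train_dataloader(self):
        return DataLoader(RandomTokenDataset(), batch_size=self.batch_size)


model = XlmRobertaModule()
trainer = pl.Trainer(gpus=1, max_epochs=1, auto_scale_batch_size="binsearch")
trainer.tune(model)  # batch size finder; ~2-3 GB stays allocated afterwards
trainer.fit(model)   # can OOM here because the tuning optimizer state is kept
```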
Expected behavior
GPU memory should be freed after the batch size finder finishes (except for the model itself, which may stay on the GPU).
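As a point of reference, a possible interim workaround between tune() and fit() is sketched below. It touches the private _lightning_optimizers attribute mentioned above, so it is not an official API and attribute names may differ between Lightning versions.

```python
import gc
import torch

# Workaround sketch: drop the optimizer state kept from the batch size finder
# so the allocator can release it before fit(). Relies on private attributes.
trainer.tune(model)

for optimizer in trainer.optimizers:
    optimizer.state.clear()            # frees the exp_avg / exp_avg_sq buffers
trainer._lightning_optimizers = None   # private attribute named in this report
gc.collect()
torch.cuda.empty_cache()               # return the now-unused cached blocks

print(f"allocated before fit: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
trainer.fit(model)  # fit() creates a fresh optimizer via configure_optimizers()
```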
Environment