Replies: 3 comments
We've been finding that a higher EMA decay often makes for much more stable and ultimately better models.
Thank you very much for the reply. Best regards
Hey @Enry99,
Note that when using EMA, only the validation model is the averaged model, not the training one, which is why the validation curve is much smoother.
Spikes in the training loss are not much of a problem, because of the EMA. The bad spikes are the ones on the validation set, and you don't have those at the moment. You can try increasing ema_decay all the way to 0.9999 to see if that gives a better model.
Adam is known to be spiky on noisy datasets, and that is usually not a problem. You can reduce the learning rate to 0.001 if that helps.
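
To illustrate why the averaged model looks smoother, here is a minimal sketch of an exponential moving average in plain Python. It is not the library's implementation: `ema_update`, `track`, `spread`, and the synthetic `raw` signal are all hypothetical names made up for this example, and a single noisy number stands in for a model weight.

```python
import random


def ema_update(avg, new, decay):
    """One EMA step: keep `decay` of the running average, blend in the rest."""
    return decay * avg + (1.0 - decay) * new


def track(values, decay):
    """Return the EMA trajectory of a sequence, seeded with its first value."""
    avg = values[0]
    out = []
    for v in values:
        avg = ema_update(avg, v, decay)
        out.append(avg)
    return out


def spread(xs):
    """Max minus min, a crude measure of how spiky a trajectory is."""
    return max(xs) - min(xs)


random.seed(0)
# Hypothetical noisy "weight": a constant 1.0 plus Gaussian noise,
# with a large spike every 500 steps (assumed values, for illustration only).
raw = [
    1.0 + random.gauss(0.0, 0.1) + (2.0 if step % 500 == 499 else 0.0)
    for step in range(5000)
]

print("raw spread:", round(spread(raw), 4))
for decay in (0.99, 0.999, 0.9999):
    print("decay", decay, "spread:", round(spread(track(raw, decay)), 4))
```

Each spike moves the average by only `(1 - decay)` times its size, so the averaged trajectory barely reacts to individual outliers. The trade-off is that a decay very close to 1 also makes the average slow to follow real changes, so whether 0.9999 helps is something to check on the validation curve.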