Unexpected spikes in training loss #860

Closed Answered by ilyes319
Enry99 asked this question in Q&A

Hey @Enry99,

Note that when using EMA, only the validation model is the averaged model; the training loss is computed with the raw, non-averaged weights, which is why the validation curve is much smoother.
Spikes in the training loss are not much of a problem, because of EMA. The spikes to worry about are the ones on the validation set, and you are not seeing those at the moment. You can try increasing `ema_decay` all the way to 0.9999 to see if that gives a better model.
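
To illustrate why the averaged model reacts so little to a spike, here is a toy sketch in plain Python. It is not the library's EMA code: a single scalar stands in for a model weight, and `ema_update`, `simulate`, `max_jump`, the noise level, and the spike size are all made up for the example. Only the decay value 0.9999 comes from the suggestion above; 0.99 is an arbitrary comparison point.

```python
import random

def ema_update(avg, new, decay):
    """One EMA step: keep `decay` of the old average, blend in the rest."""
    return decay * avg + (1.0 - decay) * new

def simulate(decay, steps=2000, seed=0):
    """Track a noisy 'raw' value and its EMA (hypothetical toy data)."""
    rng = random.Random(seed)
    avg = 1.0
    raw_hist, avg_hist = [], []
    for step in range(steps):
        # Raw value: small noise around 1.0 ...
        raw = 1.0 + rng.gauss(0.0, 0.05)
        # ... plus an occasional large spike, like a bad training batch.
        if step % 200 == 199:
            raw += 5.0
        avg = ema_update(avg, raw, decay)
        raw_hist.append(raw)
        avg_hist.append(avg)
    return raw_hist, avg_hist

def max_jump(values):
    """Largest change between two consecutive entries."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))

raw, avg_099 = simulate(decay=0.99)
_, avg_09999 = simulate(decay=0.9999)

# The raw series jumps by roughly the spike size; the averages barely move,
# and a higher decay damps the jump further.
print(max_jump(raw), max_jump(avg_099), max_jump(avg_09999))
```

Each EMA step moves the average by only `(1 - decay)` times its distance from the new value, so a higher decay gives a smoother averaged model at the cost of reacting more slowly to real improvements.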

Adam is known to be potentially spiky on noisy datasets, and that is usually not a problem. You can reduce the learning rate to 0.001 if that helps.
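
As a rough illustration of why a lower learning rate helps, here is a minimal sketch of the standard Adam update on a single scalar parameter. It is written from the textbook update rule, not taken from any particular framework; the gradient stream with one outlier and the comparison value of 0.01 are assumptions made up for the example.

```python
import math

def adam_steps(grads, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the parameter change Adam applies for each gradient in `grads`."""
    m = v = 0.0
    deltas = []
    for t, g in enumerate(grads, start=1):
        # Running estimates of the gradient mean and uncentered variance.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction for the zero initialisation of m and v.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        deltas.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return deltas

# Hypothetical gradient stream: mostly small, with one outlier batch.
grads = [0.1] * 50 + [10.0] + [0.1] * 50

big = adam_steps(grads, lr=0.01)     # assumed comparison value
small = adam_steps(grads, lr=0.001)  # value suggested above

# Every update, including the one caused by the outlier, scales with lr.
print(max(abs(d) for d in big), max(abs(d) for d in small))
```

Because the learning rate multiplies every update, the step triggered by the outlier gradient shrinks by the same factor as all the others, at the cost of slower progress overall.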

Answer selected by ilyes319