Unexpected spikes in training loss #860

Enry99 · 2025-03-12T11:32:19Z

Enry99
Mar 12, 2025

Hi
When training a 2-layer equivariant (L=1) MACE model I'm getting some random spikes in training loss, as in the image here:

These seem to be only fluctuations, since the trend of the training loss is not affected, and the validation loss keeps decreasing smoothly.

Are these spikes a problem, or can I safely ignore them?

I am using these hyperparameters for training:
--lr=0.005
--ema
--ema_decay=0.99
--scheduler_patience=5

Should I perhaps try to decrease the learning rate or increase ema_decay?

Thanks in advance for the help

Answered by ilyes319


gabor1 · 2025-03-12T11:36:12Z

gabor1
Mar 12, 2025
Maintainer

We've been finding that a higher ema_decay often makes for much more stable and ultimately better models.
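
For a rough sense of what a higher ema_decay means, here is a minimal sketch using the standard rule of thumb that an exponential moving average with decay d averages over roughly 1 / (1 - d) recent steps. The helper name `ema_horizon` is made up for illustration and is not part of MACE.

```python
def ema_horizon(decay: float) -> float:
    """Approximate number of recent steps an EMA with this decay averages over.

    Rule of thumb only: the weights of an EMA fall off geometrically,
    and 1 / (1 - decay) is their characteristic length.
    """
    return 1.0 / (1.0 - decay)


# Compare the value from the question (0.99) with larger decays.
for decay in (0.99, 0.999, 0.9999):
    print(f"ema_decay={decay}: averages over roughly {ema_horizon(decay):.0f} steps")
```

So going from 0.99 to 0.9999 widens the averaging window by about two orders of magnitude, which is consistent with the averaged model being less sensitive to individual noisy updates.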

0 replies

ilyes319 · 2025-03-12T11:39:06Z

ilyes319
Mar 12, 2025
Maintainer

Hey @Enry99,

Note that when using EMA, only the model used for validation is the averaged model; the training loss is computed with the non-averaged weights, which is why the validation curve is much smoother.
Spikes in the training loss are not much of a problem, thanks to EMA. The spikes to worry about are the ones on the validation set, and there are none at the moment. You can try increasing ema_decay all the way up to 0.9999 to see if that gives a better model.

Adam is known to be potentially spiky on noisy datasets, and this is usually not a problem. You can reduce the learning rate to 0.001 if that helps.
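
To illustrate why the averaged model behaves more smoothly, here is a toy sketch in plain Python. It is not MACE's implementation: a single scalar stands in for the model weights, the noise and spike model is invented, and all names (`ema_update`, `simulate`, `max_jump`) are hypothetical. It only shows the generic EMA update, avg = decay * avg + (1 - decay) * new.

```python
import random


def ema_update(avg: float, new: float, decay: float) -> float:
    """One EMA step: keep `decay` of the running average, blend in the rest."""
    return decay * avg + (1.0 - decay) * new


def simulate(decay: float = 0.99, steps: int = 2000, seed: int = 0):
    """Track a noisy 'raw' value and its EMA copy over a number of steps."""
    rng = random.Random(seed)
    raw = 1.0  # stand-in for a weight updated by a noisy optimizer
    avg = raw  # averaged copy, the analogue of the model used for validation
    raw_hist, avg_hist = [], []
    for step in range(steps):
        trend = 1.0 / (1.0 + 0.01 * step)             # slow underlying improvement
        noise = rng.gauss(0.0, 0.05)                  # ordinary step-to-step noise
        spike = 1.0 if rng.random() < 0.01 else 0.0   # occasional large excursion
        raw = trend + noise + spike
        avg = ema_update(avg, raw, decay)
        raw_hist.append(raw)
        avg_hist.append(avg)
    return raw_hist, avg_hist


def max_jump(xs):
    """Largest change between two consecutive entries."""
    return max(abs(b - a) for a, b in zip(xs, xs[1:]))


raw_hist, avg_hist = simulate()
print("largest step-to-step jump, raw:     ", max_jump(raw_hist))
print("largest step-to-step jump, averaged:", max_jump(avg_hist))
```

Each excursion in the raw value moves the averaged copy by only a factor of (1 - decay) of the gap, so the averaged trajectory cannot jump the way the raw one does.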

0 replies

Enry99 · 2025-03-12T14:03:44Z

Enry99
Mar 12, 2025
Author

Thank you very much for the reply.
I will try tweaking both ema_decay and lr to see if I get any improvement.

Best regards

0 replies
