Unexpected Performance of L=2 Model in Liquid Water Training #859
-
Hi, I've been training a model for liquid water using the following parameters, varying `max_L`:

```shell
srun -n 4 mace_run_train \
    --name="L1_3" \
    --train_file="../../dataset/train_configs.xyz" \
    --valid_file="../../dataset/test_configs.xyz" \
    --test_file="../../dataset/test_configs.xyz" \
    --config_type_weights='{"Default":1.0}' \
    --model="MACE" \
    --E0s='{1:-13.4131335925908, 8:-429.4668907468319}' \
    --max_L=1 \
    --r_max=5.0 \
    --num_interactions=2 \
    --correlation=3 \
    --max_ell=3 \
    --valid_batch_size=4 \
    --batch_size=4 \
    --max_num_epochs=200 \
    --energy_key='energy' \
    --forces_key='forces' \
    --swa \
    --start_swa=160 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --num_workers=1 \
    --restart_latest \
    --device=cuda \
    --seed=3926 \
    --distributed
```

I trained four equivalent models with different seeds for both L=1 and L=2.

Best L=1 Model Performance:

Best L=2 Model Performance:

While I expected the L=2 model to clearly outperform the L=1 model, the results do not show that improvement. Do you have any insights into why this might be happening?

Best,
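For context, the L=2 runs used the same invocation with only `--max_L` changed. A dry-run sketch of such a sweep (the seed values below are placeholders, and the commands are echoed rather than launched) could look like:

```shell
# Dry-run sketch of a max_L / seed sweep: echo each command instead of
# launching it. Seeds here are placeholders, not the ones actually used.
for MAX_L in 1 2; do
  for SEED in 1 2 3 4; do
    echo srun -n 4 mace_run_train \
      --name="L${MAX_L}_seed${SEED}" \
      --max_L="${MAX_L}" \
      --seed="${SEED}"
      # ...plus the remaining flags from the command above...
  done
done
```

Dropping the `echo` turns the dry run into the actual sweep.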
-
Dear @cesaremalosso, the performance improvement between L=1 and L=2 is system dependent. Usually we see a larger performance gain for systems that are quite hard to fit. Looking at your errors, your system seems quite easy to fit, so I am not too surprised. To rule out any other potential problems, can you share the log files for the two model trainings so I can check that everything is working as intended?
-
Your models both look quite good. The fact that your validation error is so much lower than your training error suggests that your validation set is not representative, or that it is quite small. The validation set (and the error on it) is used to select the best model to save, so I recommend making your validation set bigger (you can even set up a separate file with more configs than your training set). Your force errors look excellent. You might also want to play around with the energy weights in the second stage of the training (swa); making those weights higher will give you better energies.

Where does your training set come from? If it is all MD with an ab initio or force field model, then I also recommend doing a round of iterative training, i.e. run an MD with your MACE model, collect some configurations from it, and add them to the training set.

Finally, your cutoff radius is 5 Å, which is fine, but you will get a better water model (e.g. better densities) if you use 6 Å. Remember that the current version of MACE is a short-range model: altogether the receptive field is num_interactions * cutoff, so all the electrostatics you are capturing is implicit within that range. You will find that raw energy and force errors, although important, are not the only things that signify a good or bad model; look at the predicted density, for example.
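The receptive-field point above can be checked with a quick back-of-the-envelope calculation. With the settings from the question (`num_interactions=2`, `r_max=5.0`), and with the suggested larger cutoff of 6.0 Å:

```shell
# Receptive field of a message-passing model: num_interactions * r_max.
# Values below are taken from the training command in the question;
# 6.0 is the larger cutoff suggested in the reply.
NUM_INTERACTIONS=2
for R_MAX in 5.0 6.0; do
  awk -v n="$NUM_INTERACTIONS" -v r="$R_MAX" \
      'BEGIN { printf "receptive field at r_max=%.1f: %.1f A\n", r, n * r }'
done
```

So the model sees at most ~10 Å of environment around each atom at the current settings, and ~12 Å with the larger cutoff; any longer-ranged physics has to be captured implicitly.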