
Add LBFGS optimizer as an option #792

Open
wants to merge 53 commits into main

Conversation

@vue1999 (Collaborator) commented Jan 20, 2025

Adds LBFGS optimizer support.

Key details

  • Supports distributed training across multiple GPUs when using the --distributed flag
  • Handles datasets larger than available memory through batch processing
  • Uses strong Wolfe conditions for the line search
  • Modifies the dataloader to avoid dropping samples when using LBFGS
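A minimal, self-contained sketch of the pattern above (not the PR's actual implementation; the toy model and synthetic data stand in for MACE and its dataloader): `torch.optim.LBFGS` with strong Wolfe line search, where each closure call rebuilds the full-dataset loss and gradient by accumulating over memory-sized batches.

```python
import torch

torch.manual_seed(0)
X, y = torch.randn(512, 16), torch.randn(512, 1)
model = torch.nn.Linear(16, 1)
# Batching here is purely for memory: the optimizer still sees full-dataset gradients.
batches = list(zip(X.split(128), y.split(128)))

optimizer = torch.optim.LBFGS(
    model.parameters(),
    history_size=10,                # illustrative value, not taken from the PR
    max_iter=20,                    # illustrative value, not taken from the PR
    line_search_fn="strong_wolfe",  # strong Wolfe conditions for the line search
)

def closure():
    # LBFGS evaluates this closure at several points per step (line search),
    # so the full-dataset loss and gradient are recomputed from batches each time.
    optimizer.zero_grad()
    total_loss = 0.0
    for xb, yb in batches:
        # Scale by 1/N so the accumulated gradient equals that of the mean loss.
        loss = torch.nn.functional.mse_loss(model(xb), yb, reduction="sum") / len(X)
        loss.backward()
        total_loss += loss.item()
    # With --distributed, the accumulated gradients would additionally be
    # all-reduced across GPUs at this point.
    return total_loss

for _ in range(5):
    optimizer.step(closure)
```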

How to use

  • Add the --lbfgs flag
  • Use the largest batch_size that fits into memory (for best performance)
  • Disable EMA
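An illustrative invocation (an assumption, not taken from the PR: apart from --lbfgs and --distributed, which this PR adds, the remaining flags are standard mace_run_train options and may differ in your setup; EMA is assumed to stay off when --ema is not passed):

```bash
mace_run_train \
    --name "my_model" \
    --train_file "train.xyz" \
    --restart_latest \
    --lbfgs \
    --batch_size 256 \
    --distributed
```

Here --restart_latest reflects the workflow described in the notes below, continuing from an Adam-trained checkpoint; drop it to use LBFGS from the start of training.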

Advantages

  • More deterministic training than first-order optimizers
  • Less sensitive to random seeds
  • Batch-size-independent convergence path
  • Fewer hyperparameters to tune
  • Possibly faster and better convergence

Notes

  • First-order optimizer hyperparameters (weight decay, learning rate, etc.) are ignored when using LBFGS
  • EMA should be disabled as it's redundant with line search
  • Unlike first-order optimizers, where batch size affects the optimization dynamics, in LBFGS the batch_size parameter is purely computational (see the sketch after this list):
    • Each optimization step requires gradients computed over the entire training set (at several points in parameter space)
    • The full-dataset gradient computation is split into batches for memory efficiency and performance (by parallelising across GPUs)
    • The optimizer only sees the accumulated full-dataset gradients, regardless of how they were computed
  • LBFGS can be used from the very start of training, but Adam is likely to be significantly faster early on, when the extra precision of LBFGS brings little benefit.
  • In the examples we looked at, we first trained the models using Adam and then switched to LBFGS. (This can be done by restarting training from a checkpoint with the --lbfgs flag.)
  • It doesn’t work with multi-head training
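As a sanity check on the point about batch_size being purely computational, here is a small, self-contained numerical sketch (assumption: a toy least-squares model stands in for MACE) showing that the gradient accumulated over batches matches the full-dataset gradient regardless of batch size, including a ragged final batch:

```python
import torch

torch.manual_seed(0)
X = torch.randn(1000, 8, dtype=torch.float64)
y = torch.randn(1000, 1, dtype=torch.float64)
w = torch.zeros(8, 1, dtype=torch.float64, requires_grad=True)

def full_gradient():
    w.grad = None
    (((X @ w - y) ** 2).sum() / len(X)).backward()
    return w.grad.clone()

def accumulated_gradient(batch_size):
    w.grad = None
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        # Scale each batch by 1/N so the summed batch gradients equal the
        # gradient of the full-dataset mean loss.
        (((xb @ w - yb) ** 2).sum() / len(X)).backward()
    return w.grad.clone()

g_full = full_gradient()
for bs in (1000, 256, 64, 7):  # 7 leaves a ragged last batch; no samples are dropped
    assert torch.allclose(g_full, accumulated_gradient(bs))
print("accumulated batch gradients match the full-dataset gradient")
```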

Examples

3BPA
[Figures: 3BPA_size_comp, error_spread_256]

SPICE (H, C atoms only subset)
[Figures: relative_tables_MACE_OFF, loss_vs_time_continue]

@vue1999 marked this pull request as ready for review February 17, 2025
@ilyes319 (Contributor) commented Mar 5, 2025

Hey @vue1999, that looks great! Is it ready to merge?

@vue1999 (Collaborator, Author) commented Mar 12, 2025

Yes. We didn’t do exhaustive testing, but it works with the multi-head options too.

@ilyes319 (Contributor) commented

It would be worth adding some tests for it, inspired by the run-train one (test_run_train.py). Just a simple training run would be fine.
