
don't auto-recompute attention or linear #1648

Merged: 6 commits into main, Jan 16, 2025
Conversation

@t-vi (Collaborator) commented Jan 16, 2025

Fixes: #1646

Thank you @kshitij12345 for the detailed issue.

@t-vi requested review from mruberry and lantiga as code owners on January 16, 2025, 10:08
@kshitij12345 (Collaborator) left a comment

Overall looks good, I just have one question. Also, we should add a test with a simple 2-layer model to verify that SDPA or linear is not recomputed in the backward pass.

Thank you @t-vi
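
(A minimal sketch of the kind of test suggested above, not the actual test added in this PR. It assumes Thunder's `thunder.jit` and `thunder.last_backward_traces` utilities behave as in other Thunder tests, and the string checks on the trace are illustrative.)

```python
# Hypothetical sketch: jit a tiny 2-layer model and assert that neither
# scaled_dot_product_attention nor linear appears in the backward trace,
# i.e. their outputs are saved from forward rather than recomputed.
import torch
import thunder


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(16, 16)
        self.fc2 = torch.nn.Linear(16, 16)

    def forward(self, x):
        q = k = v = self.fc1(x)
        attn = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        return self.fc2(attn)


def test_sdpa_linear_not_recomputed_in_backward():
    model = TinyModel()
    jmodel = thunder.jit(model)
    x = torch.randn(2, 4, 16, requires_grad=True)
    jmodel(x).sum().backward()

    # Inspect the final backward trace; the accessor name and the
    # symbol-name checks below are assumptions for illustration only.
    bwd_trace = thunder.last_backward_traces(jmodel)[-1]
    trace_str = str(bwd_trace)
    assert "scaled_dot_product_attention" not in trace_str
    assert "linear" not in trace_str
```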

Review thread on thunder/torch/__init__.py (resolved)
@t-vi enabled auto-merge (squash) on January 16, 2025, 12:13
@kshitij12345 (Collaborator) left a comment

LGTM, thank you @t-vi

Review thread on thunder/tests/test_networks.py (resolved)
@IvanYashchuk (Collaborator) commented
That's a good quick fix, but can we revert to the previous default behavior: don't do any recomputation except for fused operations? The current logic doesn't use the information on whether the operation will be fused. To override the default rule we can propagate torch.utils.checkpoint tags or have opt-in automatic passes.
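
(For reference, the torch.utils.checkpoint mechanism mentioned here is PyTorch's way for a user to explicitly opt into recomputation for a region. A minimal PyTorch example, independent of Thunder, shown only to illustrate the kind of tag that could be propagated:)

```python
# Minimal illustration of torch.utils.checkpoint: the checkpointed block
# discards its intermediates during forward and recomputes them during
# backward. This is explicit, user-requested recomputation, as opposed to
# the automatic recomputation of SDPA/linear that this PR turns off.
import torch
from torch.utils.checkpoint import checkpoint


def attention_block(x):
    return torch.nn.functional.scaled_dot_product_attention(x, x, x)


x = torch.randn(2, 4, 16, requires_grad=True)
out = checkpoint(attention_block, x, use_reentrant=False)
out.sum().backward()
```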

@t-vi (Collaborator, Author) commented Jan 16, 2025

> That's a good quick fix, but can we revert to the previous default behavior: don't do any recomputation except for fused operations? The current logic doesn't use the information on whether the operation will be fused. To override the default rule we can propagate torch.utils.checkpoint tags or have opt-in automatic passes.

Maybe. I had that as one of the options in the issue; we went with this for now.

To my mind there are multiple parts:

  • we want to be sure that we don't cause memory regressions,
  • I think the rematerialization for forward and backward eventually needs to work without creating the joint trace.

@t-vi merged commit ef06bd0 into main on Jan 16, 2025
49 checks passed
@t-vi deleted the tom/dont_recompute_sdpa_linear branch on January 16, 2025, 14:44

Successfully merging this pull request may close these issues:
Perf Regression: SDPA is recomputed in backward