
[FSDP2] cast scale to float32 in precompute #835

Merged: 1 commit into pytorch:main on Sep 11, 2024

Conversation

weifengpy (Contributor) commented Sep 6, 2024

This reverts a recent PR that breaks unit tests (#727).
We can revisit whether we should apply fp32 upcasting consistently across float8 compute and precompute.

The unit test failed on my devgpu, but I am not sure why our CI did not catch it. Maybe because there is no H100 in CI?

pytest -s test/float8/test_fsdp2/test_fsdp2.py -k test_transformer_parity

pytorch-bot bot commented Sep 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/835

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7ae2e9e with merge base e2dad4a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the "CLA Signed" label Sep 6, 2024
@@ -69,7 +69,7 @@ def precompute_float8_dynamic_scale_for_fsdp(module: nn.Module) -> None:
     scale_tensor = torch.clamp(scale_tensor, max=torch.finfo(torch.float16).max)
     local_scale_tensor = scale_tensor.to_local()
     for i, float8_linear in enumerate(float8_linears):
-        float8_linear.weight._local_tensor._precomputed_scale = local_scale_tensor[i]
+        float8_linear.weight._local_tensor._precomputed_scale = local_scale_tensor[i].to(torch.float32)
weifengpy (Contributor, Author) commented on the diff:

precompute should align with Float8Linear:

return res.to(torch.float32)
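
To make that alignment concrete, here is a minimal sketch (not the torchao implementation; the helper name and the clamp epsilon are illustrative assumptions) of the pattern being matched: compute abs(max) in the weight's native dtype and cast the resulting scale to float32 only at the end, mirroring the return res.to(torch.float32) quoted above.

import torch

E4M3_MAX_POS = torch.finfo(torch.float8_e4m3fn).max

def amax_to_scale_sketch(amax: torch.Tensor) -> torch.Tensor:
    # scale = float8_max / amax, clamped so it stays representable in fp16,
    # then cast to float32 only at the very end
    res = E4M3_MAX_POS / torch.clamp(amax, min=1e-12)
    res = torch.clamp(res, max=torch.finfo(torch.float16).max)
    return res.to(torch.float32)

w = torch.randn(1024, 1024, dtype=torch.bfloat16)
amax = torch.max(torch.abs(w))      # abs(max) stays in bf16, like float8 compute
scale = amax_to_scale_sketch(amax)  # fp32 only at the end
assert scale.dtype == torch.float32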

@weifengpy weifengpy requested review from vkuzo and drisspg September 6, 2024 22:14
weifengpy (Contributor, Author) commented Sep 6, 2024

cc @crcrpar: doing to(torch.float32) at the end makes the float8 compute and float8 all-gather code paths consistent. We will revisit whether we should move to(torch.float32) earlier in both code paths.
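
For context on the two code paths, here is a hedged sketch of where precompute_float8_dynamic_scale_for_fsdp sits in an FSDP2 training step relative to float8 compute and the float8 all-gather; the import path and the surrounding loop are assumptions for illustration, while the function name and signature come from this PR's diff.

import torch
import torch.nn.functional as F
# assumed import location for the helper modified in this PR
from torchao.float8.fsdp_utils import precompute_float8_dynamic_scale_for_fsdp

def train_step(model, optimizer, inputs, targets):
    # forward/backward run float8 compute; the float8 all-gather path consumes
    # any precomputed fp32 scales attached to the sharded weights
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # recompute dynamic scales once per step, after the weight update,
    # so the next forward's float8 all-gather can reuse them
    precompute_float8_dynamic_scale_for_fsdp(model)
    return loss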

@@ -59,7 +59,7 @@ def precompute_float8_dynamic_scale_for_fsdp(module: nn.Module) -> None:
         return
 
     # inf-norm is equivalent to max(abs(w))
-    max_weights = torch._foreach_norm(weights, ord=math.inf, dtype=torch.float32)  # Partial
+    max_weights = torch._foreach_norm(weights, ord=math.inf)  # Partial
A reviewer (Contributor) commented on the diff:

Wondering why dtype=torch.float32 was removed here / why does it matter?

weifengpy (Contributor, Author) replied:

This line mimics float8 compute, where we do abs(max) without upcasting:
https://github.com/pytorch/ao/blob/144445a7a8e988059555421caafa17e0c1678053/torchao/float8/float8_utils.py#L101-L102C28

I am not sure if it improves ground-truth numerics, but at least it brings the numerics on par with float8 compute.
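
As a small illustration of that equivalence (a sketch with made-up tensors, not the torchao code): the per-tensor inf-norm is exactly max(abs(w)), and with the dtype argument removed the reduction stays in the weights' native dtype, matching float8 compute.

import math
import torch

weights = [torch.randn(256, 256, dtype=torch.bfloat16) for _ in range(3)]

# inf-norm is equivalent to max(abs(w)) per tensor
max_weights = torch._foreach_norm(weights, ord=math.inf)
for w, m in zip(weights, max_weights):
    assert torch.equal(m, torch.max(torch.abs(w)))
    # without dtype=torch.float32 the amax stays in bf16, which is what the
    # float8 compute path uses before forming the scale
    assert m.dtype == torch.bfloat16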

module = self.init_transformer(weight_tying=weight_tying, dtype=dtype)
ref_module = copy.deepcopy(module)
float8_linear_config1 = Float8LinearConfig(
    cast_config_weight=CastConfig(scaling_type=scaling_type_weight),
)
convert_to_float8_training(
    ref_module,
    config=float8_linear_config1,
)

awgu (Contributor) commented Sep 6, 2024

It might be helpful if you could provide some explanation of which tensors' dtypes changed before/after the revert.

awgu (Contributor) commented Sep 6, 2024

revert a recent PR that regressed numerics:

I would mainly suggest to not phrase it as regressing numerics. It broke tests, so there was a test regression. However, this is just because there is now a numeric mismatch: the precompute code path does some computations in fp32 where the no-precompute code path does them in bf16.

The current change in the PR makes the two paths the same, but it makes the precompute path less accurate than before.
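
A tiny sketch of the kind of mismatch being described (made-up values, not the torchao code): forming the scale through an fp32 intermediate versus a bf16 intermediate yields an fp32 tensor either way, but the values can differ slightly, which is enough to trip a strict parity test.

import torch

torch.manual_seed(0)
w = torch.randn(4096, 4096, dtype=torch.bfloat16)
E4M3_MAX = torch.finfo(torch.float8_e4m3fn).max

# precompute-style before this PR: upcast the amax to fp32, then divide in fp32
amax_fp32 = torch.max(torch.abs(w)).to(torch.float32)
scale_via_fp32 = E4M3_MAX / amax_fp32

# compute-style (and precompute after this PR): divide in bf16, cast at the end
amax_bf16 = torch.max(torch.abs(w))
scale_via_bf16 = (E4M3_MAX / amax_bf16).to(torch.float32)

# both are fp32, but the bf16-path value was rounded to bf16 precision before
# the final cast, so the two scales can disagree in the low bits
print(scale_via_fp32.item(), scale_via_bf16.item())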

weifengpy (Contributor, Author) replied:

I would mainly suggest to not phrase it as regressing numerics. It broke tests, so there was a test regression

Good point. I modified the PR description to focus on fixing the unit test. I totally get your point that the true numerics might be regressing.

@vkuzo vkuzo merged commit 85d03de into pytorch:main Sep 11, 2024
17 checks passed
jainapurva pushed a commit that referenced this pull request Sep 22, 2024
jainapurva pushed a commit that referenced this pull request Sep 23, 2024
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024