Allow for modifying the scaled_mm compute #144
Conversation
self.emulate = False
# Defines the behavior of the matmul in the forward and backward pass
self.forward_config = ScaledMMConfig()
self.backward_config = ScaledMMConfig()
is it possible to configure the two backward gemms separately?
This is somewhat challenging, since as written today we don't have a very clean way of knowing which matmul is which.
cc @bdhirsh, maybe I'm not thinking of something.
We have out = x @ W, where x = Float8Tensor and W = Float8Tensor.
Since W will not be used in calculating gradW, you could tag some extra info on the activation Float8Tensor, and since the activation gets used for backward, that info should get carried through to the backward calcs.
I think this would be better as a follow-up, though, since the logic gets spread out over multiple Float8Tensor instances.
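A minimal sketch of that tagging idea, using a toy stand-in for the activation Float8Tensor (all class, field, and helper names here are illustrative assumptions, not the library's actual API):

```python
from typing import NamedTuple

class ScaledMMConfig(NamedTuple):
    # Illustrative fields; the config added in this PR may differ.
    emulate: bool = False
    fast_accumulation: bool = False

class ActivationFloat8(NamedTuple):
    """Toy stand-in for the activation Float8Tensor carrying extra tagged info."""
    data: object                        # placeholder for the fp8 payload
    mm_config: ScaledMMConfig           # config used for the forward gemm
    grad_weight_config: ScaledMMConfig  # extra info tagged for the gradW gemm

# Because the activation is saved for backward, the gemm that computes gradW
# (which consumes the saved activation) could read x.grad_weight_config and
# configure that matmul independently of the forward one.
```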
> since as written today we don't have a very clean way of knowing which matmul is which
Yeah, this is weird because the config is really per-gemm but we have to stick it on a tensor. How about something like:
1. local: given matmul(A, B), the config on B (the second argument) always overrides the config on A (the first argument); a sketch of this rule follows below.
2. global: the float8 UX allows setting options for the 3 gemms, and under the hood maps them to be implemented via (1).
While not the most intuitive to implement, I think that could work?
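A minimal sketch of rule (1), assuming each operand may carry an optional config (the function and type names here are illustrative, not the actual API):

```python
from typing import NamedTuple, Optional

class ScaledMMConfig(NamedTuple):
    # Illustrative fields only.
    emulate: bool = False
    fast_accumulation: bool = False

def resolve_mm_config(
    a_config: Optional[ScaledMMConfig],
    b_config: Optional[ScaledMMConfig],
) -> ScaledMMConfig:
    # Local rule: for matmul(A, B), the config attached to B (the second
    # argument) wins whenever it is present.
    if b_config is not None:
        return b_config
    return a_config if a_config is not None else ScaledMMConfig()
```

Point (2) would then map the three user-facing per-gemm settings onto per-tensor configs so that this override rule produces the intended behavior.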
separate PR sgtm, though I do feel like we need to make all 3 gemms configurable before we lock the API down.
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
lgtm, can we add instructions to README.md on how to use this, and maybe mention that this is an intermediate state and that we will expose an option to configure the 2 backward gemms separately in a future PR? Thanks for adding this!
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary
This does two things:
1. Adds a ScaledMMConfig (sketched below) that is used to control the behavior of the scaled_mm op. This includes emulate, fast_accumulation, and fp8_out_dtype (the latter is not currently used).
2. Replaces the emulate arg with this config, strings it through all the relevant infra, and updates the tests accordingly.
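Roughly, the config described in (1) could look like the following sketch (field names follow this summary; the exact definition, defaults, and usage in the PR may differ):

```python
from typing import NamedTuple, Optional
import torch

class ScaledMMConfig(NamedTuple):
    emulate: bool = False            # run an emulated matmul instead of the fused scaled_mm kernel
    fast_accumulation: bool = False  # enable fast accumulation in the scaled_mm op
    fp8_out_dtype: Optional[torch.dtype] = None  # output dtype override; not currently used

# Instead of threading a bare `emulate` flag through the stack, modules can
# carry one config per direction, e.g.:
forward_config = ScaledMMConfig(emulate=False, fast_accumulation=True)
backward_config = ScaledMMConfig(emulate=False, fast_accumulation=False)
```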
Performance
With use_fast_accum enabled in the forward pass, using the linear_float8 benchmark: