gemlite integration in torchao #1034
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1034
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6b786d3 with merge base 039cef4.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I see that you removed the pruning config; that will produce incorrect results for group sizes lower than 128.
It's there now; it needed the custom op support stuff.
Force-pushed from 923f657 to e417818
Force-pushed from d3b4ad7 to 61e2564
Force-pushed from 1383982 to 749c0d4
Summary: This PR adds support for gemlite kernels in torchao using a subclass integration with the gemlite_uintx_weight_only constructor. This works for int4 grouped and ungrouped asymmetric weight-only quantization and int8 symmetric ungrouped quantization for fp16 models. TP support through DTensor is included in this PR. In the process of integrating gemlite into AQT, I also made some fixes to a few quant primitives that are now being exercised but previously were not.

Test Plan: test_integration.py -k "test_gemlite_layout"; test_affine_quantized_tensor_parallel.py -k "test_tp_gemlite"; see benchmarks.sh for gemlite benchmarks as well.

Squashed commit history: new gemlite integration using pip install; tests ran; fixing gemlite to do int4 matmul instead of fp16 fp16; running tests; more testing; AQT integration wip; wip; testing on gemlite a100_int8_tuning branch; gemlite subclass testing bitpacking 8 bits; bug fixing stuff; hicham fixes; new benchmarks; testing gemlite 8 bit; WIP; tp support; wip; final.
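For orientation, here is a minimal usage sketch of what the new path looks like from the user side, assuming the constructor is exposed through torchao's quantize_ API; the exact import location and argument names (group_size, bit_width) are assumptions for illustration, not verbatim from this PR.

```python
import torch
from torchao.quantization import quantize_
# assumed export location for the new constructor added in this PR
from torchao.quantization import gemlite_uintx_weight_only

# gemlite currently targets fp16 models on CUDA
model = torch.nn.Sequential(torch.nn.Linear(4096, 4096)).half().cuda()

# int4, grouped, asymmetric weight-only quantization backed by gemlite kernels
quantize_(model, gemlite_uintx_weight_only(group_size=64, bit_width=4))

out = model(torch.randn(1, 4096, dtype=torch.float16, device="cuda"))
```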
Force-pushed from 749c0d4 to 6bc64aa
Force-pushed from ef225a0 to 6b786d3
@@ -1053,6 +1088,7 @@ def callback(x):
)

args = parser.parse_args()
print(args)
nit: remove
@@ -251,7 +261,8 @@ def from_hp_to_intx(
    zero_point_domain,
)
# choose_qparams_affine is a custom op that does support returning optional Tensors. We thus set the zero_point to None if its domain is None
if zero_point_domain is None:
    # TODO should probably consolidate ZeroPointDomain.NONE and None
yeah we should not have both, also doing some refactor in https://github.com/pytorch/ao/pull/1402/files#diff-7c9b4c8c6d4ef9c47873263304a335d5cf56c3ac9f98ba10b994cd80dc9c2709L536
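A small sketch of the kind of consolidation being discussed, assuming ZeroPointDomain is importable from torchao.quantization.quant_primitives; the helper name below is hypothetical and only illustrates collapsing the two spellings into one.

```python
from torchao.quantization.quant_primitives import ZeroPointDomain

def _normalize_zero_point_domain(zero_point_domain):
    # Collapse the two "no zero point" spellings (Python None and
    # ZeroPointDomain.NONE) into a single value so downstream code such as
    # from_hp_to_intx only ever has to check one of them.
    if zero_point_domain is None or zero_point_domain == ZeroPointDomain.NONE:
        return None
    return zero_point_domain
```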
@@ -20,6 +20,10 @@
    _linear_int8_act_int8_weight_block_sparse_check,
    _linear_int8_act_int8_weight_block_sparse_impl,
)
from torchao.dtypes.uintx.gemlite_layout import (
    _linear_fp_act_int4_weight_gemlite_check,
nit: fp16? seems like gemlite only works with fp16 right now, but can be a follow up PR
), f"GemliteAQTTensorImpl only works with GemliteLinearTriton but got {_layout}" | ||
group_size, bit_width = _layout.group_size, _layout.bit_width | ||
|
||
torch._dynamo.config.inline_inbuilt_nn_modules = False |
nit: do we need to restore the state afterwards
I think this is necessary for gemlite to function correctly.
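A sketch of the save-and-restore pattern the nit is asking about; whether gemlite actually tolerates flipping the flag back afterwards is left open in this thread, so treat this as illustrative only.

```python
import torch

prev = torch._dynamo.config.inline_inbuilt_nn_modules
torch._dynamo.config.inline_inbuilt_nn_modules = False
try:
    # run the gemlite quantization / benchmarking path that needs the flag off
    ...
finally:
    # restore the previous global state instead of leaving it disabled
    torch._dynamo.config.inline_inbuilt_nn_modules = prev
```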
# to use the gemlite matmul kernel, which expects the weight to be passed in as is,
# we ignore the transpose
if func is aten.detach.default or func is aten.t.default:
    return return_and_correct_aliasing(
would be good to record the transposed state probably:
not args[0].transposed,
That line does literally nothing though; the actual state isn't tracked since it's hardcoded to false always:
self.transposed = False
Long term, I think this may be better handled by AQT itself rather than the tensor impl: transposing the tensor_impl would mean unpacking and repacking the weight, whereas what we actually want is to bookkeep the representation, which makes more sense at the top level where the actual shape is changing.
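A rough sketch of the bookkeeping suggested above, written as a fragment of a __torch_dispatch__ handler; _apply_fn_to_data and the transposed attribute mirror the snippet under discussion, and this is illustrative rather than the PR's final code.

```python
import torch
from torch.utils._python_dispatch import return_and_correct_aliasing

aten = torch.ops.aten

def _handle_transpose(func, types, args, kwargs):
    # Fragment of a __torch_dispatch__ body: instead of hard-coding
    # transposed=False, flip the flag when aten.t is dispatched so the
    # logical transpose is recorded without unpacking/repacking the weight.
    self = args[0]
    if func is aten.t.default:
        out = self._apply_fn_to_data(lambda x: x)  # keep the packed weight as-is
        out.transposed = not self.transposed       # book-keep the flipped state
        return return_and_correct_aliasing(func, args, kwargs, out)
    return NotImplemented
```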
return (
    # input is native fp16 tensor
    not is_traceable_wrapper_subclass(input_tensor)
    # and input_tensor.dtype == torch.float16
don't we need this?
It's a little redundant since we block creation of the subclass upon creation; normally when you mess this type of thing up it gives you a dtype mismatch error, which is what will happen now, whereas adding this check would likely bypass the error, which I don't know if users would actually want.
Could see it going either way though.
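For reference, a sketch of what the check would look like with the commented-out dtype guard enabled, so a non-fp16 activation falls through to other dispatch entries instead of surfacing a dtype mismatch later; the weight-side condition is an assumption, included only to keep the example self-contained.

```python
import torch
from torch.utils._python_dispatch import is_traceable_wrapper_subclass

def _linear_fp_act_int4_weight_gemlite_check(input_tensor, weight_tensor, bias):
    return (
        # input is a plain (non-subclass) tensor ...
        not is_traceable_wrapper_subclass(input_tensor)
        # ... in fp16, the only activation dtype gemlite handles today
        and input_tensor.dtype == torch.float16
        # weight is quantized with the gemlite layout (placeholder condition)
        and getattr(weight_tensor, "tensor_impl", None).__class__.__name__
        == "GemliteAQTTensorImpl"
    )
```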
Looks good, I think we should merge now to unblock benchmarking and do the follow-ups a bit later.
from torchao.dtypes.uintx.gemlite_layout import apply_gemlite_quant

use_hqq = True if bit_width == 4 else False
apply_fn = lambda weight: apply_gemlite_quant(
Should we raise if gemlite is not installed?
I think so. Please feel free to add more review comments, and we should have follow-up fixes.
Just want to merge now to unblock benchmarking in sglang.
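A sketch of the guard being suggested, raising a clear error at quantization time if the package is missing; the helper name is hypothetical.

```python
def _require_gemlite():
    # Fail early with an actionable message instead of an opaque ImportError
    # deep inside the layout code.
    try:
        import gemlite  # noqa: F401
    except ImportError as e:
        raise ImportError(
            "gemlite_uintx_weight_only requires the `gemlite` package; "
            "install it with `pip install gemlite`"
        ) from e
```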
gemlite integration in torchao

Summary:
This PR adds support for gemlite kernels in torchao using a subclass
integration with the gemlite_uintx_weight_only constructor. This works
for int4 grouped and ungrouped asymmetric weight-only quantization and
int8 symmetric ungrouped quantization for fp16 models. TP support
through DTensor is included in this PR.

In the process of integrating gemlite into AQT, I also made some fixes to
a few quant primitives that are now being exercised but previously were not.

Test Plan:
test_integration.py -k "test_gemlite_layout"
test_affine_quantized_tensor_parallel.py -k "test_tp_gemlite"
See benchmarks.sh for gemlite benchmarks as well.