
[Misc] Use scalar type to dispatch to different gptq_marlin kernels #7323

Merged

Conversation

@LucasWilkinson (Collaborator) commented Aug 9, 2024

Use ScalarType instead of num_bits (in combination with has_zp) to perform the dispatching for gptq_marlin. This sets the stage for folding fp8_marlin.cu into gptq_marlin.cu, since we can now move dequant_8bit in fp8_marlin.cu to a dequant<T, vllm::kFE4M3fn.id()>(int q) specialization. I did not fold fp8_marlin.cu into gptq_marlin.cu in this PR to avoid excessive compile times for gptq_marlin.cu, but once #7317 is completed this should be folded in.
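
For readers unfamiliar with the pattern, a rough sketch of the specialization this enables (illustrative only: the real dequant signatures in gptq_marlin.cu carry more parameters, and the ScalarType header path here is assumed):

```cpp
// Sketch, not vLLM's actual kernel code. Assumes vLLM's ScalarType header;
// the dequant signature is simplified relative to gptq_marlin.cu.
#include <cstdint>
#include <cuda_fp16.h>
#include "core/scalar_type.hpp"  // vllm::kU4B8, vllm::kFE4M3fn (path assumed)

// Primary template: specialized on the serialized 64-bit type id instead
// of branching on (num_bits, has_zp) pairs.
template <typename scalar_t, int64_t w_type_id>
__device__ inline void dequant(int q, scalar_t* out);

// Existing 4-bit GPTQ path (unsigned 4-bit weights, bias of 8).
template <>
__device__ inline void dequant<half, vllm::kU4B8.id()>(int q, half* out) {
  // ... int4 -> fp16 bit tricks ...
}

// Where fp8_marlin.cu's dequant_8bit lands once the files are folded
// together (deferred until #7317 to keep compile times in check).
template <>
__device__ inline void dequant<half, vllm::kFE4M3fn.id()>(int q, half* out) {
  // ... fp8 (E4M3) -> fp16 conversion ...
}
```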

In order to support passing a scalar type as a template parameter in C++17, it has to be serialized into something that can be passed as a template parameter. This PR introduces the concept of serializing the type into a 64-bit integer id (which can be passed as a non-type template parameter), alongside a deserialization routine (from_id, for when the template needs to access the traits of the type). If/when we make C++20 the lowest supported standard, this serialization/deserialization can be removed, since C++20 allows passing literal class types as template parameters (see: https://en.cppreference.com/w/cpp/language/template_parameters).
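
As a minimal, self-contained illustration of the round-trip (the field layout below is made up for this sketch; vLLM's actual ScalarType packs more fields into the id):

```cpp
// Toy stand-in for vllm::ScalarType, showing only the C++17 serialization
// idea; the real class encodes several more fields into the 64-bit id.
#include <cassert>
#include <cstdint>

class ScalarType {
 public:
  constexpr ScalarType(uint8_t size_bits, bool has_zp)
      : size_bits(size_bits), has_zp(has_zp) {}

  uint8_t const size_bits;  // e.g. 4 or 8
  bool const has_zp;        // is a zero point present?

  // Serialize into a 64-bit id that can be a non-type template parameter.
  constexpr int64_t id() const {
    return (static_cast<int64_t>(size_bits) << 1) | (has_zp ? 1 : 0);
  }

  // Deserialize inside a template body to recover the type's traits.
  static constexpr ScalarType from_id(int64_t id) {
    return ScalarType(static_cast<uint8_t>(id >> 1), (id & 1) != 0);
  }
};

// Traits are recoverable at compile time from the serialized id.
template <int64_t type_id>
constexpr int weight_bits() {
  constexpr ScalarType t = ScalarType::from_id(type_id);
  return t.size_bits;
}

constexpr ScalarType kU4NoZp(4, /*has_zp=*/false);

int main() {
  static_assert(weight_bits<kU4NoZp.id()>() == 4);
  assert(!ScalarType::from_id(kU4NoZp.id()).has_zp);
  return 0;
}
```

Under C++20 the id indirection disappears: ScalarType is a literal class type, so `template <ScalarType w_type>` becomes legal directly.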

github-actions bot commented Aug 9, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@ProExpertProg (Contributor) left a comment:

Looks good! A minor nit and potential bug

@bnellnm (Contributor) left a comment:

LGTM

@LucasWilkinson force-pushed the lwilkinson/gptq-scalar-type-dispatch branch from 5e15b86 to cc5247c on Aug 9, 2024 at 22:47
@LucasWilkinson force-pushed the lwilkinson/gptq-scalar-type-dispatch branch from cc5247c to 68424c9 on Aug 12, 2024 at 03:50
@LucasWilkinson (Collaborator, Author) commented:

/ready

github-actions bot added the ready label on Aug 12, 2024
@LucasWilkinson changed the title from "[Misc] Use scalar type to dispatch to diferent gptq_marlin kernels" to "[Misc] Use scalar type to dispatch to different gptq_marlin kernels" on Aug 12, 2024
@tlrmchlsmth merged commit 6aa33cb into vllm-project:main on Aug 12, 2024
52 checks passed
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024