[Misc] Use scalar type to dispatch to different gptq_marlin kernels
#7323
Looks good! A minor nit and potential bug
LGTM
/ready
Use `ScalarType` instead of `num_bits` (in combination with `has_zp`) to perform the dispatching for `gptq_marlin`. This sets the stage for folding `fp8_marlin.cu` into `gptq_marlin.cu`, since we can now move `dequant_8bit` in `fp8_marlin.cu` over as a `dequant<T, vllm::kFE4M3fn.id()>(int q)` specialization. I did not fold `fp8_marlin.cu` into `gptq_marlin.cu` in this PR to avoid excessive compile times for `gptq_marlin.cu`, but once #7317 is completed it should be folded in.

In order to support passing a scalar type as a template parameter in C++17, the type has to be serialized to something that can be passed as a template parameter. This PR introduces the concept of serializing the type into a 64-bit int `id` (which can be passed as a template parameter) alongside a deserialization routine (`from_id`) for when the template needs to access the traits of the type. If/when we make C++20 the lowest supported standard, this serialization/deserialization can be removed, as C++20 introduces passing literal class types as template parameters (see: https://en.cppreference.com/w/cpp/language/template_parameters).
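
For readers who want to see the shape of the idea, here is a minimal C++17 sketch of the serialize/deserialize approach described above. It is not the code from this PR: `ScalarTypeSketch`, its fields, the bit layout of the id, the constants `kU4B8` and `kU8B128`, and the helpers `values_per_int32`, `dequant_sketch`, and `dispatch_dequant` are all hypothetical names chosen for illustration. Only the general pattern (a constexpr `id()`, a `from_id`, and templates keyed on the id) follows the description.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified type descriptor (NOT vLLM's actual ScalarType).
struct ScalarTypeSketch {
  using Id = int64_t;

  uint8_t exponent;  // exponent bits (0 for integer types)
  uint8_t mantissa;  // mantissa / magnitude bits
  bool is_signed;    // whether a sign bit is present
  int32_t bias;      // stored-value bias; assumed non-negative in this sketch

  constexpr int size_bits() const {
    return exponent + mantissa + (is_signed ? 1 : 0);
  }

  // Serialize into a 64-bit integer, which C++17 accepts as a non-type
  // template parameter. Assumed layout: [0,8) exponent, [8,16) mantissa,
  // bit 16 sign flag, [17,49) bias.
  constexpr Id id() const {
    uint64_t u = static_cast<uint64_t>(exponent) |
                 (static_cast<uint64_t>(mantissa) << 8) |
                 (static_cast<uint64_t>(is_signed ? 1 : 0) << 16) |
                 (static_cast<uint64_t>(static_cast<uint32_t>(bias)) << 17);
    return static_cast<Id>(u);
  }

  // Deserialize, so a template that only receives the id can still read
  // the traits of the type at compile time.
  static constexpr ScalarTypeSketch from_id(Id id) {
    uint64_t u = static_cast<uint64_t>(id);
    return ScalarTypeSketch{
        static_cast<uint8_t>(u & 0xFF),
        static_cast<uint8_t>((u >> 8) & 0xFF),
        ((u >> 16) & 1u) != 0,
        static_cast<int32_t>(static_cast<uint32_t>((u >> 17) & 0xFFFFFFFFu))};
  }
};

// Two illustrative types: unsigned 4-bit with bias 8, unsigned 8-bit with bias 128.
inline constexpr ScalarTypeSketch kU4B8{0, 4, false, 8};
inline constexpr ScalarTypeSketch kU8B128{0, 8, false, 128};

// A template keyed on the id that recovers traits through from_id.
template <ScalarTypeSketch::Id type_id>
constexpr int values_per_int32() {
  constexpr auto t = ScalarTypeSketch::from_id(type_id);
  return 32 / t.size_bits();
}

// Per-type behaviour expressed as explicit specializations on the id.
template <ScalarTypeSketch::Id type_id>
constexpr int dequant_sketch(int q);  // primary template intentionally undefined

template <>
constexpr int dequant_sketch<kU4B8.id()>(int q) {
  return (q & 0xF) - 8;
}

template <>
constexpr int dequant_sketch<kU8B128.id()>(int q) {
  return (q & 0xFF) - 128;
}

// Dispatch from a runtime type value to the matching compile-time instantiation.
constexpr int dispatch_dequant(ScalarTypeSketch t, int q) {
  if (t.id() == kU4B8.id()) return dequant_sketch<kU4B8.id()>(q);
  if (t.id() == kU8B128.id()) return dequant_sketch<kU8B128.id()>(q);
  throw std::invalid_argument("unsupported scalar type in this sketch");
}
```

The dispatch function compares ids rather than a `num_bits`/`has_zp` pair, so adding a new type means adding one constant, one specialization, and one branch. Under C++20 the id round trip would be unnecessary, since a literal class type could be used as the template parameter directly.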