
[Int4-AWQ] Torch Int-4 AWQ Dequantization and Configuration Options #146

Merged: 1 commit into main on Aug 21, 2024

Conversation

hegemanjw4amd commented:

This PR adds a fully general Int4-AWQ dequantization function implemented in torch, along with environment options (flags) for selecting between the torch and Triton codepaths (a dispatch sketch follows the testing notes below).
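This is not the PR's actual code, but a minimal sketch of what a torch-only Int4-AWQ dequantization can look like, assuming the standard AWQ layout (eight 4-bit values packed per int32, unpacked in the interleaved AWQ nibble order); all names and shapes here are illustrative:

```python
import torch

# AWQ packs eight 4-bit values into each int32 in an interleaved order;
# unpacking nibbles in this order recovers the original columns
# (assumption: standard AWQ packing).
AWQ_REVERSE_ORDER = torch.tensor([0, 4, 1, 5, 2, 6, 3, 7])


def awq_dequantize_torch(qweight: torch.Tensor,   # (K, N // 8) int32
                         scales: torch.Tensor,    # (K // group_size, N) fp16
                         qzeros: torch.Tensor,    # (K // group_size, N // 8) int32
                         group_size: int) -> torch.Tensor:
    """Dequantize AWQ int4 weights to floating point using only torch ops."""
    # Bit offsets that unpack the eight nibbles in AWQ's reversed order.
    shifts = AWQ_REVERSE_ORDER.to(qweight.device) * 4

    # Unpack: expand the packed dimension by 8, shift, and mask to 4 bits.
    iweights = (qweight.unsqueeze(-1) >> shifts) & 0xF   # (K, N//8, 8)
    iweights = iweights.reshape(qweight.shape[0], -1)    # (K, N)

    izeros = (qzeros.unsqueeze(-1) >> shifts) & 0xF      # (G, N//8, 8)
    izeros = izeros.reshape(qzeros.shape[0], -1)         # (G, N)

    # Broadcast per-group zero points and scales over each group of K rows.
    izeros = izeros.repeat_interleave(group_size, dim=0)   # (K, N)
    scales = scales.repeat_interleave(group_size, dim=0)   # (K, N)

    return (iweights - izeros).to(scales.dtype) * scales
```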

Testing: two HuggingFace models quantized in Int4-AWQ format have been run successfully:

- Qwen2-7B-Instruct-AWQ (latency benchmarking)
- Phi-3-mini-4k-instruct-AWQ (input verification)

For the latter model, specific input prompts were supplied and the outputs examined as a sanity check on correctness.

Unit testing is accomplished via tests/kernels/test_awq_triton.py.

Resolves: https://github.com/ROCm/FasterTransformer-Internal/issues/287
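The PR text does not name the flags; as a hedged illustration, environment options of this kind are typically read once and used to pick a codepath. The variable name `VLLM_USE_TRITON_AWQ` and the `awq_dequantize_triton` helper below are assumptions for illustration, not identifiers confirmed by this PR:

```python
import os

# Hypothetical flag name; the PR defines the actual environment options.
_USE_TRITON_AWQ = os.environ.get("VLLM_USE_TRITON_AWQ", "0") == "1"


def awq_dequantize(qweight, scales, qzeros, group_size):
    # Route to the Triton kernel when the flag is set; otherwise fall back
    # to the pure-torch path sketched above.
    if _USE_TRITON_AWQ:
        # Hypothetical Triton-backed helper, shown only to illustrate dispatch.
        return awq_dequantize_triton(qweight, scales, qzeros, group_size)
    return awq_dequantize_torch(qweight, scales, qzeros, group_size)
```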

@hegemanjw4amd force-pushed the hegeman/basic-sdpa-attention-int4-awq-interim branch 3 times, most recently from 0b78568 to dd9a148, on August 21, 2024 at 10:47.
@shajrawi (Collaborator) left a comment:


ship it

@hegemanjw4amd force-pushed the hegeman/basic-sdpa-attention-int4-awq-interim branch from dd9a148 to d4332ec on August 21, 2024 at 16:15.
@hegemanjw4amd merged commit 4e9830e into main on Aug 21, 2024.
13 checks passed
@gshtras deleted the hegeman/basic-sdpa-attention-int4-awq-interim branch on September 10, 2024 at 19:24.