Turn off 2:4 sparse compression until supported in vllm (#1092)
This PR temporarily disables the newly added Sparse24 compression
feature in the example script, since support for this feature is not yet
available in vLLM.

Support for Sparse24 compression is being added to vLLM via [this
PR](vllm-project/vllm#12097). Once that PR is
merged, this change will be reverted to re-enable the feature.

Signed-off-by: Rahul Tuli <[email protected]>
rahul-tuli committed Jan 28, 2025
1 parent a82c9e7 commit 84899e6
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion — examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py

@@ -116,5 +116,7 @@ def get_recipe(fp8_enabled):
     print("==========================================\n")

     # Save compressed model and tokenizer
-    model.save_pretrained(save_dir, save_compressed=args.fp8)
+    model.save_pretrained(
+        save_dir, save_compressed=args.fp8, disable_sparse_compression=True
+    )
     tokenizer.save_pretrained(save_dir)
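The gist of the change above can be sketched as follows. This is a minimal, hypothetical helper (not part of the example script) that builds the keyword arguments now passed to `model.save_pretrained`: `save_compressed` still tracks the script's `--fp8` flag, while the newly added `disable_sparse_compression=True` keeps the 2:4 sparse compressor off until vLLM support (vllm-project/vllm#12097) lands.

```python
def build_save_kwargs(fp8_enabled: bool) -> dict:
    """Illustrative only: kwargs for model.save_pretrained after this commit.

    fp8_enabled mirrors args.fp8 in the example script; the real
    save_pretrained call is provided by the model object in llm-compressor.
    """
    return {
        "save_compressed": fp8_enabled,
        # Temporarily hard-coded until Sparse24 decompression is
        # supported in vLLM; expected to be reverted once the linked
        # vLLM PR is merged.
        "disable_sparse_compression": True,
    }
```

With `--fp8` set, this yields `save_compressed=True` but still suppresses sparse compression; without `--fp8`, both compression paths stay off.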
