Turn off 2:4 sparse compression until supported in vllm (#1092)
This PR temporarily disables the newly added Sparse24 compression
feature in the example script, since support for this feature is not yet
available in vLLM.

Support for Sparse24 compression is being added to vLLM via [this
PR](vllm-project/vllm#12097). Once that PR is
merged, this change will be reverted to re-enable the feature.

Signed-off-by: Rahul Tuli <[email protected]>
rahul-tuli committed Jan 28, 2025
1 parent a82c9e7 commit 84899e6
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion — examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py

@@ -116,5 +116,7 @@ def get_recipe(fp8_enabled):
     print("==========================================\n")

     # Save compressed model and tokenizer
-    model.save_pretrained(save_dir, save_compressed=args.fp8)
+    model.save_pretrained(
+        save_dir, save_compressed=args.fp8, disable_sparse_compression=True
+    )
     tokenizer.save_pretrained(save_dir)
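The gist of the change above can be sketched as follows. This is a minimal, hypothetical helper (not part of the example script) that builds the keyword arguments now passed to `model.save_pretrained`: `save_compressed` still tracks the script's `--fp8` flag, while the newly added `disable_sparse_compression=True` keeps the 2:4 sparse compressor off until vLLM support (vllm-project/vllm#12097) lands.

```python
def build_save_kwargs(fp8_enabled: bool) -> dict:
    """Illustrative only: kwargs for model.save_pretrained after this commit.

    fp8_enabled mirrors args.fp8 in the example script; the real
    save_pretrained call is provided by the model object in llm-compressor.
    """
    return {
        "save_compressed": fp8_enabled,
        # Temporarily hard-coded until Sparse24 decompression is
        # supported in vLLM; expected to be reverted once the linked
        # vLLM PR is merged.
        "disable_sparse_compression": True,
    }
```

With `--fp8` set, this yields `save_compressed=True` but still suppresses sparse compression; without `--fp8`, both compression paths stay off.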
