
Run model as compressed/uncompressed mode #34719

Merged

Conversation

@horheynm (Contributor) commented Nov 13, 2024

What does this PR do?

Loading a quantized model with compressed-tensors is currently hardcoded to run in run_compressed mode. This PR lets the model be loaded either compressed (the default) or decompressed, by passing a CompressedTensorsConfig:

from transformers import AutoModelForCausalLM
from transformers.utils.quantization_config import CompressedTensorsConfig

pretrained_model_name_or_path = "neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic"  # static config file

# run_compressed=False decompresses the quantized weights at load time
quantization_config = CompressedTensorsConfig(run_compressed=False)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path,
    quantization_config=quantization_config,
)
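
For contrast, a minimal sketch of the current default path (this snippet is not part of the PR): without overriding run_compressed, the same checkpoint is loaded and executed in compressed mode.

# Default behavior: no config override, so the quantized checkpoint stays
# compressed (run_compressed=True) at inference time.
model_compressed = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path
)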

@Rocketknight1 (Member) commented

cc @SunMarc @MekkCyber for quantization

@SunMarc (Member) left a comment

I see that the goal is to overwrite the run_compressed attribute in the quantization config. To do that, we already have the merge_quantization_configs function, and you mostly just need to create the get_loading_attributes function. I think this will also make the user experience better.

In the end, the user will only need to do:

quantization_config = CompressedTensorsConfig(run_compressed=False)
model = AutoModelForCausalLM.from_pretrained(..., quantization_config=quantization_config)

to load the uncompressed model.
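
To illustrate the suggested mechanism, here is a minimal sketch (not the code that was merged, and the attribute list is an assumption): get_loading_attributes returns the load-time options that merge_quantization_configs copies from the user-supplied config onto the quantization config found in the checkpoint.

import copy

from transformers.utils.quantization_config import CompressedTensorsConfig


def get_loading_attributes(self):
    # Only run_compressed controls how the checkpoint is loaded, so it is the
    # only attribute allowed to override the config serialized with the model.
    attributes_dict = copy.deepcopy(self.__dict__)
    loading_attributes = ["run_compressed"]
    return {k: v for k, v in attributes_dict.items() if k in loading_attributes}


# Attached here only for illustration; in the PR this would be a method on the config class.
CompressedTensorsConfig.get_loading_attributes = get_loading_attributes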

@horheynm changed the title from "draft, run model as compreszed/uncompressed mode" to "Run model as compressed/uncompressed mode" on Nov 22, 2024
@horheynm (Contributor, Author) commented

The PR is in a decent state to review. Will add tests so it can be finalized.

@horheynm marked this pull request as ready for review on November 25, 2024, 18:40
@horheynm (Contributor, Author) commented

@SunMarc
Hey Marc, this PR is ready

@SunMarc (Member) left a comment

Thanks for the integration! Left a few comments.

@dsikka (Contributor) left a comment

Re: offline discussion: check whether the warnings we're seeing on this branch are specific to an uncompressed model vs. a compressed model.

@horheynm (Contributor, Author) commented

@ArthurZucker
Could I get a review please!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

Nice nice! Good addition, thanks 🤗

@ArthurZucker ArthurZucker merged commit e4e404f into huggingface:main Dec 13, 2024
22 checks passed
dsikka pushed a commit to vllm-project/llm-compressor that referenced this pull request Jan 10, 2025
Contingent on merge of huggingface/transformers#34719 (merged and now released).

SUMMARY:
Update the test to use the AutoModelForCausalLM decompression path instead of manually instantiating the compressor and decompressing. When AutoModelForCausalLM recognizes the quantization_config, it runs the same decompression.

TEST PLAN:
Ran the test using transformers main
Must pass:
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py
mgoin pushed a commit to vllm-project/llm-compressor that referenced this pull request Jan 14, 2025
Contingent on merge of huggingface/transformers#34719 (merged, not yet released at the time).


SUMMARY:
Update the run_compressed tests from decompression tests to run_compressed tests: check that run_compressed=True and run_compressed=False models generate the same output (see the sketch after this commit message).

Add decompress tests that copy attrs from the source dir path's model to the target model.

TEST PLAN:
ran the test using transformers main
must pass
tests/llmcompressor/transformers/compression/test_decompress.py
and tests/llmcompressor/transformers/compression/test_run_compressed.py
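
A minimal sketch of the parity check described above, assuming a hypothetical compressed checkpoint path; the idea is that the same prompt yields the same greedy generation whether the model runs compressed or is decompressed on load:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.utils.quantization_config import CompressedTensorsConfig

ckpt = "path/to/compressed-checkpoint"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(ckpt)
inputs = tokenizer("Hello, my name is", return_tensors="pt")

# Same checkpoint loaded twice: once compressed (default), once decompressed.
model_compressed = AutoModelForCausalLM.from_pretrained(ckpt)
model_decompressed = AutoModelForCausalLM.from_pretrained(
    ckpt, quantization_config=CompressedTensorsConfig(run_compressed=False)
)

with torch.no_grad():
    out_compressed = model_compressed.generate(**inputs, max_new_tokens=20, do_sample=False)
    out_decompressed = model_decompressed.generate(**inputs, max_new_tokens=20, do_sample=False)

assert tokenizer.decode(out_compressed[0]) == tokenizer.decode(out_decompressed[0])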
dsikka pushed a commit to vllm-project/llm-compressor that referenced this pull request Jan 15, 2025
…d" (#1072)

SUMMARY:
Removed breakpoints and addressed comments for
#970

TEST PLAN:
Ran pytest for the two test files


#970
ORIGINAL PR DESCRIPTION:
Contingent on merge of huggingface/transformers#34719 (merged, not yet released at the time).


SUMMARY:
Update the run_compressed tests from decompression tests to run_compressed tests: check that run_compressed=True and run_compressed=False models generate the same output.

Add decompress tests that copy attrs from the source dir path's model to the target model.

TEST PLAN:
ran the test using transformers main
must pass
tests/llmcompressor/transformers/compression/test_decompress.py
and tests/llmcompressor/transformers/compression/test_run_compressed.py
kylesayrs pushed a commit to vllm-project/llm-compressor that referenced this pull request Jan 15, 2025
Contingent on merge of huggingface/transformers#34719 (merged and now released).

SUMMARY:
Update the test to use the AutoModelForCausalLM decompression path instead of manually instantiating the compressor and decompressing. When AutoModelForCausalLM recognizes the quantization_config, it runs the same decompression.

TEST PLAN:
Ran the test using transformers main
Must pass:
tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py

Signed-off-by: Kyle Sayers <[email protected]>
kylesayrs pushed a commit to vllm-project/llm-compressor that referenced this pull request Jan 15, 2025
Contingent on merge of huggingface/transformers#34719 (merged, not yet released at the time).

SUMMARY:
Update the run_compressed tests from decompression tests to run_compressed tests: check that run_compressed=True and run_compressed=False models generate the same output.

Add decompress tests that copy attrs from the source dir path's model to the target model.

TEST PLAN:
ran the test using transformers main
must pass
tests/llmcompressor/transformers/compression/test_decompress.py
and tests/llmcompressor/transformers/compression/test_run_compressed.py

Signed-off-by: Kyle Sayers <[email protected]>
dsikka added a commit to vllm-project/llm-compressor that referenced this pull request Jan 23, 2025
Contingent on merge of huggingface/transformers#34719 (merged, not yet released at the time).

SUMMARY:
Add a test to:
* Given a model, oneshot quantize, then run PTQ training. The model must be loaded with run_compressed=False to run.

Note:
* When running finetune on an already optimized (one-shotted) model, the model needs to be decompressed explicitly using `CompressedTensorsConfig`; see the sketch after this commit message and
https://github.com/vllm-project/llm-compressor/pull/964/files#diff-e480ed475c0a5b2beb4052c1dd2aca671999634ace41a5ea017fdff1ce68be0bR130-R135
* Tests using 2x H100s passed

Also fix a bug where log_sparsification fails because the layer name is not recognized. Nothing is being sparsified here, so the number of parameters is set to zero.

TEST PLAN:
ran the test using transformers main
must pass
tests/llmcompressor/transformers/finetune/test_oneshot_then_finetune.py

---------

Co-authored-by: Dipika Sikka <[email protected]>
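
A hedged sketch of the note above ("output_of_oneshot" is a hypothetical path): the already one-shotted checkpoint is decompressed explicitly at load time before being handed to the finetuning step.

from transformers import AutoModelForCausalLM
from transformers.utils.quantization_config import CompressedTensorsConfig

# Decompress the one-shotted checkpoint so it can be trained further.
model = AutoModelForCausalLM.from_pretrained(
    "output_of_oneshot",  # hypothetical path to the oneshot output
    quantization_config=CompressedTensorsConfig(run_compressed=False),
)
# `model` would then be passed to the finetuning entrypoint.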
dsikka added a commit to vllm-project/llm-compressor that referenced this pull request Jan 23, 2025
Contingent on merge of huggingface/transformers#34719 (merged and now released).


Blocked on 
neuralmagic/compressed-tensors#237

SUMMARY:
* In multiple-optimization tests, automatically decompress the model if an already optimized model is provided
* Fix recipe stage length
* Revive old code
* When running multiple optimizations (e.g. oneshot then finetune, or oneshot then oneshot), the recipes need to be added to the session using `initialize_recipe`. Example here:
https://github.com/vllm-project/llm-compressor/pull/971/files#diff-c9ae8b3ad24d13abeea5b649a5fd6d0b0925f5c9cc40220cbfbe21ae81242f8dR63-R65


TEST PLAN:
ran the test using transformers main
Must pass tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

---------

Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
rahul-tuli added a commit to vllm-project/llm-compressor that referenced this pull request Jan 28, 2025
Contingent on merge of huggingface/transformers#34719 (merged and now released).

Blocked on
neuralmagic/compressed-tensors#237

SUMMARY:
* In multiple-optimization tests, automatically decompress the model if an already optimized model is provided
* Fix recipe stage length
* Revive old code
* When running multiple optimizations (e.g. oneshot then finetune, or oneshot then oneshot), the recipes need to be added to the session using `initialize_recipe`. Example here:
https://github.com/vllm-project/llm-compressor/pull/971/files#diff-c9ae8b3ad24d13abeea5b649a5fd6d0b0925f5c9cc40220cbfbe21ae81242f8dR63-R65

TEST PLAN:
ran the test using transformers main
Must pass tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

---------

Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>