[CLI] Use auto device map for model load #1596

lewtun · 2024-04-28T17:19:24Z

As a user of the chat CLI, I expect the model to be loaded on GPU by default, instead of having to manually specify --device "cuda".

I believe this can be achieved by simply setting device_map="auto", which also works on CPU if the user requires it.

lewtun · 2024-04-28T17:20:00Z

examples/scripts/chat.py

@@ -220,7 +220,7 @@ def load_model_and_tokenizer(args):
        trust_remote_code=args.trust_remote_code,
        attn_implementation=args.attn_implementation,
        torch_dtype=torch_dtype,
-        device_map=get_kbit_device_map() if quantization_config is not None else None,


I don't think the k-bit device map is actually needed for inference (one can shard and quantize AFAIK)
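
For illustration, the change in behaviour can be sketched in plain Python. This is a minimal sketch, not the code in chat.py: the helper names old_device_map, new_device_map and build_model_kwargs are hypothetical, the k-bit map is passed in as a stand-in value rather than obtained from get_kbit_device_map(), and it assumes the replacement is the literal device_map="auto" proposed in the PR description.

```python
from typing import Any, Optional


def old_device_map(quantization_config: Optional[Any], kbit_device_map: Any) -> Any:
    """Previous behaviour: only quantized loads received a device map."""
    # kbit_device_map stands in for the result of get_kbit_device_map().
    return kbit_device_map if quantization_config is not None else None


def new_device_map() -> str:
    """Assumed new behaviour: always request automatic device placement."""
    return "auto"


def build_model_kwargs(
    quantization_config: Optional[Any] = None,
    torch_dtype: Optional[Any] = None,
    trust_remote_code: bool = False,
    attn_implementation: Optional[str] = None,
) -> dict:
    """Hypothetical helper that collects the keyword arguments for model loading."""
    return dict(
        trust_remote_code=trust_remote_code,
        attn_implementation=attn_implementation,
        torch_dtype=torch_dtype,
        device_map=new_device_map(),  # no longer depends on quantization_config
        quantization_config=quantization_config,
    )


if __name__ == "__main__":
    # Without quantization, the old expression produced no device map at all.
    print(old_device_map(None, {"": 0}))
    # With the assumed change, every load asks for automatic placement.
    print(build_model_kwargs()["device_map"])
```

A dict like this would be unpacked into the model's from_pretrained call. Note that device_map="auto" relies on Accelerate being installed.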

HuggingFaceDocBuilderDev · 2024-04-28T17:23:47Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

younesbelkada

Sounds good, thanks!

Use auto device map

9eed9da

lewtun requested a review from younesbelkada April 28, 2024 17:20

younesbelkada approved these changes Apr 30, 2024

lewtun merged commit 5f09131 into main May 2, 2024
9 checks passed

lewtun deleted the chat-on-gpu branch May 2, 2024 07:22
