
[Feature]: Support Int8 dtype for storing weights - currently uses FP16 wasting 50% of VRAM #4031

Closed
cduk opened this issue Apr 12, 2024 · 1 comment

Comments

cduk (Contributor) commented Apr 12, 2024

🚀 The feature, motivation and pitch

Could you please add Int8 as a supported dtype? Currently, when using Int8 models such as https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 with xformers instead of FlashAttention, the weights are stored as FP16, consuming twice the VRAM.


hmellor (Collaborator) commented Apr 12, 2024

We already support GPTQ 8-bit; see #2330.
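For reference, a minimal sketch of loading the 8-bit GPTQ checkpoint mentioned above with vLLM's offline `LLM` API. Passing `quantization="gptq"` explicitly is shown for illustration; vLLM can usually infer the method from the checkpoint's quantization config, so the argument is an assumption rather than a requirement.

```python
# Sketch: load an 8-bit GPTQ checkpoint with vLLM so the weights stay
# in their quantized format instead of being held as FP16.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen1.5-7B-Chat-GPTQ-Int8",
    quantization="gptq",  # explicit for illustration; normally auto-detected
)

outputs = llm.generate(
    ["What is weight quantization?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```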

hmellor closed this as completed Apr 12, 2024