
I implemented an easy-to-use package based on the cuda branch #186

Closed

PanQiWei opened this issue Apr 16, 2023 · 3 comments

Comments

@PanQiWei

Hi @qwopqwop200 👋

First of all, I really appreciate all your efforts and contributions to this project, which show the potential of extending the GPTQ algorithm to other models. I believe there are many people who want to try LLMs but are short on hardware resources.

I recently implemented (and will continue working on) an easy-to-use package named AutoGPTQ based on your cuda branch. In theory it can be extended to almost all CausalLMs in transformers with only four lines of code, and it provides user-friendly APIs such as from_pretrained, from_quantized, save_quantized, quantize and generate to integrate with the model.
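
For context, the workflow those APIs enable might look roughly like this. Only the method names (from_pretrained, from_quantized, save_quantized, quantize, generate) come from this comment; the class name AutoGPTQForCausalLM, the BaseQuantizeConfig fields, and the model/paths are assumptions for illustration:

```python
# Hedged sketch of the AutoGPTQ workflow described above; names beyond the
# listed methods are assumptions, not the package's confirmed API.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig  # assumed import path

pretrained_dir = "facebook/opt-125m"   # any CausalLM from transformers
quantized_dir = "opt-125m-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir)
examples = [tokenizer("auto-gptq is an easy-to-use quantization package.", return_tensors="pt")]

# The "four lines": load a float model, quantize it, save it, reload it.
model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, BaseQuantizeConfig(bits=4, group_size=128))
model.quantize(examples)
model.save_quantized(quantized_dir)
model = AutoGPTQForCausalLM.from_quantized(quantized_dir)

# Generate with the quantized model.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```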

As a next step I want to add triton support to the package, but I'm not sure whether it's appropriate to start directly from the code in your triton branch. I'm really new to triton, so it would be wonderful if I could get some help from you.

Best wishes! And thanks again for all you've done!

@qwopqwop200
Owner

If you want to use triton, simply use this code:
https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/quant/quant_linear.py
Additionally, the former make_quant has been renamed to make_quant_linear.
Also consider adding an autotune setting; this setting makes triton compile its kernels up front:
https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/opt.py#L272
Triton also supports backward, so it's generally better to recommend triton over the cuda branch.
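
As a rough illustration of this advice (a sketch only: make_quant_linear comes from the comment above, but its exact signature and the warmup helper's name, autotune_warmup_linear, are assumptions to be checked against the linked files):

```python
# Hedged sketch: swapping nn.Linear layers for the triton-backed QuantLinear
# and pre-compiling the autotune configs. Signatures are assumptions based on
# the linked quant/quant_linear.py and opt.py#L272.
import torch.nn as nn
from quant.quant_linear import make_quant_linear, autotune_warmup_linear

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Name -> module map of the layers to replace (GPTQ-for-LLaMa normally
# builds this mapping with find_layers()).
names = {"0": model[0], "2": model[2]}
make_quant_linear(model, names, bits=4, groupsize=128)

model = model.cuda()
# Compile the triton kernels up front (the step linked at opt.py#L272) so
# the first real forward pass isn't stalled by autotune/JIT compilation.
autotune_warmup_linear(model)
```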

@Pathos14489

Triton doesn't work on my M40, so I'm glad to see the cuda branch getting some love.

@clxyder

clxyder commented Apr 18, 2023

> Triton doesn't work on my M40, so I'm glad to see the cuda branch getting some love.

Sadly, it doesn't work on my Pascal card either.

Hopefully this can help: triton-lang/triton#1505
