
I implemented an easy-to-use package based on the cuda branch #186

Closed

PanQiWei opened this issue Apr 16, 2023 · 3 comments

Comments

@PanQiWei

Hi @qwopqwop200 👋

First of all, I really appreciate all your efforts and contributions to this project, which show the potential of extending the GPTQ algorithm to other models. I believe there are many people who want to try LLMs but are short on hardware resources.

I recently implemented (and will continue working on) an easy-to-use package named AutoGPTQ based on your cuda branch. In theory it can be extended to almost all CausalLMs in transformers with only four lines of code, and it provides user-friendly APIs such as from_pretrained, from_quantized, save_quantized, quantize and generate to integrate with the model.
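
For context, the workflow those APIs enable might look roughly like this. Only the method names (from_pretrained, from_quantized, save_quantized, quantize, generate) come from this comment; the class name AutoGPTQForCausalLM, the BaseQuantizeConfig fields, and the model/paths are assumptions for illustration:

```python
# Hedged sketch of the AutoGPTQ workflow described above; names beyond the
# listed methods are assumptions, not the package's confirmed API.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig  # assumed import path

pretrained_dir = "facebook/opt-125m"   # any CausalLM from transformers
quantized_dir = "opt-125m-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir)
examples = [tokenizer("auto-gptq is an easy-to-use quantization package.", return_tensors="pt")]

# The "four lines": load a float model, quantize it, save it, reload it.
model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, BaseQuantizeConfig(bits=4, group_size=128))
model.quantize(examples)
model.save_quantized(quantized_dir)
model = AutoGPTQForCausalLM.from_quantized(quantized_dir)

# Generate with the quantized model.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```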

As a next step I want to add triton support to the package, but I'm not sure whether it's appropriate to start directly from the code in your triton branch. I'm really new to triton, so it would be wonderful if I could get some help from you.

Best wishes! And thanks again for all you've done!

@qwopqwop200
Owner

If you want to use triton, simply use this code:
https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/quant/quant_linear.py
Additionally, the former make_quant has been renamed to make_quant_linear.
Also consider adding an autotune setting; this setting makes triton compile its kernels up front:
https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/opt.py#L272
Triton also supports backward, so it's generally better to recommend triton over the cuda branch.
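
As a rough illustration of this advice (a sketch only: make_quant_linear comes from the comment above, but its exact signature and the warmup helper's name, autotune_warmup_linear, are assumptions to be checked against the linked files):

```python
# Hedged sketch: swapping nn.Linear layers for the triton-backed QuantLinear
# and pre-compiling the autotune configs. Signatures are assumptions based on
# the linked quant/quant_linear.py and opt.py#L272.
import torch.nn as nn
from quant.quant_linear import make_quant_linear, autotune_warmup_linear

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Name -> module map of the layers to replace (GPTQ-for-LLaMa normally
# builds this mapping with find_layers()).
names = {"0": model[0], "2": model[2]}
make_quant_linear(model, names, bits=4, groupsize=128)

model = model.cuda()
# Compile the triton kernels up front (the step linked at opt.py#L272) so
# the first real forward pass isn't stalled by autotune/JIT compilation.
autotune_warmup_linear(model)
```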

@Pathos14489

Triton doesn't work on my M40, so I'm glad to see the cuda branch getting some love.

@clxyder

clxyder commented Apr 18, 2023

> Triton doesn't work on my M40, so I'm glad to see the cuda branch getting some love.

Sadly, it doesn't work on my Pascal card either.

Hopefully this can help: triton-lang/triton#1505
