I implemented an easy-to-use package based on cuda branch #186
If you want to use Triton, simply use this code.
Triton doesn't work on my M40, so I'm glad to see the cuda branch getting some love.
Sadly it doesn't work on my Pascal card either. Hopefully this can help: triton-lang/triton#1505
Hi @qwopqwop200 👋
First of all, I really appreciate all your effort and contributions to this project, and how you've shown the potential to extend the GPTQ algorithm to other models. I believe there are many people who want to try some LLMs yet are short of hardware resources.
I recently implemented (and will continue working on) an easy-to-use package named AutoGPTQ based on your cuda branch. In theory it can be extended to almost all CausalLMs in transformers with only four lines of code, and it provides some user-friendly APIs such as from_pretrained, from_quantized, save_quantized, quantize and generate to integrate with the model. As a next step I want to add triton support to this package, but I'm not sure whether it's appropriate to start directly from your code in the triton branch; I'm really new to triton, so it would be wonderful if I could get some help from you.
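For readers who haven't seen the package, here is a minimal sketch of the quantize/save/load workflow those APIs suggest. The class names (AutoGPTQForCausalLM, BaseQuantizeConfig), the example model, the output directory, and the keyword arguments are illustrative assumptions, not taken from this issue, and the exact signatures may differ in the released package.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig  # assumed package layout

# Illustrative names only; any small CausalLM from the Hub would do.
pretrained_model_name = "facebook/opt-125m"
quantized_model_dir = "opt-125m-4bit"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
# A single calibration example; real quantization would use many more.
examples = [tokenizer("auto_gptq is an easy-to-use package for quantizing CausalLMs.",
                      return_tensors="pt")]

# Load the full-precision model with a quantization config, run GPTQ, save the result.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)  # assumed config fields
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_name, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)

# Reload the quantized checkpoint later and generate text with it.
model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda:0")
inputs = tokenizer("auto_gptq is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs)[0]))
```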
Best wishes! And thanks again for all you've done!