[Bugfix] Added Command-R GPTQ support #3849
Conversation
Command R GPTQ support added.
Thanks! Could you post that model's link and put up a test result here?
Would this work with the new Command R+ as well? It looks to be the same CohereForCausalLM architecture.
Can we get this into v4 as a slightly updated build?
I cloned the fork, ran the model, and got the same old error:
Model link: https://huggingface.co/NEURALDEEPTECH/command-r-gptq
Test code:
Output:
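The actual test code and output were not captured in this thread. A minimal sketch of what such a check usually looks like with vLLM's offline API; the prompt, sampling settings, and the explicit `quantization="gptq"` flag are assumptions, not the poster's exact script:

```python
from vllm import LLM, SamplingParams

# Hypothetical reproduction: load the GPTQ Command-R checkpoint from the
# link above and generate from a single prompt.
llm = LLM(model="NEURALDEEPTECH/command-r-gptq", quantization="gptq")
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

for out in llm.generate(["Hello, my name is"], sampling_params):
    print(f"Prompt: {out.prompt!r}, Generated text: {out.outputs[0].text!r}")
```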
Try this model: https://huggingface.co/NEURALDEEPTECH/command-r-gptq
Maybe))) You should try!
For quantization I used AutoGPTQ from this PR: AutoGPTQ/AutoGPTQ#631. This model, https://huggingface.co/NEURALDEEPTECH/command-r-gptq, is not quantized very well, because I used only one primitive sample. I didn't have time to find a good dataset and just wanted to check whether my quantization and inference code works.
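For reference, quantization with AutoGPTQ generally follows its README recipe. A sketch, assuming the Command-R support branch from AutoGPTQ/AutoGPTQ#631 is installed; the base model id, quantization settings, and the single calibration sample are illustrative, not the poster's actual choices:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Illustrative base model; the thread only says one "primitive sample"
# was used for calibration.
base_model = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(base_model)

examples = [tokenizer("Command-R is a large language model by Cohere.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized("command-r-gptq")
```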
Thank you! I'll test later today. I was using the likely-bunk Cyleux one.
@egortolmachev Thanks for your testing! IIUC, for the error @osilverstein has presented, I think it might be related to AutoGPTQ/AutoGPTQ#601: it seems AutoGPTQ always generates `.bias` tensors, even when the model defines none. The patch below skips them during weight loading:

```python
if name.endswith(".bias") and name not in params_dict:
    continue
```

@osilverstein Could you test your model again with the above patch applied?
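For context, that skip lands inside the model's `load_weights` loop. A sketch of the surrounding structure, assumed from vLLM's other model loaders of that era (e.g. Gemma); the import path is as of vLLM ~v0.4.0:

```python
from vllm.model_executor.weight_utils import default_weight_loader

def load_weights(self, weights):
    """Sketch of a vLLM model's weight-loading loop (a method of the model class)."""
    params_dict = dict(self.named_parameters())
    for name, loaded_weight in weights:
        # AutoGPTQ checkpoints may contain ".bias" tensors the model never
        # declares; skipping them avoids a KeyError on params_dict[name].
        if name.endswith(".bias") and name not in params_dict:
            continue
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)
```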
Tried the GPTQ Command R+ model (https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ) on my AMD system and similarly got the same error.
I've tested neuraldeeptech's GPTQ command-r and it works. Slow at tp=4, though.
I also tried it and had the same issue.
Hey hey, thanks for the guidance, I'm just a bit slow. Do you know if there is a way to modify the model itself without requantizing? Or through vLLM? Which file would I change? Many thanks.
Now it works:
Output:
The Cyleux model too:
Output:
@esmeetu

> For quantization I've used AutoGPTQ from this PR: AutoGPTQ/AutoGPTQ#631

Would you be willing to run your RAGAS benchmark on the Cyleux GPTQ model? It's quantized over a portion of OpenHermes 2.5, which was chosen because I don't actually see evidence that command-r was trained on aya.
Co-authored-by: Egor Tolmachev <[email protected]>
Need help, my friend: the results I obtained were inconsistent with yours. Prompt: 'Hello, my name is', Generated text: 'section哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈'
Tested Cyleux on my RAGAS benchmark. It's also poor, but I benchmark only on the Russian language. I have updated my model, https://huggingface.co/NEURALDEEPTECH/command-r-gptq; it now works much better, but for the second quantization I used only Russian samples from the aya dataset. I'm planning to quantize on the full aya dataset, but it's not my first priority now)
Can you show the code, full output, model, library versions, and so on?
CODE:
OUTPUT: Processed prompts: 100%|██████████| 4/4 [00:01<00:00, 3.08it/s]
ENV: NVIDIA A100 80G
My CHANGES follow the code diff. Here I used 'or', because 'and' gives the same error: KeyError: 'model.layers.0.mlp.down_proj.bias'
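To make the 'or' vs 'and' point concrete, here are the two condition variants in the loader (a sketch; the surrounding loop is assumed to be the one from the patch discussed above):

```python
# Patch variant: skip only ".bias" tensors the model doesn't declare.
if name.endswith(".bias") and name not in params_dict:
    continue

# Variant used in this comment: also skips *every* name missing from
# params_dict; per the comment above, with `and` the KeyError on
# 'model.layers.0.mlp.down_proj.bias' reportedly persisted.
if name.endswith(".bias") or name not in params_dict:
    continue
```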
@egortolmachev Any idea on this?
When run with AutoGPTQ:
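The AutoGPTQ-side run was not captured either. A minimal sketch of such a check, with the API per the AutoGPTQ README; the model id simply repeats the one under discussion and the prompt is an assumption:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Hypothetical AutoGPTQ-side check of the same checkpoint.
model_id = "NEURALDEEPTECH/command-r-gptq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```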
Co-authored-by: Egor Tolmachev <[email protected]>
Fixed Command-R GPTQ model loading by analogy with Gemma: #3553