I Benchmarked 25 models at 16GB, 6.5GB, and 3.5GB sizes to find out whether a large model with smaller quant is better than a small model with bigger quant #11468
ZoontS started this conversation in Show and tell.
My Takeaway
You should use higher-parameter-count models if you can fit anything better than IQ3_XS quants; Q2 and Q1 quants are not worth it.
Personally, I would target IQ4_XS for GPU inference, and Q4_0 for CPU-only inference for the extra speed.
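To make the tradeoff concrete, here is a minimal sketch of how you might estimate which (parameter count, quant) combinations fit a given memory budget. The bits-per-weight figures are approximate values for llama.cpp-style quants and are assumptions for illustration only; real file sizes also include embeddings, output layers, and metadata, and you need extra room for the KV cache.

```python
# Rough memory-budget sketch. The bits-per-weight (bpw) numbers below are
# approximations (assumed for illustration), not exact llama.cpp figures.
BPW = {
    "Q2_K": 2.6,
    "IQ3_XS": 3.3,
    "IQ4_XS": 4.25,
    "Q4_0": 4.5,
    "Q8_0": 8.5,
}

def est_size_gib(params_billion: float, quant: str) -> float:
    """Estimated weight size in GiB: params * bits-per-weight / 8."""
    return params_billion * 1e9 * BPW[quant] / 8 / 2**30

def fits(params_billion: float, quant: str, budget_gib: float) -> bool:
    """True if the quantized weights alone fit under the budget."""
    return est_size_gib(params_billion, quant) <= budget_gib

for p in (7, 13, 32):
    for q in ("Q2_K", "IQ3_XS", "IQ4_XS"):
        print(f"{p}B {q}: ~{est_size_gib(p, q):.1f} GiB, "
              f"fits 16 GiB: {fits(p, q, 16)}")
```

By this estimate a 32B model at IQ4_XS lands just under 16 GiB of weights, which is why the larger-model-at-moderate-quant option is often on the table at all.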