
What is the difference between two llama-2 models? #151

Closed
kjh2159 opened this issue Sep 25, 2024 · 2 comments

Comments

@kjh2159

kjh2159 commented Sep 25, 2024

First of all, thank you for your team's awesome project!

I have one question.
I looked at mllmTeam's Hugging Face repository, which contains many model files.
For example, in the case of llama-2-7b-mllm, there are two versions of llama-2-7b-chat, named llama-2-7b-chat-q4_0_4_4.mllm and llama-2-7b-chat-q4_k.mllm.
(I saw the repository here: https://huggingface.co/mllmTeam/llama-2-7b-mllm/tree/main)
What are the differences between them, and what do k and 0_4_4 stand for?

Thank you for your answer.

@chenghuaWang
Contributor

For Q4_0 and Q4_K, please check the Hugging Face documentation: https://huggingface.co/docs/hub/en/gguf#quantization-types.
Q4_0_4_4 is a quantization type proposed by llama.cpp: ggerganov/llama.cpp#5780 (review). In simple terms, Q4_0_4_4 ties together four Q4_0 blocks, which is to say, it stores blocks of 4x32 elements together.
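To make the idea concrete, here is a minimal sketch of Q4_0-style block quantization and of tying four such blocks together. This is a simplification for illustration, not llama.cpp's exact rounding or byte-level interleaved layout: it assumes a block of 32 floats quantized to one float scale plus 32 signed 4-bit integers, and the `pack_q4_0_4_4` helper is a hypothetical name.

```python
# Simplified sketch of Q4_0-style quantization (one scale per 32-element
# block, values stored as signed 4-bit integers). Not llama.cpp's exact
# scheme; rounding and packing details are omitted for clarity.

def quantize_q4_0(block):
    """Quantize 32 floats to (scale, 32 ints in [-8, 7])."""
    assert len(block) == 32
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 7.0  # one shared scale for the whole block
    qs = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, qs

def dequantize_q4_0(scale, qs):
    """Reconstruct approximate floats from a quantized block."""
    return [q * scale for q in qs]

def pack_q4_0_4_4(blocks):
    """Tie four consecutive Q4_0 blocks (4x32 values) into one group,
    the basic idea behind Q4_0_4_4 (real interleaving not shown)."""
    assert len(blocks) == 4
    scales = [b[0] for b in blocks]
    qs = [q for b in blocks for q in b[1]]  # 128 quantized values
    return scales, qs
```

Under this sketch, the round-trip error of one block is bounded by half the scale, and grouping 4x32 values lets a kernel load four blocks' worth of data contiguously, which is the motivation given in the llama.cpp discussion linked above.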

@kjh2159
Author

kjh2159 commented Sep 26, 2024

@chenghuaWang
Thank you for your answer. I've gone through the explanations in the links you provided. Your insight was impressive. Thank you again.

@kjh2159 kjh2159 closed this as completed Sep 26, 2024