
What is the difference between two llama-2 models? #151

Closed
kjh2159 opened this issue Sep 25, 2024 · 2 comments

Comments

@kjh2159

kjh2159 commented Sep 25, 2024

First of all, thank you for your team's awesome project!

I have one question.
I looked at mllmTeam's Hugging Face repository, which contains many model files.
For example, in the case of llama-2-7b-mllm, there are two versions of llama-2-7b-chat, named llama-2-7b-chat-q4_0_4_4.mllm and llama-2-7b-chat-q4_k.mllm.
(I saw the repository here: https://huggingface.co/mllmTeam/llama-2-7b-mllm/tree/main)
What are the differences between them, and what do k and 0_4_4 stand for?

Thank you for your answer.

@chenghuaWang
Contributor

For Q4_0 and Q4_K, please check the Hugging Face documentation: https://huggingface.co/docs/hub/en/gguf#quantization-types.
Q4_0_4_4 is a quantization type proposed by llama.cpp: ggerganov/llama.cpp#5780 (review). In simple terms, Q4_0_4_4 ties together four Q4_0 blocks, which is to say, it stores blocks of 4x32 elements together.
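To make the idea concrete, here is a minimal sketch of Q4_0-style block quantization and of tying four such blocks together. This is a simplification for illustration, not llama.cpp's exact rounding or byte-level interleaved layout: it assumes a block of 32 floats quantized to one float scale plus 32 signed 4-bit integers, and the `pack_q4_0_4_4` helper is a hypothetical name.

```python
# Simplified sketch of Q4_0-style quantization (one scale per 32-element
# block, values stored as signed 4-bit integers). Not llama.cpp's exact
# scheme; rounding and packing details are omitted for clarity.

def quantize_q4_0(block):
    """Quantize 32 floats to (scale, 32 ints in [-8, 7])."""
    assert len(block) == 32
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 7.0  # one shared scale for the whole block
    qs = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, qs

def dequantize_q4_0(scale, qs):
    """Reconstruct approximate floats from a quantized block."""
    return [q * scale for q in qs]

def pack_q4_0_4_4(blocks):
    """Tie four consecutive Q4_0 blocks (4x32 values) into one group,
    the basic idea behind Q4_0_4_4 (real interleaving not shown)."""
    assert len(blocks) == 4
    scales = [b[0] for b in blocks]
    qs = [q for b in blocks for q in b[1]]  # 128 quantized values
    return scales, qs
```

Under this sketch, the round-trip error of one block is bounded by half the scale, and grouping 4x32 values lets a kernel load four blocks' worth of data contiguously, which is the motivation given in the llama.cpp discussion linked above.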

@kjh2159
Author

kjh2159 commented Sep 26, 2024

@chenghuaWang
Thank you for your answer. I've gone through the explanations in the links you provided. Your insight was impressive. Thank you again.

@kjh2159 kjh2159 closed this as completed Sep 26, 2024