
[Bad Case]: Doesn't work in LM Studio #57

Closed
dr-data opened this issue Feb 7, 2024 · 13 comments
Labels: badcase (Bad cases)

Comments

@dr-data

dr-data commented Feb 7, 2024

Description

When loading "minicpm-2b-dpo-fp32.Q6_K.gguf" in LM Studio, it errors out with "create_tensor: tensor 'output.weight' not found". I don't know which preset I should select.

Case Explanation

No response

dr-data added the badcase (Bad cases) label on Feb 7, 2024
@huangyuxiang03
Collaborator

huangyuxiang03 commented Feb 7, 2024

Hi, could you provide more details on how to reproduce this problem?

@sungkim11

sungkim11 commented Feb 7, 2024

I got the same error in LM Studio: "llama.cpp error: 'create_tensor: tensor 'output.weight' not found'"

It impacts all models.

@dr-data
Author

dr-data commented Feb 8, 2024

Hi, could you provide more details on how to reproduce this problem?

  • Download LM Studio: https://lmstudio.ai/
  • Search for and download MiniCPM via the search bar in LM Studio.
  • Select the MiniCPM model to load in LM Studio.
  • The error appears.
(Screenshot: LM Studio error dialog, 2024-02-08)

@Chaunice

Chaunice commented Feb 8, 2024

Just found this PR merged into llama.cpp master. However, using llama.cpp b2100 I still get the same error.
Platform: Windows 11
Log file:

[1707396576] Log start
[1707396576] Cmd: D:\llama.cpp\main.exe -ngl 35 -m MiniCPM-2B-dpo.Q4_K_M.gguf --color -c 1024 --temp 0.3 --repeat_penalty 1.02 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{write a poem about love and death}\n\n### Response:"
[1707396576] main: build = 2098 (26d4efd1)
[1707396576] main: built with MSVC 19.37.32826.1 for x64
[1707396576] main: seed  = 1707396576
[1707396576] main: llama backend init
[1707396576] main: load the model and apply lora adapter, if any
[1707396576] llama_model_loader: loaded meta data with 22 key-value pairs and 362 tensors from MiniCPM-2B-dpo.Q4_K_M.gguf (version GGUF V3 (latest))
[1707396576] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[1707396576] llama_model_loader: - kv   0:                       general.architecture str              = llama
[1707396576] llama_model_loader: - kv   1:                               general.name str              = .
[1707396576] llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
[1707396576] llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2304
[1707396576] llama_model_loader: - kv   4:                          llama.block_count u32              = 40
[1707396576] llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5760
[1707396576] llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
[1707396576] llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 36
[1707396576] llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 36
[1707396576] llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
[1707396576] llama_model_loader: - kv  10:                          general.file_type u32              = 15
[1707396576] llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
[1707396576] llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,122753]  = ["<unk>", "<s>", "</s>", "<SEP>", "<C...
[1707396576] llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,122753]  = [0.000000, 0.000000, 0.000000, 0.0000...
[1707396576] llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,122753]  = [2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[1707396576] llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
[1707396576] llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
[1707396576] llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
[1707396576] llama_model_loader: - kv  18:               tokenizer.ggml.add_bos_token bool             = true
[1707396576] llama_model_loader: - kv  19:               tokenizer.ggml.add_eos_token bool             = false
[1707396576] llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
[1707396576] llama_model_loader: - kv  21:               general.quantization_version u32              = 2
[1707396576] llama_model_loader: - type  f32:   81 tensors
[1707396576] llama_model_loader: - type q5_0:   20 tensors
[1707396576] llama_model_loader: - type q8_0:   20 tensors
[1707396576] llama_model_loader: - type q4_K:  221 tensors
[1707396576] llama_model_loader: - type q6_K:   20 tensors
[1707396576] llm_load_vocab: mismatch in special tokens definition ( 3528/122753 vs 259/122753 ).
[1707396576] llm_load_print_meta: format           = GGUF V3 (latest)
[1707396576] llm_load_print_meta: arch             = llama
[1707396576] llm_load_print_meta: vocab type       = SPM
[1707396576] llm_load_print_meta: n_vocab          = 122753
[1707396576] llm_load_print_meta: n_merges         = 0
[1707396576] llm_load_print_meta: n_ctx_train      = 2048
[1707396576] llm_load_print_meta: n_embd           = 2304
[1707396576] llm_load_print_meta: n_head           = 36
[1707396576] llm_load_print_meta: n_head_kv        = 36
[1707396576] llm_load_print_meta: n_layer          = 40
[1707396576] llm_load_print_meta: n_rot            = 64
[1707396576] llm_load_print_meta: n_embd_head_k    = 64
[1707396576] llm_load_print_meta: n_embd_head_v    = 64
[1707396576] llm_load_print_meta: n_gqa            = 1
[1707396576] llm_load_print_meta: n_embd_k_gqa     = 2304
[1707396576] llm_load_print_meta: n_embd_v_gqa     = 2304
[1707396576] llm_load_print_meta: f_norm_eps       = 0.0e+00
[1707396576] llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
[1707396576] llm_load_print_meta: f_clamp_kqv      = 0.0e+00
[1707396576] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[1707396576] llm_load_print_meta: n_ff             = 5760
[1707396576] llm_load_print_meta: n_expert         = 0
[1707396576] llm_load_print_meta: n_expert_used    = 0
[1707396576] llm_load_print_meta: rope scaling     = linear
[1707396576] llm_load_print_meta: freq_base_train  = 10000.0
[1707396576] llm_load_print_meta: freq_scale_train = 1
[1707396576] llm_load_print_meta: n_yarn_orig_ctx  = 2048
[1707396576] llm_load_print_meta: rope_finetuned   = unknown
[1707396576] llm_load_print_meta: model type       = 13B
[1707396576] llm_load_print_meta: model ftype      = Q4_K - Medium
[1707396576] llm_load_print_meta: model params     = 2.72 B
[1707396576] llm_load_print_meta: model size       = 1.61 GiB (5.07 BPW) 
[1707396576] llm_load_print_meta: general.name     = .
[1707396576] llm_load_print_meta: BOS token        = 1 '<s>'
[1707396576] llm_load_print_meta: EOS token        = 2 '</s>'
[1707396576] llm_load_print_meta: UNK token        = 0 '<unk>'
[1707396576] llm_load_print_meta: LF token         = 1099 '<0x0A>'
[1707396576] llm_load_tensors: ggml ctx size =    0.28 MiB
[1707396576] llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
[1707396576] llama_load_model_from_file: failed to load model
[1707396576] main: error: unable to load model

Does that mean MiniCPM is not yet fully supported in llama.cpp?

@runfuture

runfuture commented Feb 8, 2024

Just found this PR merged into llama.cpp master. However, using llama.cpp b2100 I still get the same error. Platform: Windows 11. Log file: [identical log quoted above]

Does that mean MiniCPM is not yet fully supported in llama.cpp?

May I ask how you obtained MiniCPM-2B-dpo.Q4_K_M.gguf? Could you please try converting it from the original Hugging Face model using the latest code from the llama.cpp master branch?
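
For reference, here is a minimal sketch of that conversion flow, assuming the Hugging Face checkpoint has been downloaded to a local MiniCPM-2B-dpo-fp16/ directory (the paths and output filenames are placeholders, and the script/tool names are those on llama.cpp master at the time; they may differ in other versions):

# Get and build the latest llama.cpp (also provides the quantize tool)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
pip install -r requirements.txt
# Convert the HF checkpoint to an f16 GGUF (input path is a placeholder)
python convert-hf-to-gguf.py ../MiniCPM-2B-dpo-fp16 --outtype f16 --outfile minicpm-2b-dpo-f16.gguf
# Optionally quantize to Q4_K_M
./quantize minicpm-2b-dpo-f16.gguf minicpm-2b-dpo-Q4_K_M.gguf Q4_K_M

A GGUF converted with post-PR code declares general.architecture = minicpm, so it will only load in llama.cpp builds that already include MiniCPM support.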

@sweetcard

Maybe LM Studio hasn't updated to the latest version of llama.cpp yet.
Be patient and wait a while 😄

@runfuture

runfuture commented Feb 9, 2024

@Chaunice
For convenience, I have prepared a Colab notebook to convert the model to GGUF.
Additionally, I have provided the converted GGUF models in the links below:

  1. MiniCPM-2B-dpo-q4km-gguf
  2. MiniCPM-2B-dpo-fp16-gguf
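
As a quick sanity check, either file can be loaded with the llama.cpp main binary, mirroring the command from the log earlier in this thread (the filename is assumed from the link name; use ./main instead of main.exe on Linux/macOS):

main.exe -m MiniCPM-2B-dpo-q4km.gguf -c 1024 --temp 0.3 -p "write a poem about love and death" -n 128

If the build is recent enough to know the minicpm architecture, the model loads and generates; otherwise it fails in llama_model_load as above.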

@sungkim11

sungkim11 commented Feb 9, 2024

With the GGUF you provided, LM Studio reports: "llama.cpp error: 'unknown model architecture: 'minicpm''".
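
That error suggests the llama.cpp build bundled with LM Studio predates MiniCPM support: the newly converted files declare general.architecture = minicpm, which older builds do not recognize. One way to confirm what a given GGUF declares is the gguf-dump script that ships in the llama.cpp repo (path as of early-2024 master; the model filename below is a placeholder):

python gguf-py/scripts/gguf-dump.py MiniCPM-2B-dpo-q4km.gguf

The log earlier in this thread shows the lastrosade conversion declaring general.architecture = llama, while these new conversions declare minicpm.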

@Chaunice

Chaunice commented Feb 9, 2024

[runfuture's comment with the converted GGUF links, quoted above]

Hi, apologies for the delayed response.
I've just tested the first model you shared, and it's working perfectly!
I suppose the previous error was caused by an incompatible version of llama.cpp, as the model I initially used is from lastrosade/MiniCPM-2B-dpo-f32-gguf.
Anyway, thanks for your contribution. Very helpful!

@dr-data
Author

dr-data commented Feb 9, 2024

No, it doesn't work even after updating LM Studio to the latest version.

The model's architecture needs to be llama rather than minicpm; otherwise it generates an error.


@jackylee1


Same problem here.

@cxzx150133

I tried the latest version of llama.cpp and it runs normally.
It seems we can only wait for an update; after all, LM Studio is not open-source software.

@cxzx150133

In my testing, LM Studio 0.2.16 can now run the GGUF version of MiniCPM normally.
Although "Unsupported Architecture" is displayed, it does not affect normal use.
