
[Bad Case]: Doesn't work in LM Studio #57

Closed
dr-data opened this issue Feb 7, 2024 · 13 comments
Labels: badcase (Bad cases)

Comments

@dr-data

dr-data commented Feb 7, 2024

Description

When loading "minicpm-2b-dpo-fp32.Q6_K.gguf" in LM Studio, it errors out with "create_tensor: tensor 'output.weight' not found". I don't know which preset I should select.

Case Explanation

No response

dr-data added the badcase (Bad cases) label on Feb 7, 2024
@huangyuxiang03
Collaborator

huangyuxiang03 commented Feb 7, 2024

Hi, could you provide more details on how to reproduce this problem?

@sungkim11

sungkim11 commented Feb 7, 2024

I got the same error in LM Studio: "llama.cpp error: 'create_tensor: tensor 'output.weight' not found'"

It impacts all models.

@dr-data
Author

dr-data commented Feb 8, 2024

Hi, could you provide more details on how to reproduce this problem?

  • Download LM Studio: https://lmstudio.ai/
  • Search for and download MiniCPM via the search bar in LM Studio.
  • Select the MiniCPM model to load in LM Studio.
  • The error appears.
(Screenshot: LM Studio error dialog, 2024-02-08)

@Chaunice

Chaunice commented Feb 8, 2024

Just found this PR merged into llama.cpp master. However, using llama.cpp b2100 I still get the same error.
Platform: Windows 11
Log file:

[1707396576] Log start
[1707396576] Cmd: D:\llama.cpp\main.exe -ngl 35 -m MiniCPM-2B-dpo.Q4_K_M.gguf --color -c 1024 --temp 0.3 --repeat_penalty 1.02 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{write a poem about love and death}\n\n### Response:"
[1707396576] main: build = 2098 (26d4efd1)
[1707396576] main: built with MSVC 19.37.32826.1 for x64
[1707396576] main: seed  = 1707396576
[1707396576] main: llama backend init
[1707396576] main: load the model and apply lora adapter, if any
[1707396576] llama_model_loader: loaded meta data with 22 key-value pairs and 362 tensors from MiniCPM-2B-dpo.Q4_K_M.gguf (version GGUF V3 (latest))
[1707396576] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[1707396576] llama_model_loader: - kv   0:                       general.architecture str              = llama
[1707396576] llama_model_loader: - kv   1:                               general.name str              = .
[1707396576] llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
[1707396576] llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2304
[1707396576] llama_model_loader: - kv   4:                          llama.block_count u32              = 40
[1707396576] llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5760
[1707396576] llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
[1707396576] llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 36
[1707396576] llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 36
[1707396576] llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
[1707396576] llama_model_loader: - kv  10:                          general.file_type u32              = 15
[1707396576] llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
[1707396576] llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,122753]  = ["<unk>", "<s>", "</s>", "<SEP>", "<C...
[1707396576] llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,122753]  = [0.000000, 0.000000, 0.000000, 0.0000...
[1707396576] llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,122753]  = [2, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[1707396576] llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
[1707396576] llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
[1707396576] llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
[1707396576] llama_model_loader: - kv  18:               tokenizer.ggml.add_bos_token bool             = true
[1707396576] llama_model_loader: - kv  19:               tokenizer.ggml.add_eos_token bool             = false
[1707396576] llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
[1707396576] llama_model_loader: - kv  21:               general.quantization_version u32              = 2
[1707396576] llama_model_loader: - type  f32:   81 tensors
[1707396576] llama_model_loader: - type q5_0:   20 tensors
[1707396576] llama_model_loader: - type q8_0:   20 tensors
[1707396576] llama_model_loader: - type q4_K:  221 tensors
[1707396576] llama_model_loader: - type q6_K:   20 tensors
[1707396576] llm_load_vocab: mismatch in special tokens definition ( 3528/122753 vs 259/122753 ).
[1707396576] llm_load_print_meta: format           = GGUF V3 (latest)
[1707396576] llm_load_print_meta: arch             = llama
[1707396576] llm_load_print_meta: vocab type       = SPM
[1707396576] llm_load_print_meta: n_vocab          = 122753
[1707396576] llm_load_print_meta: n_merges         = 0
[1707396576] llm_load_print_meta: n_ctx_train      = 2048
[1707396576] llm_load_print_meta: n_embd           = 2304
[1707396576] llm_load_print_meta: n_head           = 36
[1707396576] llm_load_print_meta: n_head_kv        = 36
[1707396576] llm_load_print_meta: n_layer          = 40
[1707396576] llm_load_print_meta: n_rot            = 64
[1707396576] llm_load_print_meta: n_embd_head_k    = 64
[1707396576] llm_load_print_meta: n_embd_head_v    = 64
[1707396576] llm_load_print_meta: n_gqa            = 1
[1707396576] llm_load_print_meta: n_embd_k_gqa     = 2304
[1707396576] llm_load_print_meta: n_embd_v_gqa     = 2304
[1707396576] llm_load_print_meta: f_norm_eps       = 0.0e+00
[1707396576] llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
[1707396576] llm_load_print_meta: f_clamp_kqv      = 0.0e+00
[1707396576] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[1707396576] llm_load_print_meta: n_ff             = 5760
[1707396576] llm_load_print_meta: n_expert         = 0
[1707396576] llm_load_print_meta: n_expert_used    = 0
[1707396576] llm_load_print_meta: rope scaling     = linear
[1707396576] llm_load_print_meta: freq_base_train  = 10000.0
[1707396576] llm_load_print_meta: freq_scale_train = 1
[1707396576] llm_load_print_meta: n_yarn_orig_ctx  = 2048
[1707396576] llm_load_print_meta: rope_finetuned   = unknown
[1707396576] llm_load_print_meta: model type       = 13B
[1707396576] llm_load_print_meta: model ftype      = Q4_K - Medium
[1707396576] llm_load_print_meta: model params     = 2.72 B
[1707396576] llm_load_print_meta: model size       = 1.61 GiB (5.07 BPW) 
[1707396576] llm_load_print_meta: general.name     = .
[1707396576] llm_load_print_meta: BOS token        = 1 '<s>'
[1707396576] llm_load_print_meta: EOS token        = 2 '</s>'
[1707396576] llm_load_print_meta: UNK token        = 0 '<unk>'
[1707396576] llm_load_print_meta: LF token         = 1099 '<0x0A>'
[1707396576] llm_load_tensors: ggml ctx size =    0.28 MiB
[1707396576] llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
[1707396576] llama_load_model_from_file: failed to load model
[1707396576] main: error: unable to load model

Does that mean MiniCPM is not yet fully supported in llama.cpp?

@runfuture

runfuture commented Feb 8, 2024

Just found this PR merged into llama.cpp master. However, using llama.cpp b2100 I still get the same error. Platform: Windows 11. Log file: [identical log quoted above]

Does that mean MiniCPM is not yet fully supported in llama.cpp?

May I ask how you obtained MiniCPM-2B-dpo.Q4_K_M.gguf? Could you please try converting it from the original Hugging Face model using the latest code from the llama.cpp master branch?
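
For reference, here is a minimal sketch of that conversion flow, assuming the Hugging Face checkpoint has been downloaded to a local MiniCPM-2B-dpo-fp16/ directory (the paths and output filenames are placeholders, and the script/tool names are those on llama.cpp master at the time; they may differ in other versions):

# Get and build the latest llama.cpp (also provides the quantize tool)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
pip install -r requirements.txt
# Convert the HF checkpoint to an f16 GGUF (input path is a placeholder)
python convert-hf-to-gguf.py ../MiniCPM-2B-dpo-fp16 --outtype f16 --outfile minicpm-2b-dpo-f16.gguf
# Optionally quantize to Q4_K_M
./quantize minicpm-2b-dpo-f16.gguf minicpm-2b-dpo-Q4_K_M.gguf Q4_K_M

A GGUF converted with post-PR code declares general.architecture = minicpm, so it will only load in llama.cpp builds that already include MiniCPM support.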

@sweetcard

Maybe LM Studio hasn't updated to the latest version of llama.cpp yet.
Be patient and wait a while 😄

@runfuture

runfuture commented Feb 9, 2024

@Chaunice
For convenience, I have prepared a Colab notebook to convert the model to GGUF.
Additionally, I have provided the converted GGUF models in the links below:

  1. MiniCPM-2B-dpo-q4km-gguf
  2. MiniCPM-2B-dpo-fp16-gguf
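
As a quick sanity check, either file can be loaded with the llama.cpp main binary, mirroring the command from the log earlier in this thread (the filename is assumed from the link name; use ./main instead of main.exe on Linux/macOS):

main.exe -m MiniCPM-2B-dpo-q4km.gguf -c 1024 --temp 0.3 -p "write a poem about love and death" -n 128

If the build is recent enough to know the minicpm architecture, the model loads and generates; otherwise it fails in llama_model_load as above.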

@sungkim11

sungkim11 commented Feb 9, 2024

With the GGUF you provided, LM Studio reports: "llama.cpp error: 'unknown model architecture: 'minicpm''".
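
That error suggests the llama.cpp build bundled with LM Studio predates MiniCPM support: the newly converted files declare general.architecture = minicpm, which older builds do not recognize. One way to confirm what a given GGUF declares is the gguf-dump script that ships in the llama.cpp repo (path as of early-2024 master; the model filename below is a placeholder):

python gguf-py/scripts/gguf-dump.py MiniCPM-2B-dpo-q4km.gguf

The log earlier in this thread shows the lastrosade conversion declaring general.architecture = llama, while these new conversions declare minicpm.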

@Chaunice

Chaunice commented Feb 9, 2024

[runfuture's comment with the converted GGUF links, quoted above]

Hi, apologies for the delayed response.
I've just tested the first model you shared, and it's working perfectly!
I suppose the previous error was caused by an incompatible version of llama.cpp, as the model I initially used is from lastrosade/MiniCPM-2B-dpo-f32-gguf.
Anyway, thanks for your contribution. Very helpful!

@dr-data
Author

dr-data commented Feb 9, 2024

No, it doesn't work even after updating LM Studio to the latest version.

The model's architecture needs to be llama rather than minicpm; otherwise it generates an error.


@jackylee1


Same problem here.

@cxzx150133

I tried the latest version of llama.cpp and it runs normally.
It seems we can only wait for an update; after all, LM Studio is not open-source software.

@cxzx150133

In my testing, LM Studio 0.2.16 can now run the GGUF version of MiniCPM normally.
Although "Unsupported Architecture" is displayed, it does not affect normal use.
