llama : support InternLM2 #5184
Conversation
llama.cpp (Outdated)
@@ -11220,7 +11409,8 @@ int32_t llama_tokenize(
        int32_t   n_max_tokens,
           bool   add_bos,
           bool   special) {
    auto res = llama_tokenize_internal(model->vocab, std::string(text, text_len), add_bos, special);
    bool add_space_prefix = false ? model->arch == LLM_ARCH_INTERNLM2 : true;
The condition is wrong. Suggested change:
-    bool add_space_prefix = false ? model->arch == LLM_ARCH_INTERNLM2 : true;
+    bool add_space_prefix = model->arch == LLM_ARCH_INTERNLM2 ? true : false;
Is this change necessary? It seems strange to add the prefix just for a specific model architecture. Can this be fixed outside of llama.cpp?
Or better,
bool add_space_prefix = model->arch == LLM_ARCH_INTERNLM2;
Thanks for your suggestion! In terms of deciding the value for add_dummy_prefix, I went through the SentencePiece code and found that the tokenizer's add_dummy_prefix value is stored there, and we can read it from tokenizer.model through the Python interface:
from sentencepiece import sentencepiece_model_pb2 as model
m = model.ModelProto()
m.ParseFromString(open(TOKENIZER_PATH, "rb").read())
print(m.normalizer_spec.add_dummy_prefix)
So, when doing the checkpoint conversion in convert-hf-to-gguf.py, maybe we could add a new key-value field using gguf_writer, for example something like gguf_writer.add_add_dummy_prefix(add_dummy_prefix: bool). If you're good with it, I'll make the change so that the add_dummy_prefix decision is not tied to any specific model type. @ggerganov @cebtenzzre
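For illustration only, here is a minimal sketch of what this could look like on the conversion side. The helper function is hypothetical, and the gguf_writer method name is assumed (it matches the add_space_prefix KV pair named in the later commit message), not a verified API:

from sentencepiece import sentencepiece_model_pb2 as sp_model

def write_space_prefix_kv(gguf_writer, tokenizer_path: str) -> None:
    # Read the tokenizer's normalizer spec to obtain add_dummy_prefix.
    m = sp_model.ModelProto()
    with open(tokenizer_path, "rb") as f:
        m.ParseFromString(f.read())
    add_dummy_prefix = m.normalizer_spec.add_dummy_prefix
    # Store the value in the GGUF metadata instead of hard-coding it per
    # architecture in llama.cpp (assumed writer method name).
    gguf_writer.add_add_space_prefix(add_dummy_prefix)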
@SolenoidWGT Yes, I think this is a good solution. Please make the change
@SolenoidWGT Yes, I think this is a good solution. Please make the change
Done.
Do you intend to support the Plora_A and Plora_B tensors? I've experimented with those a bit, but my PyTorch experience is limited, so applying it in ggml is a pain. The changes I tried were in constants.py, tensor_mapping.py, and convert_hf_to_ggml.py (skipping the qkv reshape). That makes your LoRA models convertible, and quantization works (though I guess the LoRA tensors should get a higher quantization quality). Now what would be missing is the conditional support to apply it?
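Purely as an illustration of the mapping idea described in the comment above (not code from this PR), a sketch of the kind of entries one might add on the Python conversion side; every tensor name here, on both the HF side and the GGUF side, is a hypothetical placeholder rather than the actual xcomposer2 naming:

# Hypothetical Plora_A / Plora_B name mapping so the converter can pass the
# tensors through; placeholder names only, {bid} is the block index.
PLORA_TENSOR_MAP = {
    "model.layers.{bid}.attention.wqkv.Plora_A": "blk.{bid}.attn_qkv.plora_a",
    "model.layers.{bid}.attention.wqkv.Plora_B": "blk.{bid}.attn_qkv.plora_b",
}

def map_plora_tensor(name: str, n_layers: int) -> str | None:
    # Return the mapped GGUF tensor name, or None if this is not a Plora tensor.
    for bid in range(n_layers):
        for hf_pat, gguf_pat in PLORA_TENSOR_MAP.items():
            if name == hf_pat.format(bid=bid):
                return gguf_pat.format(bid=bid)
    return None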
I will try to support LoRA for InternLM2, but I am not very familiar with the LoRA implementation in llama.cpp, and I need some time to read the code.
I've put this PR together with xcomposer2 support here: #5232. Just FYI, if you find time to lend a hand to get the Plora working, we would have full support for xcomposer2, possibly the best visual multimodal model to date.
llama.cpp (Outdated)
    { LLM_TENSOR_FFN_NORM,      "blk.%d.ffn_norm" },
    { LLM_TENSOR_FFN_DOWN,      "blk.%d.ffn_down" },
},
Some of Orion's tensors have been deleted: LLM_TENSOR_FFN_GATE and LLM_TENSOR_FFN_UP. Please restore (llama.cpp, lines 654 to 670 in 1cfb537):
LLM_ARCH_ORION,
{
    { LLM_TENSOR_TOKEN_EMBD,    "token_embd" },
    { LLM_TENSOR_OUTPUT_NORM,   "output_norm" },
    { LLM_TENSOR_OUTPUT,        "output" },
    { LLM_TENSOR_ROPE_FREQS,    "rope_freqs" },
    { LLM_TENSOR_ATTN_NORM,     "blk.%d.attn_norm" },
    { LLM_TENSOR_ATTN_Q,        "blk.%d.attn_q" },
    { LLM_TENSOR_ATTN_K,        "blk.%d.attn_k" },
    { LLM_TENSOR_ATTN_V,        "blk.%d.attn_v" },
    { LLM_TENSOR_ATTN_OUT,      "blk.%d.attn_output" },
    { LLM_TENSOR_ATTN_ROT_EMBD, "blk.%d.attn_rot_embd" },
    { LLM_TENSOR_FFN_NORM,      "blk.%d.ffn_norm" },
    { LLM_TENSOR_FFN_GATE,      "blk.%d.ffn_gate" },
    { LLM_TENSOR_FFN_DOWN,      "blk.%d.ffn_down" },
    { LLM_TENSOR_FFN_UP,        "blk.%d.ffn_up" },
},
Sorry, I was careless when rebasing; it has been fixed.
* support InternLM2 inference
* add add_space_prefix KV pair
I'm having the same issue as @sweetcard but on Edit:
@arch-btw @sweetcard, I spent some time debugging to find the problem, and fixed it in PR #5305.
Thank you for your excellent work 👍
Thank you @SolenoidWGT!
Dear llama.cpp developer @ggerganov:
Hello, this PR adds support for InternLM2. InternLM2 is a GQA-based transformer architecture that offers effective support for long contexts of up to 200,000 characters. More and more users want to explore the capabilities of InternLM2 through the llama.cpp API they are already familiar with. We sincerely hope that our PR can be accepted. Thank you.
For more information about InternLM2, please visit https://github.com/InternLM/InternLM.
Related issues: #3133, #3551, #4360, #5031
cc: @yhcc, @vansinhu, @sunpengsdu