fix llama lm_head load wrong weight error #4259
Closed
Hi,
I have found that during AutoTP shard loading, if there is more than one checkpoint file, the llama model ends up with a wrong lm_head weight.
The reason is that during module replacement we assign embedding_weight to lm_head.weight in advance. In other words, when there is more than one checkpoint file, embedding_weight is used as lm_head.weight before the real lm_head.weight has been loaded.
So I added an is_last_cp flag to avoid this situation.
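For illustration, here is a minimal sketch of the idea (the `load_shards` helper, the `is_last_cp` gating, and the llama-style module layout are assumptions for the example, not the actual DeepSpeed code): the fallback that ties lm_head.weight to the embedding weight only runs after the last checkpoint shard has been read, so an earlier shard can no longer leave a stale embedding copy in place of the real lm_head.weight.

```python
import torch
import torch.nn as nn

def load_shards(model: nn.Module, shard_paths: list) -> None:
    """Load every checkpoint shard, deferring the tied-embedding fallback
    for lm_head.weight until the last shard has been processed.
    (Illustrative sketch only; names and module layout are assumptions.)"""
    lm_head_loaded = False
    for index, path in enumerate(shard_paths):
        is_last_cp = index == len(shard_paths) - 1
        state_dict = torch.load(path, map_location="cpu")
        model.load_state_dict(state_dict, strict=False)
        if "lm_head.weight" in state_dict:
            lm_head_loaded = True
        # Before the fix, embedding_weight was assigned to lm_head.weight
        # during module replacement, i.e. before later shards (which may hold
        # the real lm_head.weight) were loaded. Gating the fallback on
        # is_last_cp ensures it only runs once every shard has been read.
        if is_last_cp and not lm_head_loaded:
            model.lm_head.weight = model.get_input_embeddings().weight
```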