This repository was archived by the owner on Jun 24, 2024. It is now read-only.

GPT-2 doesn't always have an lm_head #338

Closed
steventrouble opened this issue Jun 30, 2023 · 2 comments · Fixed by #362
Labels
issue:bug Something isn't working model:gpt-2 GPT-2 model

Comments

@steventrouble
Contributor

steventrouble commented Jun 30, 2023

The GPT-2 ggml models don't always have an lm_head tensor, but llm assumes one is present:

let lm_head = tl.load("model/lm_head")?;

See the ggml example repo, where lm_head is treated as optional: https://github.com/ggerganov/ggml/blob/ee1b3727e60403012dd2b57d35b60558f4db66d8/examples/gpt-2/main.cpp#L364

@steventrouble
Contributor Author

I tried replacing it with:

let lm_head = tl.load("model/lm_head").or_else(|_| tl.load("model/wte"))?;

But then I ran into more ggml errors and had to move on:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 252199424, available 250733056)

@LLukas22
Contributor

LLukas22 commented Jul 1, 2023

Yeah, I also noticed that. We probably have to share the wte tensor, since the context size is calculated from the combined size of all tensors.
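A minimal sketch of that idea, using a hypothetical `TensorLoader` stand-in (not the actual llm API): fall back to `model/wte` when `model/lm_head` is missing (weight tying), and count the tied tensor only once when sizing the context's memory pool, so the pool isn't sized for a tensor that is never separately allocated.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for llm's tensor loader; the real types differ.
struct TensorLoader {
    tensors: HashMap<String, Vec<f32>>,
}

impl TensorLoader {
    fn load(&self, name: &str) -> Result<&Vec<f32>, String> {
        self.tensors
            .get(name)
            .ok_or_else(|| format!("tensor not found: {name}"))
    }
}

/// Bytes the context pool needs for the output head and embeddings,
/// counting tied (shared) tensors only once.
fn pool_bytes(lm_head: &Vec<f32>, wte: &Vec<f32>) -> usize {
    if std::ptr::eq(lm_head, wte) {
        wte.len() * std::mem::size_of::<f32>()
    } else {
        (lm_head.len() + wte.len()) * std::mem::size_of::<f32>()
    }
}

fn main() {
    // A model file without a separate lm_head: only the embedding matrix.
    let mut tensors = HashMap::new();
    tensors.insert("model/wte".to_string(), vec![0.0f32; 1024]);
    let tl = TensorLoader { tensors };

    // Fall back to the embedding weights when lm_head is absent.
    let lm_head = tl
        .load("model/lm_head")
        .or_else(|_| tl.load("model/wte"))
        .expect("neither lm_head nor wte present");
    let wte = tl.load("model/wte").unwrap();

    println!("context pool needs {} bytes", pool_bytes(lm_head, wte));
}
```

With sharing in place, the "not enough space in the context's memory pool" error above should go away, because the pool is no longer sized for an lm_head tensor that the file doesn't contain.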
