This repository was archived by the owner on Jun 24, 2024. It is now read-only.

GPT-2 doesn't always have an lm_head #338

Closed
steventrouble opened this issue Jun 30, 2023 · 2 comments · Fixed by #362
Labels
issue:bug Something isn't working model:gpt-2 GPT-2 model

Comments

@steventrouble
Contributor

steventrouble commented Jun 30, 2023

The GPT-2 ggml models don't always have an lm_head tensor, but llm assumes one is present:

let lm_head = tl.load("model/lm_head")?;

See the ggml example repo, where lm_head is treated as optional: https://github.com/ggerganov/ggml/blob/ee1b3727e60403012dd2b57d35b60558f4db66d8/examples/gpt-2/main.cpp#L364

@steventrouble
Contributor Author

I tried replacing it with:

let lm_head = tl.load("model/lm_head").or_else(|_| tl.load("model/wte"))?;

But then I ran into more ggml errors and had to move on:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 252199424, available 250733056)

@LLukas22
Contributor

LLukas22 commented Jul 1, 2023

Yeah, I also noticed that. We probably have to share the wte tensor, since the context size is calculated from the combined size of all tensors.
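A minimal sketch of that idea, using a hypothetical `TensorLoader` stand-in (not the actual llm API): fall back to `model/wte` when `model/lm_head` is missing (weight tying), and count the tied tensor only once when sizing the context's memory pool, so the pool isn't sized for a tensor that is never separately allocated.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for llm's tensor loader; the real types differ.
struct TensorLoader {
    tensors: HashMap<String, Vec<f32>>,
}

impl TensorLoader {
    fn load(&self, name: &str) -> Result<&Vec<f32>, String> {
        self.tensors
            .get(name)
            .ok_or_else(|| format!("tensor not found: {name}"))
    }
}

/// Bytes the context pool needs for the output head and embeddings,
/// counting tied (shared) tensors only once.
fn pool_bytes(lm_head: &Vec<f32>, wte: &Vec<f32>) -> usize {
    if std::ptr::eq(lm_head, wte) {
        wte.len() * std::mem::size_of::<f32>()
    } else {
        (lm_head.len() + wte.len()) * std::mem::size_of::<f32>()
    }
}

fn main() {
    // A model file without a separate lm_head: only the embedding matrix.
    let mut tensors = HashMap::new();
    tensors.insert("model/wte".to_string(), vec![0.0f32; 1024]);
    let tl = TensorLoader { tensors };

    // Fall back to the embedding weights when lm_head is absent.
    let lm_head = tl
        .load("model/lm_head")
        .or_else(|_| tl.load("model/wte"))
        .expect("neither lm_head nor wte present");
    let wte = tl.load("model/wte").unwrap();

    println!("context pool needs {} bytes", pool_bytes(lm_head, wte));
}
```

With sharing in place, the "not enough space in the context's memory pool" error above should go away, because the pool is no longer sized for an lm_head tensor that the file doesn't contain.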
