model.forward requires num_logits_to_keep, not logits_to_keep #2770

Closed
richardwth opened this issue Feb 5, 2025 · 10 comments · Fixed by #2773
Labels
🐛 bug Something isn't working 🏋 GRPO Related to GRPO

Comments

@richardwth

richardwth commented Feb 5, 2025

Reproduction

In the _get_per_token_logps method of grpo_trainer.py, the model is called as

logits = model(
    input_ids=input_ids, attention_mask=attention_mask, logits_to_keep=logits_to_keep + 1
).logits  # (B, L, V)

But on Transformers 4.48, Qwen 2's model.forward has no logits_to_keep argument. It should be num_logits_to_keep, i.e.,

logits = model(
    input_ids=input_ids, attention_mask=attention_mask, num_logits_to_keep=logits_to_keep + 1
).logits  # (B, L, V)
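
A possible workaround until a fix lands is to pick the keyword based on what the installed model's forward actually accepts. This is only a sketch, not the eventual TRL fix; it reuses model, input_ids, attention_mask, and logits_to_keep from the snippet above and assumes the forward signature is introspectable:

import inspect

# transformers 4.48 exposes `num_logits_to_keep`; newer versions rename it to
# `logits_to_keep`. Inspect the signature and use whichever name exists.
forward_params = inspect.signature(model.forward).parameters
keep_kwarg = "logits_to_keep" if "logits_to_keep" in forward_params else "num_logits_to_keep"

logits = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    **{keep_kwarg: logits_to_keep + 1},
).logits  # (B, L, V)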

System Info

  • Platform: Linux-5.10.134-16.3.al8.x86_64-x86_64-with-glibc2.35
  • Python version: 3.10.16
  • PyTorch version: 2.5.1
  • CUDA device(s): NVIDIA L20Z, NVIDIA L20Z, NVIDIA L20Z, NVIDIA L20Z, NVIDIA L20Z, NVIDIA L20Z, NVIDIA L20Z, NVIDIA L20Z
  • Transformers version: 4.48.0
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • Datasets version: 3.2.0
  • HF Hub version: 0.27.1
  • TRL version: 0.15.0.dev0
  • bitsandbytes version: 0.45.0
  • DeepSpeed version: 0.16.2
  • Diffusers version: not installed
  • Liger-Kernel version: not installed
  • LLM-Blender version: not installed
  • OpenAI version: 1.60.0
  • PEFT version: 0.6.2

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
  • Any traceback provided is complete
@github-actions github-actions bot added 🐛 bug Something isn't working 🏋 GRPO Related to GRPO labels Feb 5, 2025
@qgallouedec
Member

What model do you use?

@richardwth
Author

> What model do you use?

I am using Qwen2.5 7B and Llama3 8B.

@qgallouedec
Member

Can't reproduce 🤔 (with both transformers 4.49 dev and 4.48):

>>> from transformers import AutoModelForCausalLM
>>> import torch
>>> model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B").to("cuda")
>>> input_ids = torch.randint(100, 200, (4, 256), device="cuda")
>>> model(input_ids, logits_to_keep=128)
CausalLMOutputWithPast(loss=None, logits=tensor([[[ 3.1937,  5.0300,  5.1271,  ...,  0.9952,  0.9950,  0.9953],
...
         [-5.3295, -7.6368, -1.7083,  ...,  4.3898,  4.3899,  4.3898]]],
       device='cuda:0', grad_fn=<UnsafeViewBackward0>), past_key_values=DynamicCache(), hidden_states=None, attentions=None)
>>>

@Co1lin

Co1lin commented Feb 5, 2025

@qgallouedec Hey, could you check the shape of the returned logits?

out = model(input_ids, logits_to_keep=128)
out.logits.shape

On my side, logits_to_keep=128 returns torch.Size([4, 256, 152064]), whereas num_logits_to_keep=128 gives torch.Size([4, 128, 152064]).

@Co1lin

Co1lin commented Feb 5, 2025

@qgallouedec Also, do you think this is related to #2731?

@qgallouedec
Member

Nice find. Indeed, with 4.48 you'll get 256, while with 4.49 you'll get 128, as expected.
You'll also need to clear the cache after upgrading, or load with force_download=True.
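
For reference, a minimal sketch of the reload step mentioned above (model name taken from the earlier repro; this simply bypasses the locally cached files):

from transformers import AutoModelForCausalLM

# Re-download the checkpoint from the Hub instead of reusing the local cache.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", force_download=True)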

@richardwth
Author

Right... Transformers' latest modeling_qwen2.py says:

@deprecate_kwarg("num_logits_to_keep", version="4.50", new_name="logits_to_keep")

So this is a bug that affects some models on some Transformers versions. I've modified my issue to reflect this.
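
In other words, which keyword works depends on the installed transformers version: 4.48 only accepts num_logits_to_keep, while 4.49+ renames it to logits_to_keep and keeps the old name as a deprecated alias until 4.50. A rough version gate, assuming the rename landed in 4.49.0, could look like:

from packaging import version
import transformers

# Assumption: the rename to `logits_to_keep` shipped in transformers 4.49.0,
# per the deprecation note above; adjust the cut-off if that turns out to differ.
if version.parse(transformers.__version__) >= version.parse("4.49.0"):
    keep_kwarg = "logits_to_keep"
else:
    keep_kwarg = "num_logits_to_keep"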

@nopepper

nopepper commented Feb 5, 2025

!!! Thank you so much

I was baffled and spent hours debugging my code because of this. After applying the fix, the difference is obvious:

Before: [image]

After: [image]

This is a pretty major bug that makes the method unusable for Qwen models (and maybe others) using the latest stable transformers version.

It would also be great to add some sort of check or assertion that the shape of the returned logits matches what is expected.
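
A rough sketch of such a guard, assuming the call from the issue body (it reuses model, input_ids, attention_mask, and logits_to_keep, and is not an actual TRL check):

out = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    logits_to_keep=logits_to_keep + 1,
)
# With the wrong kwarg name the keyword is silently swallowed and the model
# returns logits for the full sequence instead of the last `logits_to_keep + 1` positions.
expected_shape = (input_ids.shape[0], logits_to_keep + 1, model.config.vocab_size)
assert out.logits.shape == expected_shape, (
    f"Unexpected logits shape {tuple(out.logits.shape)}, expected {expected_shape}; "
    "the logits_to_keep kwarg may have been ignored by this transformers version."
)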

@Co1lin

Co1lin commented Feb 5, 2025

@nopepper Hey, could you share the versions of transformers and trl that you are using now? Also, which model are you using?

@Superskyyy
Contributor

Superskyyy commented Feb 6, 2025

Interestingly, even with this bug I was able to train a Qwen2.5 model and it converged; everything looked normal... now my brain exploded. I checked the code and it was indeed as you found: 256, not 128.

Update: maybe it's because I was using a commit (without num_) from the prompt cache PR, which ran the prompt part under no_grad and offset the issue. The only visible effect was that the bug increased HBM usage.
