
🪆 Fix for Incorrect ValueError Handling in reward_weights in grpo_trainer.py #2843

Merged: 2 commits into huggingface:main on Feb 13, 2025

Conversation

loveychen (Contributor)

Summary

This pull request fixes a bug where an extra `len` call in the `ValueError` message caused a `TypeError` to be raised instead of the expected `ValueError`. The issue arises when checking the length of `args.reward_weights` against `reward_funcs`.


Problem Description

The original code contains an unnecessary nested len call:

len(len(args.reward_weights))

This results in a TypeError: object of type 'int' has no len(), because len(args.reward_weights) returns an integer, and integers cannot be passed to len().
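
For illustration, the same failure can be reproduced on any list of weights (the values below are hypothetical):

weights = [0.8, 0.2]      # hypothetical reward weights
len(weights)              # 2 -> an int
len(len(weights))         # TypeError: object of type 'int' has no len()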

Here (grpo_trainer.py#L278) is the problematic code:

if args.reward_weights is not None:
    if len(args.reward_weights) != len(reward_funcs):
        raise ValueError(
            f"Number of reward weights ({len(len(args.reward_weights))}) must match number of reward "
            f"functions ({len(reward_funcs)})"
        )
    self.reward_weights = torch.tensor(args.reward_weights, dtype=torch.float32)
else:
    self.reward_weights = torch.ones(len(reward_funcs), dtype=torch.float32)

Instead of the expected ValueError, a TypeError is raised, making it difficult to identify the actual problem in the code.


Proposed Fix

The fix is to remove the unnecessary len call in the error message. Specifically, replace:

len(len(args.reward_weights))

With:

len(args.reward_weights)

The corrected code is as follows:

if args.reward_weights is not None:
    if len(args.reward_weights) != len(reward_funcs):
        raise ValueError(
            f"Number of reward weights ({len(args.reward_weights)}) must match number of reward "
            f"functions ({len(reward_funcs)})"
        )
    self.reward_weights = torch.tensor(args.reward_weights, dtype=torch.float32)
else:
    self.reward_weights = torch.ones(len(reward_funcs), dtype=torch.float32)

Why This Fix Is Necessary

  • Prevents the unintended TypeError that occurs due to the extra len call.
  • Ensures proper error handling via the correct ValueError, which provides clarity on the mismatch between reward weights and reward functions.

Validation Steps

To ensure the fix resolves the issue, follow these steps (a standalone sketch of the corrected check follows the list):

  1. Setup: Use an args.reward_weights that is not None and ensure its length does not match the reward_funcs length.
  2. Expected Behavior Before Fix: A TypeError is raised at runtime.
  3. Expected Behavior After Fix: The code correctly raises a ValueError with the following message:
    ValueError: Number of reward weights (X) must match number of reward functions (Y)
    
    Where X is the number of weights and Y is the number of functions.
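
As a rough standalone sketch of these steps (not the trainer itself), the corrected guard can be exercised in isolation; the helper check_reward_weights and the dummy reward function are hypothetical names introduced only for this example:

import pytest

def check_reward_weights(reward_weights, reward_funcs):
    # Mirrors the corrected guard from grpo_trainer.py
    if reward_weights is not None and len(reward_weights) != len(reward_funcs):
        raise ValueError(
            f"Number of reward weights ({len(reward_weights)}) must match number of reward "
            f"functions ({len(reward_funcs)})"
        )

def test_mismatched_lengths_raise_value_error():
    # Two weights but only one reward function -> ValueError, not TypeError
    with pytest.raises(ValueError, match="Number of reward weights"):
        check_reward_weights([0.5, 0.5], [lambda *args, **kwargs: 0.0])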

- Fixed a bug where an extra `len` call inside the error message caused a `TypeError` instead of the expected `ValueError`.
- Replaced `len(len(args.reward_weights))` with the correct `len(args.reward_weights)` to properly calculate the number of reward weights.
- Ensured that a `ValueError` is now raised with an accurate and clear message when the number of reward weights does not match the number of reward functions.

This fix prevents confusion during debugging and ensures proper error handling during validation.

Tested with cases where (a minimal illustration of the default case follows the list):
- `args.reward_weights` is None (default case).
- `args.reward_weights` has mismatched lengths with `reward_funcs`.
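
For the default case, a minimal illustration of the else branch (the reward function stubs are placeholders):

import torch

reward_funcs = [lambda *args, **kwargs: 0.0, lambda *args, **kwargs: 0.0]  # two dummy reward functions
# With args.reward_weights left as None, every reward function gets weight 1.0
reward_weights = torch.ones(len(reward_funcs), dtype=torch.float32)
assert reward_weights.tolist() == [1.0, 1.0]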
@qgallouedec (Member) left a comment:

Thanks!

@qgallouedec changed the title from "Pull Request: Fix for Incorrect ValueError Handling in reward_weights in grpo_trainer.py" to "🪆 Fix for Incorrect ValueError Handling in reward_weights in grpo_trainer.py" on Feb 13, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec merged commit 8830786 into huggingface:main on Feb 13, 2025
13 checks passed