
🪆 Fix for Incorrect ValueError Handling in reward_weights in grpo_trainer.py #2843

Merged: 2 commits into huggingface:main on Feb 13, 2025

Conversation

loveychen (Contributor)

Summary

This pull request fixes a bug where an extra `len` call in the `ValueError` message caused a `TypeError` to be raised instead of the expected `ValueError`. The issue arises when checking the length of `args.reward_weights` against `reward_funcs`.


Problem Description

The original code contains an unnecessary nested len call:

len(len(args.reward_weights))

This results in a TypeError: object of type 'int' has no len(), because len(args.reward_weights) returns an integer, and integers cannot be passed to len().
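
For illustration, the same failure can be reproduced on any list of weights (the values below are hypothetical):

weights = [0.8, 0.2]      # hypothetical reward weights
len(weights)              # 2 -> an int
len(len(weights))         # TypeError: object of type 'int' has no len()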

Here (grpo_trainer.py#L278) is the problematic code:

if args.reward_weights is not None:
    if len(args.reward_weights) != len(reward_funcs):
        raise ValueError(
            f"Number of reward weights ({len(len(args.reward_weights))}) must match number of reward "
            f"functions ({len(reward_funcs)})"
        )
    self.reward_weights = torch.tensor(args.reward_weights, dtype=torch.float32)
else:
    self.reward_weights = torch.ones(len(reward_funcs), dtype=torch.float32)

Instead of the expected ValueError, a TypeError is raised, making it difficult to identify the actual problem in the code.


Proposed Fix

The fix is to remove the unnecessary len call in the error message. Specifically, replace:

len(len(args.reward_weights))

With:

len(args.reward_weights)

The corrected code is as follows:

if args.reward_weights is not None:
    if len(args.reward_weights) != len(reward_funcs):
        raise ValueError(
            f"Number of reward weights ({len(args.reward_weights)}) must match number of reward "
            f"functions ({len(reward_funcs)})"
        )
    self.reward_weights = torch.tensor(args.reward_weights, dtype=torch.float32)
else:
    self.reward_weights = torch.ones(len(reward_funcs), dtype=torch.float32)

Why This Fix Is Necessary

  • Prevents the unintended TypeError that occurs due to the extra len call.
  • Ensures proper error handling via the correct ValueError, which provides clarity on the mismatch between reward weights and reward functions.

Validation Steps

To ensure the fix resolves the issue, follow these steps (a standalone sketch of the corrected check follows the list):

  1. Setup: Use an args.reward_weights that is not None and ensure its length does not match the reward_funcs length.
  2. Expected Behavior Before Fix: A TypeError is raised at runtime.
  3. Expected Behavior After Fix: The code correctly raises a ValueError with the following message:
    ValueError: Number of reward weights (X) must match number of reward functions (Y)
    
    Where X is the number of weights and Y is the number of functions.
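
As a rough standalone sketch of these steps (not the trainer itself), the corrected guard can be exercised in isolation; the helper check_reward_weights and the dummy reward function are hypothetical names introduced only for this example:

import pytest

def check_reward_weights(reward_weights, reward_funcs):
    # Mirrors the corrected guard from grpo_trainer.py
    if reward_weights is not None and len(reward_weights) != len(reward_funcs):
        raise ValueError(
            f"Number of reward weights ({len(reward_weights)}) must match number of reward "
            f"functions ({len(reward_funcs)})"
        )

def test_mismatched_lengths_raise_value_error():
    # Two weights but only one reward function -> ValueError, not TypeError
    with pytest.raises(ValueError, match="Number of reward weights"):
        check_reward_weights([0.5, 0.5], [lambda *args, **kwargs: 0.0])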

- Fixed a bug where an extra `len` call inside the error message caused a `TypeError` instead of the expected `ValueError`.
- Replaced `len(len(args.reward_weights))` with the correct `len(args.reward_weights)` to properly calculate the number of reward weights.
- Ensured that a `ValueError` is now raised with an accurate and clear message when the number of reward weights does not match the number of reward functions.

This fix prevents confusion during debugging and ensures proper error handling during validation.

Tested with cases where (a minimal illustration of the default case follows the list):
- `args.reward_weights` is None (default case).
- `args.reward_weights` has mismatched lengths with `reward_funcs`.
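
For the default case, a minimal illustration of the else branch (the reward function stubs are placeholders):

import torch

reward_funcs = [lambda *args, **kwargs: 0.0, lambda *args, **kwargs: 0.0]  # two dummy reward functions
# With args.reward_weights left as None, every reward function gets weight 1.0
reward_weights = torch.ones(len(reward_funcs), dtype=torch.float32)
assert reward_weights.tolist() == [1.0, 1.0]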
@qgallouedec (Member) left a comment:

Thanks!

@qgallouedec changed the title from "Pull Request: Fix for Incorrect ValueError Handling in reward_weights in grpo_trainer.py" to "🪆 Fix for Incorrect ValueError Handling in reward_weights in grpo_trainer.py" on Feb 13, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec merged commit 8830786 into huggingface:main on Feb 13, 2025
13 checks passed