
Refactor ScriptArguments #2145

Merged
merged 17 commits into main from script_args
Oct 14, 2024

Conversation

qgallouedec
Member

@qgallouedec qgallouedec commented Sep 30, 2024

What does this PR do?

The script arguments class is currently defined in several places:

  • Directly in scripts, like:

    @dataclass
    class ScriptArguments:
        dataset_name: str = field(
            default="trl-internal-testing/hh-rlhf-helpful-base-trl-style",
            metadata={"help": "The name of the dataset to use."},
        )
  • With SFTScriptArguments:

    @dataclass
    class SFTScriptArguments:
        dataset_name: str = field(
            default="timdettmers/openassistant-guanaco",
            metadata={"help": "the dataset name"},
        )
        dataset_train_split: str = field(default="train", metadata={"help": "The dataset split to train on"})
        dataset_test_split: str = field(default="test", metadata={"help": "The dataset split to evaluate on"})
        config: str = field(default=None, metadata={"help": "Path to the optional config file"})
        gradient_checkpointing_use_reentrant: bool = field(
            default=False,
            metadata={"help": "Whether to apply `use_reentrant` for gradient_checkpointing"},
        )
  • With DPOScriptArguments:

    @dataclass
    class DPOScriptArguments:
        dataset_name: str = field(default=None, metadata={"help": "the dataset name"})
        dataset_train_split: str = field(default="train", metadata={"help": "The dataset split to use for training"})
        dataset_test_split: str = field(default="test", metadata={"help": "The dataset split to use for evaluation"})
        ignore_bias_buffers: bool = field(
            default=False,
            metadata={
                "help": "debug argument for distributed training; "
                "fix for DDP issues with LM bias/mask buffers - invalid scalar type, inplace operation. See "
                "https://github.com/huggingface/transformers/issues/22482#issuecomment-1595790992"
            },
        )
        config: str = field(default=None, metadata={"help": "Path to the optional config file"})
        gradient_checkpointing_use_reentrant: bool = field(
            default=False,
            metadata={"help": "Whether to apply `use_reentrant` for gradient_checkpointing"},
        )
  • With RewardScriptArguments:

    @dataclass
    class RewardScriptArguments:
        dataset_name: str = field(
            default="trl-lib/ultrafeedback_binarized",
            metadata={"help": "the dataset name"},
        )
        dataset_train_split: str = field(default="train", metadata={"help": "The dataset split to train on"})
        dataset_test_split: str = field(default="test", metadata={"help": "The dataset split to evaluate on"})
        config: str = field(default=None, metadata={"help": "Path to the optional config file"})
        gradient_checkpointing_use_reentrant: bool = field(
            default=False,
            metadata={"help": "Whether to apply `use_reentrant` for gradient_checkpointing"},
        )

These classes are pretty much the same. I suggest:

  1. Merging everything into a single ScriptArguments class (see the sketch below).
  2. Deprecating SFTScriptArguments, DPOScriptArguments and RewardScriptArguments in favor of ScriptArguments.
  3. Standardizing the use of ScriptArguments across all example scripts.
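
For reference, a minimal sketch of what the merged class could look like, simply taking the union of the fields listed above (names and defaults are copied from the snippets; the final class in this PR may differ):

    from dataclasses import dataclass, field
    from typing import Optional


    @dataclass
    class ScriptArguments:
        # Union of the fields of SFTScriptArguments, DPOScriptArguments and RewardScriptArguments.
        dataset_name: Optional[str] = field(default=None, metadata={"help": "The name of the dataset to use."})
        dataset_train_split: str = field(default="train", metadata={"help": "The dataset split to train on."})
        dataset_test_split: str = field(default="test", metadata={"help": "The dataset split to evaluate on."})
        config: Optional[str] = field(default=None, metadata={"help": "Path to the optional config file."})
        gradient_checkpointing_use_reentrant: bool = field(
            default=False,
            metadata={"help": "Whether to apply `use_reentrant` for gradient_checkpointing."},
        )
        ignore_bias_buffers: bool = field(
            default=False,
            metadata={
                "help": "Debug argument for distributed training; fix for DDP issues with LM bias/mask "
                "buffers - invalid scalar type, inplace operation. See "
                "https://github.com/huggingface/transformers/issues/22482#issuecomment-1595790992"
            },
        )

A script would then parse it as usual, e.g. with transformers.HfArgumentParser:

    from transformers import HfArgumentParser

    parser = HfArgumentParser(ScriptArguments)
    script_args = parser.parse_args_into_dataclasses()[0]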

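For the deprecation step, one conventional pattern (a sketch, not necessarily how this PR implements it) is to keep the old names as thin subclasses that warn on use:

    import warnings
    from dataclasses import dataclass


    @dataclass
    class SFTScriptArguments(ScriptArguments):
        def __post_init__(self):
            # Emitted once per instantiation; __post_init__ is called by the generated __init__.
            warnings.warn(
                "`SFTScriptArguments` is deprecated; use `ScriptArguments` instead.",
                DeprecationWarning,
            )

    # DPOScriptArguments and RewardScriptArguments would get the same treatment.
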
Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec (Member Author) commented on the new docstring:

    gradient_checkpointing_use_reentrant (`bool`, *optional*, defaults to `False`):
        Whether to apply `use_reentrant` for gradient_checkpointing.
    ignore_bias_buffers (`bool`, *optional*, defaults to `False`):
        Debug argument for distributed training. Fix for DDP issues with LM bias/mask buffers - invalid scalar type,

It seems to be a debug argument. Should we remove it like in #2055?

@qgallouedec qgallouedec marked this pull request as ready for review October 11, 2024 16:50
@edbeeching (Collaborator) left a comment:

Thanks for harmonizing all this. LGTM, apart from one change about the split used in some of the RL examples; if you change it, be sure to update the docs etc.

@qgallouedec qgallouedec merged commit 7e394b0 into main Oct 14, 2024
10 checks passed
@qgallouedec qgallouedec deleted the script_args branch October 14, 2024 09:15