NeMo-Aligner uses custom trainers to coordinate all aspects of training. There are currently five custom trainers:
- SupervisedTrainer: for SFT, SteerLM, and reward modeling.
- DPOTrainer: for DPO training.
- CriticServerTrainer: trains the RL critic via PyTriton requests. Depending on the configuration, it also runs the reward model.
- PPOTrainer: performs RLHF PPO training. Because PPO depends on a critic, this trainer sends training and inference requests via PyTriton to the CriticServerTrainer (see the PyTriton sketch after this list).
- RSTrainer: performs Rejection Sampling (RS) training. Since RS needs a reward model, this trainer sends inference requests via PyTriton to the reward model server.
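
The critic and reward model endpoints mentioned above are ordinary PyTriton models, so the actor reaches them over the network rather than in-process. The following is a minimal, self-contained sketch of how such an inference endpoint could be served with PyTriton; the model name, tensor names, and the dummy forward pass are assumptions for illustration, not NeMo-Aligner's actual endpoints.

```python
# Illustrative sketch only: the endpoint name ("critic_infer") and tensor names
# ("tokens", "values") are hypothetical, not NeMo-Aligner's actual ones.
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def critic_infer_fn(tokens: np.ndarray):
    # Placeholder for a real critic forward pass: return one value per token.
    values = np.zeros(tokens.shape, dtype=np.float32)
    return {"values": values}


with Triton() as triton:
    triton.bind(
        model_name="critic_infer",
        infer_func=critic_infer_fn,
        inputs=[Tensor(name="tokens", dtype=np.int64, shape=(-1,))],
        outputs=[Tensor(name="values", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```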
See the example configurations in the conf folder for the different configurations we support. Note that any value specified in the .yaml file will overwrite the model configuration loaded from the pretrained checkpoint.
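
This override follows standard OmegaConf merge semantics: any key present in the .yaml replaces the value restored from the checkpoint, and everything else is kept. A minimal sketch of that behavior (the config keys are made up for illustration):

```python
from omegaconf import OmegaConf

# Config restored from the pretrained checkpoint (keys are illustrative).
checkpoint_cfg = OmegaConf.create({"model": {"hidden_size": 4096, "seq_length": 2048}})

# Values specified in the training .yaml file.
yaml_cfg = OmegaConf.create({"model": {"seq_length": 4096}})

# The .yaml values win wherever both configs define a key.
merged = OmegaConf.merge(checkpoint_cfg, yaml_cfg)
assert merged.model.seq_length == 4096    # overwritten by the .yaml
assert merged.model.hidden_size == 4096   # kept from the checkpoint
```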
Our custom trainers will only call predefined APIs on the model passed in. These APIs are defined in alignable_interface.py.
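
As a rough illustration of that contract, a trainer only needs a handful of hooks on the model it is given; the class and method names below are hypothetical stand-ins rather than the actual definitions in alignable_interface.py.

```python
# Hypothetical sketch of the trainer/model contract; names and signatures are
# illustrative assumptions, not the actual API in alignable_interface.py.
from abc import ABC, abstractmethod


class AlignableModel(ABC):
    """Minimal interface a custom trainer could rely on."""

    @abstractmethod
    def prepare_for_training_step(self) -> None:
        """Put the model into training mode before a step."""

    @abstractmethod
    def get_loss_and_metrics(self, batch, forward_only: bool):
        """Run a forward (and optionally backward) pass, returning loss and metrics."""

    @abstractmethod
    def finish_training_step(self) -> None:
        """Clean up after the optimizer step."""


def train_one_step(model: AlignableModel, batch):
    # A trainer never reaches into model internals; it only calls the hooks above.
    model.prepare_for_training_step()
    loss, metrics = model.get_loss_and_metrics(batch, forward_only=False)
    model.finish_training_step()
    return loss, metrics
```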
- Supervised Fine Tuning Training: train_gpt_sft.py with gpt_sft.yaml.
- DPO Training: train_gpt_dpo.py with gpt_dpo.yaml.
- Reward Model Training: train_reward_model.py with training_rm.yaml.
- Reward Model Inference: serve_reward_model.py with inference_rm.yaml.
- PPO Critic Server: serve_ppo_critic.py with gpt_ppo_critic.yaml.
- PPO Actor Training: train_gpt_ppo_actor.py with gpt_ppo_actor.yaml.
- Rejection Sampling Training: train_gpt_rs_actor.py with gpt_rs_actor.yaml.
To run a full RLHF PPO job, you need to start both the CriticServerTrainer and the PPOTrainer. See RLHFTraining.md for details.
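
On the actor side, reaching the running critic server is a plain PyTriton client call. The sketch below is the client-side counterpart of the server sketch earlier in this section and reuses the same hypothetical URL, endpoint, and tensor names.

```python
# Illustrative client-side counterpart to the server sketch above; the URL,
# model name, and tensor names are hypothetical.
import numpy as np
from pytriton.client import ModelClient

tokens = np.zeros((4, 128), dtype=np.int64)  # a fake batch of token ids

with ModelClient("http://localhost:8000", "critic_infer") as client:
    result = client.infer_batch(tokens=tokens)

values = result["values"]  # per-token value estimates returned by the critic
```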