forked from modelscope/ms-swift
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix grpo vllm lora (modelscope#3134)
1 parent 1ccbea8, commit 3a41cca
Showing 17 changed files with 136 additions and 163 deletions.
This file was deleted.
@@ -0,0 +1,40 @@
# pip install math_verify # reward function
# pip install "trl>=0.15"
# GPU memory: 2 * 80GiB

MASTER_PORT=29501 \
CUDA_VISIBLE_DEVICES=0,1 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B \
    --reward_funcs accuracy format \
    --train_type lora \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.7 \
    --vllm_max_model_len 8192 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#1000' \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 16 \
    --temperature 0.9 \
    --deepspeed zero2 \
    --system 'examples/train/grpo/prompt.txt' \
    --log_completions true
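
Note on the batch settings in the script above: `--per_device_train_batch_size 16` and `--num_generations 16` interact, because GRPO scores completions in groups of `num_generations` per prompt. The sketch below is a hypothetical pre-flight helper (`prompts_per_step` is not part of ms-swift or TRL). It assumes the constraint that the global train batch (number of training processes times per-device batch size) must divide evenly by `num_generations`, and it assumes a single training process, which the script itself does not state.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ms-swift: checks that the global batch
# (num_processes * per_device_batch) splits evenly into groups of
# num_generations, and prints how many distinct prompts one step covers.
prompts_per_step() {
    local num_processes=$1 per_device_batch=$2 num_generations=$3
    local global_batch=$(( num_processes * per_device_batch ))
    if (( global_batch % num_generations != 0 )); then
        echo "global batch ${global_batch} is not divisible by num_generations ${num_generations}" >&2
        return 1
    fi
    echo $(( global_batch / num_generations ))
}

# Values from the script above, assuming one training process (an assumption).
prompts_per_step 1 16 16   # prints 1
```

Under these assumptions each step samples 16 completions for a single prompt; raising the per-device batch size or lowering `--num_generations` would cover more prompts per step.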