stack-llama #273
Conversation
The documentation is not available anymore as the PR was closed or merged.
Looks really good! Left a few comments.
What do you think about simplifying the naming of the scripts a bit:
- `reward_modeling.py`
- `rl_training.py`
- `supervised_finetuning.py`
Even if we used DeepSpeed for the original runs, with PEFT it should work without it as well, so I think we can omit it here for simplification. What do you think?
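For reference, with the proposed renaming the three entry points would be (purely illustrative, just restating the suggested names under the existing directory):
- `examples/stack_llama/scripts/supervised_finetuning.py`
- `examples/stack_llama/scripts/reward_modeling.py`
- `examples/stack_llama/scripts/rl_training.py`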
1. Supervised fine-tuning of the base llama-7b model to create llama-7b-se:
- `python examples/stack_llama/scripts/sft_stack_exchange_peft.py --model_path=<LLAMA_MODEL_PATH> --streaming --no_gradient_checkpointing --learning_rate 1e-5 --max_steps 5000 --output_dir ./llama-se`
2. Reward modeling using dialog pairs from the SE dataset and the llama-7b-se model to create llama-7b-se-rm:
- `deepspeed --num_gpus=8 examples/stack_llama/scripts/reward_modeling_peft.py --model_name=<LLAMA_SE_MODEL> --deepspeed="/fsx/kashif/llama-SE/ds_config.json"`
We don't really need DeepSpeed for this if we use PEFT, right? Can we just use accelerate here? I think this would make things much simpler.
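Something along these lines, for example (a sketch only; it assumes the script keeps the same `--model_name` flag and that the multi-GPU setup comes from `accelerate config` instead of a DeepSpeed JSON):
- `accelerate launch examples/stack_llama/scripts/reward_modeling_peft.py --model_name=<LLAMA_SE_MODEL>`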
cc @kashif
Yes, I agree; this just matches what was used when @kashif trained the model.
- `deepspeed --num_gpus=8 examples/stack_llama/scripts/reward_modeling_peft.py --model_name=<LLAMA_SE_MODEL> --deepspeed="/fsx/kashif/llama-SE/ds_config.json"`
3. RL fine-tuning of llama-7b-se with the llama-7b-se-rm reward model:
- `accelerate launch examples/stack_llama/scripts/rl_finetuning_peft.py --log_with=wandb --model_name=<LLAMA_SE_MODEL> --reward_model_name=<LLAMA_SE_RM_MODEL> --adafactor=False --tokenizer_name=<LLAMA_TOKENIZER> --save_freq=100 --output_max_length=128 --batch_size=8 --gradient_accumulation_steps=8 --batched_gen=True --ppo_epochs=4 --seed=0 --learning_rate=1.4e-5 --early_stopping=True --output_dir=llama-se-rl-finetune-128-8-8-1.4e-5_adam`
I think we should have a note that if you want to use multiple GPUs you should use torchrun plus the appropriate args (same with accelerate).
Actually, we could also add the multi-GPU commands as the default since they also work for 1 GPU.
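For example, the multi-GPU launches could look roughly like this (illustrative only; the GPU count is a placeholder, the script names follow the renaming proposed above, and the remaining flags are the ones already shown in the diff):
- `torchrun --nproc_per_node=8 examples/stack_llama/scripts/supervised_finetuning.py --model_path=<LLAMA_MODEL_PATH> ...`
- `accelerate launch --multi_gpu --num_processes=8 examples/stack_llama/scripts/rl_training.py --model_name=<LLAMA_SE_MODEL> ...`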
Co-authored-by: Leandro von Werra <[email protected]>
Adds the stack-llama example.