[Auto Parallel] Support semi-auto trainer and fit Llama2 training (#7885)

* support semi-auto trainer and fit Llama2 training

* support shard_dataloader in dynamic semi-auto

* rewrite training loop

* refactor training loop

* refine args of auto trainer

* broadcast loss

* add auto ci cases
haohongxiang authored Jan 31, 2024
1 parent 44bfeb0 commit 3a704ea
Showing 12 changed files with 1,139 additions and 369 deletions.
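
One of the commit messages above mentions broadcasting the loss. Purely as an illustrative sketch (not the PR's actual trainer code), the snippet below shows how a scalar loss could be broadcast from rank 0 so that every worker reports the same value, using paddle.distributed.broadcast; the helper name broadcast_loss is hypothetical.

# Illustrative sketch only: broadcast the training loss from rank 0 so that
# every worker logs the same value. This is not the PR's trainer code; the
# helper name broadcast_loss is hypothetical.
import paddle
import paddle.distributed as dist

def broadcast_loss(loss: paddle.Tensor, src_rank: int = 0) -> paddle.Tensor:
    """Broadcast a scalar loss tensor in place from src_rank to all ranks."""
    if dist.get_world_size() > 1:
        # Requires an initialized process group, e.g. via dist.init_parallel_env()
        # or a script started with `python -m paddle.distributed.launch`.
        dist.broadcast(loss, src=src_rank)
    return loss

# Usage on every rank inside the training loop:
#     loss = criterion(model(inputs), labels)
#     loss = broadcast_loss(loss)   # all ranks now log the identical loss
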
2 changes: 1 addition & 1 deletion llm/llama/auto_parallel/run_auto.sh
@@ -68,6 +68,6 @@ python -u -m paddle.distributed.launch \
     --do_eval \
     --device "gpu" \
     --data_impl "mmap" \
-    --parallel_mode "auto"
+    --enable_auto_parallel 1
 
 # --resume_from_checkpoint "output/llama_auto_serial/checkpoint-2" \
2 changes: 1 addition & 1 deletion llm/llama/auto_parallel/run_auto_sp.sh
@@ -68,7 +68,7 @@ python -u -m paddle.distributed.launch \
     --do_eval \
     --device "gpu" \
     --data_impl "mmap" \
-    --parallel_mode "auto" \
+    --enable_auto_parallel 1 \
     --sequence_parallel true \
 
 # --resume_from_checkpoint "output/llama_auto_serial/checkpoint-2" \
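
Both script diffs replace the string-valued --parallel_mode "auto" flag with the boolean-style --enable_auto_parallel 1. As a hypothetical sketch only (the class name and help text below are not PaddleNLP's actual definition), a trainer argument of this shape is typically declared as a dataclass field; a HuggingFace-style argument parser then maps command-line strings such as "1" or "true" onto it.

# Hypothetical sketch: a boolean trainer argument matching the new flag name.
# The dataclass and help text are illustrative, not PaddleNLP's real definition.
from dataclasses import dataclass, field

@dataclass
class AutoParallelArguments:
    enable_auto_parallel: bool = field(
        default=False,
        metadata={"help": "Run training with the (semi-)auto parallel engine."},
    )
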
