[Unified Checkpoint] Add unified checkpoint training args doc. #7756

Merged
merged 1 commit into from
Jan 2, 2024
22 changes: 22 additions & 0 deletions docs/trainer.md
@@ -1,3 +1,4 @@
trainer.md
# PaddleNLP Trainer API

PaddleNLP provides a Trainer training API that wraps the common configuration of the training process, for example:
@@ -661,6 +662,27 @@ Trainer is a simple but fully featured Paddle training and evaluation module, and
The path to a folder with a valid checkpoint for your
model. (default: None)

--unified_checkpoint
Whether to unify the checkpoints of hybrid parallel training. (optional, default: False)

--unified_checkpoint_config
Optimization options for Unified Checkpoint, passed in as a str.
The following options are supported:
skip_save_model_weight: skip saving the model weights when master_weights exist.
master_weight_compatible: 1. load master_weights only when the optimizer needs them;
2. if the checkpoint contains no master_weights, load the model weights as master_weights.
async_save: save the checkpoint to disk asynchronously, so that saving does not interrupt training and training efficiency improves.
enable_all_options: enable all of the options above.
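
The option semantics above can be sketched in plain Python. This is a minimal illustration, not PaddleNLP's implementation: the helper names `parse_unified_checkpoint_config` and `resolve_master_weights` are hypothetical, and the sketch assumes the options are separated by whitespace inside the string, which the text above does not state.

```python
# Hypothetical helpers for illustration only; they are not part of PaddleNLP.

# The three individual options named in the documentation above.
SUPPORTED_OPTIONS = (
    "skip_save_model_weight",
    "master_weight_compatible",
    "async_save",
)
ENABLE_ALL = "enable_all_options"


def parse_unified_checkpoint_config(config: str) -> set:
    """Turn an option string into the set of enabled option names.

    Assumption: options are separated by whitespace.
    """
    requested = set(config.split())
    unknown = requested - set(SUPPORTED_OPTIONS) - {ENABLE_ALL}
    if unknown:
        raise ValueError(f"Unsupported options: {sorted(unknown)}")
    # enable_all_options switches on every individual option.
    if ENABLE_ALL in requested:
        return set(SUPPORTED_OPTIONS)
    return requested


def resolve_master_weights(optimizer_needs_master_weights,
                           checkpoint_master_weights,
                           model_weights):
    """Sketch of the two master_weight_compatible rules.

    1. Load master weights only when the optimizer needs them.
    2. If the checkpoint has none, use the model weights instead.
    """
    if not optimizer_needs_master_weights:
        return None
    if checkpoint_master_weights is not None:
        return checkpoint_master_weights
    return model_weights


if __name__ == "__main__":
    enabled = parse_unified_checkpoint_config("skip_save_model_weight async_save")
    print(sorted(enabled))
```

Under the same whitespace assumption, a launch command would pass the flag as, for example, `--unified_checkpoint_config "skip_save_model_weight async_save"` together with `--unified_checkpoint`.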

--skip_memory_metrics
Whether to skip adding memory profiler reports (optional, default: True, i.e. skipped)