Description & Motivation
One pain point in training with DeepSpeed is that when resuming from a checkpoint you have to use the same number of GPUs that the checkpoint was trained on. Otherwise, you will see the following error:

deepspeed.runtime.zero.utils.ZeRORuntimeException: The checkpoint being loaded used a DP world size of 32 but the current world size is 128. Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported.

See this issue: deepspeedai/DeepSpeed#3810

Also, once the model is trained and we want to experiment with inference, loading a DeepSpeed checkpoint is a problem because it requires the same number of GPUs as the training run.
However, DeepSpeed now provides universal checkpointing, which converts a DeepSpeed checkpoint into a universal checkpoint that can be loaded on any number of GPUs. Please refer to:
https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/universal_checkpointing#zero-stage-2-training
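To illustrate what the requested workflow could look like (a sketch only, not something Lightning supports today): on the DeepSpeed side, the repo ships a ds_to_universal.py script that converts a sharded ZeRO checkpoint into a universal one, and loading is then enabled through the checkpoint section of the DeepSpeed config. The script path, flags, and the load_universal key below are taken from the DeepSpeed universal-checkpointing examples and should be treated as assumptions, as should the idea of wiring them through Lightning's DeepSpeedStrategy:

# Hypothetical sketch: resume from a converted "universal" checkpoint in Lightning.
# Assumes the ZeRO checkpoint was first converted with DeepSpeed's conversion script, e.g.
#   python deepspeed/checkpoint/ds_to_universal.py \
#       --input_folder  path/to/zero_checkpoint \
#       --output_folder path/to/universal_checkpoint
import lightning.pytorch as pl
from lightning.pytorch.strategies import DeepSpeedStrategy

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 2},
    # "load_universal" is the switch shown in DeepSpeed's universal-checkpointing
    # examples; the exact key is an assumption, not a confirmed Lightning API.
    "checkpoint": {"load_universal": True},
}

trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,  # could now differ from the world size used at training time
    strategy=DeepSpeedStrategy(config=ds_config),
)
# trainer.fit(model, datamodule=dm, ckpt_path="path/to/universal_checkpoint")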
Would Lightning consider integrating this feature?
Also, a somewhat unusual use case: is there any way to load only the model weights from a checkpoint while ignoring the other state, such as the optimizer state? (A possible workaround is sketched below.)
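On that last question, one workaround that works today (assuming a sharded ZeRO checkpoint directory saved by Lightning's DeepSpeed strategy) is to consolidate it into a single fp32 file containing only the model weights, dropping the partitioned optimizer state. A minimal sketch using Lightning's convert_zero_checkpoint_to_fp32_state_dict utility; the checkpoint paths and module name are hypothetical:

# Minimal sketch, assuming a sharded DeepSpeed/ZeRO checkpoint directory saved by Lightning.
# convert_zero_checkpoint_to_fp32_state_dict merges the shards into a single fp32
# checkpoint that contains only model weights (no optimizer partitions).
from lightning.pytorch.utilities.deepspeed import convert_zero_checkpoint_to_fp32_state_dict

convert_zero_checkpoint_to_fp32_state_dict(
    "lightning_logs/version_0/checkpoints/epoch=9-step=1000.ckpt",  # hypothetical sharded ckpt dir
    "consolidated.ckpt",
)

# The consolidated file can then be loaded on a single GPU (or CPU) for inference,
# e.g. MyLightningModule.load_from_checkpoint("consolidated.ckpt"), without needing
# the original training world size.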
Pitch
No response
Alternatives
No response
Additional context
No response
cc @Borda @awaelchli