In torchtune, resuming from a checkpoint fails when the optimizer comes from torchao:
  File "/data/users/felipemello/torchtune/torchtune/training/checkpointing/_utils.py", line 249, in safe_torch_load
    state_dict = torch.load(
                 ^^^^^^^^^^^
  File "/home/felipemello/.conda/envs/torchtune/lib/python3.11/site-packages/torch/serialization.py", line 1486, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
WeightsUnpickler error: Unsupported global: GLOBAL torchao.prototype.low_bit_optim.subclass_8bit.OptimState8bit was not an allowed global by default. Please use `torch.serialization.add_safe_globals([OptimState8bit])` or the `torch.serialization.safe_globals([OptimState8bit])` context manager to allowlist this global if you trust this class/function.
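For context, the failure is an allowlist check in the unpickler, not a corrupted file. Below is a minimal sketch of that mechanism using only the standard library `pickle` module. It is an analogy, not PyTorch's implementation: `AllowlistUnpickler` and `safe_loads` are hypothetical names, and `fractions.Fraction` stands in for `OptimState8bit` so the snippet runs without torch or torchao.

```python
import io
import pickle
from fractions import Fraction  # stand-in for torchao's OptimState8bit (assumption)


class AllowlistUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that resolves only explicitly allowed globals."""

    def __init__(self, file, allowed=()):
        super().__init__(file)
        # Map (module, qualified name) -> class for every allowed global.
        self._allowed = {(c.__module__, c.__qualname__): c for c in allowed}

    def find_class(self, module, name):
        # Called whenever the pickle stream references a global (class/function).
        try:
            return self._allowed[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(
                f"Unsupported global: {module}.{name}"
            ) from None


def safe_loads(data, allowed=()):
    """Unpickle bytes, rejecting any global that is not on the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data), allowed=allowed).load()


# A checkpoint-like payload whose optimizer state holds a custom class.
payload = pickle.dumps({"state": Fraction(1, 2)})

# Without an allowlist entry the load is refused, as in the traceback above.
try:
    safe_loads(payload)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Allowlisting the class lets the same payload load.
restored = safe_loads(payload, allowed=[Fraction])
```

In PyTorch itself, the error message names the corresponding options: `torch.serialization.add_safe_globals([OptimState8bit])` or the `torch.serialization.safe_globals([OptimState8bit])` context manager around the `torch.load` call. Both should be used only if the checkpoint source is trusted.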
To reproduce, run a short training job to write a checkpoint, then resume from it:
tune run full_finetune_single_device --config llama3_2/1B_full_single_device epochs=2 max_steps_per_epoch=20 optimizer=torchao.prototype.low_bit_optim.AdamW8bit
tune run full_finetune_single_device --config llama3_2/1B_full_single_device epochs=2 max_steps_per_epoch=20 optimizer=torchao.prototype.low_bit_optim.AdamW8bit resume_from_checkpoint=True checkpointer.checkpoint_files=["epoch_0/model-00001-of-00001.safetensors"]
felipemello1 changed the title from "Torchao opt resuming from ckpt requires weights_only=False?" to "Torchao opt resuming from ckpt requires weights_only=False" on Mar 13, 2025