When using `--fp8_base` together with `--save_state` in both `train_network` scripts (SD and SDXL), saving the training state crashes. The traceback below shows safetensors failing on a dict lookup for the dtype size: `torch.float8_e4m3fn` is missing from its internal `_SIZE` table. Updating safetensors to the latest version (0.4.2 as of writing) fixed the issue; a minimal repro is sketched after the traceback.
```
saving last state.
Traceback (most recent call last):
  File "/home/hope/src/sd/sd-scripts/train_network.py", line 1062, in <module>
    trainer.train(args)
  File "/home/hope/src/sd/sd-scripts/train_network.py", line 964, in train
    train_util.save_state_on_train_end(args, accelerator)
  File "/home/hope/src/sd/sd-scripts/library/train_util.py", line 4573, in save_state_on_train_end
    accelerator.save_state(state_dir)
  File "/home/hope/src/sd/sd-scripts/venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 2708, in save_state
    save_location = save_accelerator_state(
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hope/src/sd/sd-scripts/venv/lib/python3.11/site-packages/accelerate/checkpointing.py", line 99, in save_accelerator_state
    save(state, output_model_file, save_on_each_node=save_on_each_node, safe_serialization=safe_serialization)
  File "/home/hope/src/sd/sd-scripts/venv/lib/python3.11/site-packages/accelerate/utils/other.py", line 181, in save
    save_func(obj, f)
  File "/home/hope/src/sd/sd-scripts/venv/lib/python3.11/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
                   ^^^^^^^^^^^^^^^^^
  File "/home/hope/src/sd/sd-scripts/venv/lib/python3.11/site-packages/safetensors/torch.py", line 402, in _flatten
    return {
           ^
  File "/home/hope/src/sd/sd-scripts/venv/lib/python3.11/site-packages/safetensors/torch.py", line 406, in <dictcomp>
    "data": _tobytes(v, k),
            ^^^^^^^^^^^^^^
  File "/home/hope/src/sd/sd-scripts/venv/lib/python3.11/site-packages/safetensors/torch.py", line 362, in _tobytes
    bytes_per_item = _SIZE[tensor.dtype]
                     ~~~~~^^^^^^^^^^^^^^
KeyError: torch.float8_e4m3fn
```
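For reference, here is a minimal sketch that reproduces the failure outside the trainer, assuming a PyTorch build with float8 support (2.1+); the tensor and filename are illustrative, not taken from the scripts:

```python
import torch
from safetensors.torch import save_file

# A float8 tensor, like the weights produced when training with --fp8_base.
fp8_weight = torch.zeros(4, dtype=torch.float8_e4m3fn)  # illustrative tensor

# On the older safetensors in the venv this raises
# KeyError: torch.float8_e4m3fn, because its internal _SIZE dtype
# table (see the traceback above) has no entry for that dtype.
# On safetensors 0.4.2 the file saves cleanly.
save_file({"weight": fp8_weight}, "repro.safetensors")  # illustrative path
```

Upgrading with `pip install -U safetensors` (0.4.2 at the time of writing) was enough; no change to the training scripts was needed.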