"cat_cuda" not implemented for 'Float8_e4m3fn' #1711
Comments
Could you please share the part further up in the log so we can pinpoint exactly where the problem is?
Same problem.
I have the same problem as well, but only when trying to use multi-GPU. Training on a single GPU works fine. This only crops up when I reconfigure accelerate from 1 machine / 1 GPU to 1 machine / 3 GPUs. (I have 4 installed; I'm purposely only using 3 for training.)
My rig:
Training Settings:
Log Output Returned:
It seems that fp8 support is achieved by Accelerate using Transformer Engine internally, but multi-GPU training may not be supported there. Further investigation is needed.
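For context, a minimal sketch of how fp8 mixed precision is typically requested through Accelerate's Transformer Engine backend. The kwargs class, backend string, and defaults are assumptions based on the Accelerate documentation, not sd-scripts' actual code path, and may differ between Accelerate versions:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs  # assumed import path; may vary by accelerate version

# Toy model/optimizer just to make the sketch self-contained.
model = torch.nn.Linear(64, 64)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# With backend="TE", Accelerate delegates the fp8 math to Transformer Engine;
# this requires transformer-engine to be installed and a GPU with fp8 support.
accelerator = Accelerator(
    mixed_precision="fp8",
    kwargs_handlers=[FP8RecipeKwargs(backend="TE")],
)
model, optimizer = accelerator.prepare(model, optimizer)
```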
Same error trying to run on two GPUs.
Environment:
torch 2.4.0
flux1-dev-fp8-e4m3fn.safetensors
t5xxl_fp8_e4m3fn.safetensors
Settings:
--fp8_base
--split_mode
Error:
[rank0]: RuntimeError: "cat_cuda" not implemented for 'Float8_e4m3fn'
PyTorch probably doesn't yet support Float8_e4m3fn for torch.cat, but --fp8_base should be able to handle float8_e4m3fn models.
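For reference, a minimal sketch that reproduces the failure and shows the usual workaround of upcasting before torch.cat. This is an illustration only, not what sd-scripts does internally; it assumes a CUDA device and a torch version (such as 2.4) where cat is not implemented for float8:

```python
import torch

# Minimal repro / workaround sketch (assumes CUDA and torch >= 2.1 for float8_e4m3fn;
# whether cat is implemented for float8 may vary by PyTorch version).
a = torch.randn(2, 4, device="cuda").to(torch.float8_e4m3fn)
b = torch.randn(2, 4, device="cuda").to(torch.float8_e4m3fn)

try:
    torch.cat([a, b], dim=0)  # on torch 2.4 this raises the error from the log
except RuntimeError as e:
    print(e)  # "cat_cuda" not implemented for 'Float8_e4m3fn'

# Typical workaround: upcast to a supported dtype, concatenate, then cast back.
c = torch.cat([a.to(torch.bfloat16), b.to(torch.bfloat16)], dim=0).to(torch.float8_e4m3fn)
print(c.shape, c.dtype)  # torch.Size([4, 4]) torch.float8_e4m3fn
```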