RuntimeError: Overflow when unpacking long #364
Comments
Not sure why it's wrapped inside a Tensor in the first place, @muellerzr?
Hi, I just tried changing states["xm_seed"] = torch.tensor(xm.get_rng_state()) to use dtype=torch.float32, and the error doesn't seem to happen anymore. I saved checkpoints successfully ten times in a row, but I'm not sure that's the proper way to fix it.
@sgugger you're right, it shouldn't be. Not sure where I saw that happening when I was looking at it, but will put in a fix today.
The seed is an int, not a float, @nguyenhuuthuat09; you won't be able to reload that RNG state if you save it as a float. The proper fix is to just remove the torch.tensor() wrapping.
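For reference, here is a minimal sketch of what that fix looks like (illustrative only, not the exact accelerate source; the helper names and the states dict layout are assumptions):

```python
import torch
import torch_xla.core.xla_model as xm  # TPU runtime, assumed available on the TPU VM

def save_rng_states(path):
    states = {}
    # Before: wrapping the seed in a tensor can raise
    # "RuntimeError: Overflow when unpacking long" if the seed exceeds the int64 range.
    # states["xm_seed"] = torch.tensor(xm.get_rng_state())
    # After: keep the raw Python int; torch.save serializes it exactly.
    states["xm_seed"] = xm.get_rng_state()
    torch.save(states, path)

def load_rng_states(path):
    states = torch.load(path)
    xm.set_rng_state(states["xm_seed"])  # restore the exact integer seed
```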
Great! Thank you so much!!!
Another thing that could cause this is if you accidentally added a seed that was too long, for example, pasted it twice.
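A quick way to catch that case (a hedged, illustrative check, not part of accelerate) is to verify the seed fits in a signed 64-bit integer before using it:

```python
seed = 12345678901234567890  # e.g. a value accidentally pasted twice
if not (-2**63 <= seed < 2**63):
    raise ValueError(f"seed {seed} does not fit in int64 and will overflow")
```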
Environment info
Machine: Google Cloud TPU VM, version v2-alpha
transformers: 4.18.0
accelerate: 0.9.0.dev0 (the same error happens with 0.8.0.dev0)
Script
I am training a GPT2 model using the PyTorch run_clm_no_trainer.py script.
Error
The RuntimeError happens when the model is saving checkpoints, but it seems to occur only at the second or third checkpoint.
Environment variables
export XRT_TPU_CONFIG="localservice;0;localhost:51011"
export XLA_USE_BF16=1
export XLA_TENSOR_ALLOCATOR_MAXSIZE=100000000
I ran accelerate config and used accelerate launch to run the code.
Related issue
Changing
states["xm_seed"] = torch.tensor(xm.get_rng_state())
to
states["xm_seed"] = torch.tensor(xm.get_rng_state(), dtype=torch.float32)
may help?
Thank you for the great library!
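For context, a small illustration of the underlying overflow (assuming, as discussed above, a seed value larger than the signed 64-bit range, which is what triggers the error):

```python
import torch

seed = 9876543210987654321                    # larger than 2**63 - 1
# torch.tensor(seed)                           # RuntimeError: Overflow when unpacking long
t = torch.tensor(seed, dtype=torch.float32)    # the suggested workaround: no error...
print(int(t.item()) == seed)                   # ...but False: float32 rounds the value
```

This is also why the maintainers note above that the seed must stay an int: saving it as a float avoids the error but silently corrupts the RNG state on reload.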