Can't deepcopy an xformer model with triton 2 update #290
Comments
Oh jeez... OK, you can remove triton from your env and this should unblock you short term; I'll have a look in the meantime :)
For reference, even the SWA implementation in vanilla PyTorch relies on deepcopy: https://github.com/pytorch/pytorch/blob/master/torch/optim/swa_utils.py#L100 -- not being able to run EMA is a ~1-2% loss in overall perf 😬
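To make that dependency concrete, here is a minimal sketch (assuming `torch.optim.swa_utils.AveragedModel`, which deep-copies the wrapped model on construction):

```python
import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Linear(8, 8)  # stand-in for an xformers model

# AveragedModel deep-copies `model` in its constructor (the linked line);
# with a triton-2 xformers model, this is where the PyCapsule error surfaces.
ema_model = AveragedModel(
    model,
    avg_fn=lambda avg, new, _n: 0.999 * avg + 0.001 * new,  # simple EMA rule
)
```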
It should be fine with the attached PR, @jramapuram. If you see other issues they're easy to fix (and to cover by augmenting the unit test); basically, lazily initializing the triton parts fixes that case.
Sorry for the delay, I should have seen that before.
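A rough sketch of the lazy-init idea (illustrative names, not the actual PR code): keep the compiled triton handle out of the module's picklable state and rebuild it on first use.

```python
import copy
import torch

class TritonBackedBlock(torch.nn.Module):
    """Illustrative module holding a lazily built, non-picklable kernel."""

    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(8, 8))
        self._kernel = None  # built on first forward; None deep-copies fine

    def _build_kernel(self):
        # placeholder for the triton compilation step, whose result is a
        # PyCapsule in triton 2 and therefore cannot be pickled
        return object()

    def forward(self, x):
        if self._kernel is None:
            self._kernel = self._build_kernel()
        return x @ self.weight  # real code would invoke the kernel here

    def __getstate__(self):
        # drop the handle so deepcopy / pickle never see it; the copy
        # rebuilds it lazily on its first forward pass
        state = self.__dict__.copy()
        state["_kernel"] = None
        return state

block = TritonBackedBlock()
block(torch.randn(2, 8))       # kernel is now built
clone = copy.deepcopy(block)   # still fine: the handle is excluded
```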
Awesome @blefaudeux! Testing now, thanks 🙏
Let me know if another part trips this up; I should be able to fix it in a similar fashion. And I'm still on #219, trying to come up with a repro less expensive than full-blown ImageNet. My current lead hypothesis is to provide means to handle various inits out of the box: right now deepnorm sets the distribution to a scaled uniform init, which is probably not the best for all problems. It can always be done from the outside, but that kind of negates the benefit of having deepnorm out of the box.
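For the "from the outside" route, a hedged sketch in plain PyTorch (not an xformers API): re-apply a standard init after the factory builds the model, which works but loses the out-of-the-box appeal mentioned above.

```python
import torch

def reinit_linear_(model: torch.nn.Module, gain: float = 1.0) -> None:
    """Override whatever init the factory applied (e.g. deepnorm's scaled
    uniform) with a standard Xavier init, in place."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            torch.nn.init.xavier_normal_(module.weight, gain=gain)
            if module.bias is not None:
                torch.nn.init.zeros_(module.bias)
```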
🐛 Bug
Given the recent triton 2 update, an xformers model can no longer be deep-copied. Deep-copy support is an important requirement for numerous tasks, including EMA (which needs to clone the model without knowledge of the generating class / hyper-parameters).
xformers ViT-B Config
To reproduce
The error is:

```
TypeError: cannot pickle 'PyCapsule' object
```
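The original repro used the ViT-B config above; the failure itself distills to this runnable sketch, where a stdlib PyCapsule stands in for the compiled triton 2 kernel handle:

```python
import copy
import datetime
import torch

class TritonLikeBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # datetime.datetime_CAPI is a PyCapsule: the same unpicklable type
        # that a compiled triton 2 kernel hands back
        self.kernel_handle = datetime.datetime_CAPI

model = TritonLikeBlock()
copy.deepcopy(model)  # TypeError: cannot pickle 'PyCapsule' object
```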