Failure when running on cluster #1509
Comments
Also running into something similar; I will document it in another issue.
Were you able to fix this issue? I'm still having the similar problem linked above.
You can set …
You can try clearing the cache by deleting the …
I ran into a similar issue too. I fixed it by creating a symlink of …
When running on a cluster with 2 nodes and 4 GPUs each, I randomly run into this bug:
The bug is intermittent: with the exact same configuration, it sometimes appears and sometimes does not.
I'm using Python 3.9, PyTorch 2.0, and xFormers 0.16.