Build failure: magma #220357
@ConnorBaker Is this related to the race condition error you saw before?
@bcdarwin can you tell me more about what hardware you're using, and what your Nixpkgs config looks like? For reference, mine is:

```nix
# ~/.config/nixpkgs/config.nix
{
  allowUnfree = true;
  cudaSupport = true;
  cudaCapabilities = [ "8.6" ];
  cudaForwardCompat = false;
}
```

using a 4090 and an i9-13900K:

$ nix run nixpkgs#nix-info -- -m

- system: `"x86_64-linux"`
- host os: `Linux 6.1.14-200.fc37.x86_64, Fedora Linux, 37 (Workstation Edition), nobuild`
- multi-user?: `no`
- sandbox: `no`
- version: `nix-env (Nix) 2.14.0pre20230208_ec78896`
- channels(connorbaker): `"nixpkgs"`
- nixpkgs: `/home/connorbaker/.nix-defexpr/channels/nixpkgs`

EDIT: Would you also try building with #220366? I made some changes there that moved the CUDA runtime stub and NVCC into …
@samuela I don't think so. The closest thing I can remember to this was the linking error AMD HIP had with 2.7.x (which is why it's stuck on magma 2.6.x), but that was a different, inscrutable error from …
CPU is a Xeon Gold 5218 with Quadro RTX 8000 GPUs. I haven't set any configuration other than …
I don't know why this issue isn't affecting your builds, but most likely the fix is to set …
Haven't tried #220366 yet either, sorry.
Any idea when this failure started occurring?
magma 2.6.2 works, I believe, but 2.7.1 fails. I can try to bisect at some point if needed.
What channel were you building from, by the way? I'm going to try to find a way to match your setup, but I need to know exactly which version of nixpkgs you're using to build.
Okay, I was able to reproduce it. It happened while running a nixpkgs-review of that magma PR I linked earlier. I suspect it has something to do with either my disabling …
Something for me to look at later tonight: if it is the case that the increased number of CUDA capabilities being targeted is causing the binary to bloat, check that we're using …

Since PyTorch also ships binaries targeting a bunch of different CUDA capabilities, their configs are a goldmine of flags we might need to look into.

EDIT: Haven't built with it yet, but it seems like it should fix it. Apache MXNet had the same issue: apache/mxnet#19123.

EDIT2: Reminder to self: if that is the fix, add …
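Given that the PR linked later in this thread mentions `-Xfatbin=-compress-all`, here is a minimal sketch of how that flag could be threaded into the magma build via an overlay. This is an illustration only, not the actual change from the PR; it assumes magma's derivation builds with CMake and that the flag reaches nvcc via `CMAKE_CUDA_FLAGS`, which may not match what was actually committed.

```nix
# Sketch only (assumptions: magma builds with CMake and honors CMAKE_CUDA_FLAGS;
# the real fix may plumb the flag differently).
final: prev: {
  magma = prev.magma.overrideAttrs (old: {
    cmakeFlags = (old.cmakeFlags or [ ]) ++ [
      # Ask nvcc to compress the device code embedded for every target
      # capability, so the fat binary doesn't bloat as capabilities are added.
      "-DCMAKE_CUDA_FLAGS=-Xfatbin=-compress-all"
    ];
  });
}
```

Dropped into `nixpkgs.overlays`, this rebuilds magma with the flag; whether compression alone is enough, versus also pruning the capability list, is what the linked PR experiments with.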
I know it's still a draft, @bcdarwin, but can you try building again with #220402? I think it should fix your issue. Unrelated, but if you're building anything from source and want faster builds, I highly recommend specifying the single compute capability you need (like I do here: #220357 (comment)), because it results in massively faster builds. HEAD right now builds for 14 different capabilities and that PR gets it down to 8, but it's much faster to build for just one.
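For the Quadro RTX 8000 mentioned above, that single-capability config would look roughly like the following. The value `"7.5"` is an assumption based on the card's Turing architecture; double-check your GPU's actual compute capability before using it.

```nix
# ~/.config/nixpkgs/config.nix
{
  allowUnfree = true;
  cudaSupport = true;
  # Build device code only for the GPU actually installed; "7.5" is assumed
  # here to be the Quadro RTX 8000's compute capability.
  cudaCapabilities = [ "7.5" ];
}
```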
Referenced pull request: cudaPackages: fix #220357; use -Xfatbin=-compress-all; prune default cudaCapabilities
Steps To Reproduce
Steps to reproduce the behavior:
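The concrete reproduction steps did not survive in this copy of the report. A minimal sketch of how one might reproduce the failure, assuming it appears whenever magma is built with CUDA support enabled in the global Nixpkgs config (the file name `repro.nix` is hypothetical):

```nix
# repro.nix — hypothetical file; build with `nix-build repro.nix`.
# Assumption: the failure shows up when magma is built with CUDA support
# enabled globally, as in the configs shown earlier in this thread.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  };
in
pkgs.magma
```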
Build log
From the full log:
Notify maintainers
@tbenst
Also @ConnorBaker @samuela may be interested.
Metadata