Adding CUDA 12.8 migration #6980

jakirkham · 2025-01-30T10:49:45Z

In issues #6630 and #6720, we discussed and decided to move to the latest CUDA 12 (at the time, 12.6). We decided to pin the minor version to avoid accidentally picking up incomplete updates of the CUDA Toolkit, though in principle we were open to updating when a new version came out.

Recently CUDA 12.8 was released and packaged on conda-forge (conda-forge/cuda-feedstock#63). One of the new features is the addition of the sm_100, sm_101, and sm_120 architectures. For binaries to function correctly on these new architectures, they need to be rebuilt.

Given this, I think it would make sense to create a CUDA 12.8 migrator to roll out CUDA 12.8 to feedstocks, so they can be rebuilt in topological (dependency) order.

I should add that I would be OK with dropping CUDA 12.6 at the same time.

Would be interested in hearing others' thoughts on this.

cc @conda-forge/core


hmaarrfk · 2025-01-30T11:14:16Z

Sounds good to me

h-vetinari · 2025-01-30T11:16:33Z

Thanks for the issue, and congrats on the quick rollout!

> For binaries to function correctly on these new architectures, they need to be rebuilt

My understanding is the existing packages would still work fine on these new chips, obviously without yet making use of the sm_1**-specific instructions or features?

> Given this, think it would make sense to create a CUDA 12.8 migrator to roll out CUDA 12.8 to feedstocks so they can be rebuilt topologically

> Should add would be ok dropping CUDA 12.6 at the same time

We did 12.0 -> 12.6 without a migration, so unless there are (big) breaking changes, we could probably do the same for 12.6 -> 12.8? Not that I have anything against an explicit migration (I was in favour last time as well). Replacing 12.6 is fine as long as we don't lose backward compatibility, and the discussion in #6630 indicates that we won't lose it.

However, the "replacing" part could only happen when we close the migration (I don't think dropping 12.6 when we open it is an option). In between, already-migrated feedstocks would be getting all of cuda_compiler_version in ("None", "11.8", "12.6", "12.8"). In light of that, we might want to go without a migration and/or drop 11.8 (which would have a bunch of benefits, see #6917 & #6967).
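To make the counting argument concrete, here is a minimal sketch using a hypothetical toy model, not conda-smithy's actual rendering logic. It assumes the global pinning lists "None" (CPU-only), "11.8" and "12.6" while the migration is open, and that a migrated feedstock keeps those values and gains "12.8". The names `GLOBAL_PINNING`, `MIGRATION_ADDS` and `rendered_variants` are made up for illustration.

```python
# Hypothetical toy model of which cuda_compiler_version values a feedstock
# renders to. "None" stands for the CPU-only build. This is NOT conda-smithy's
# real rendering code; it only illustrates the build-count argument.

GLOBAL_PINNING = ["None", "11.8", "12.6"]  # assumed global pinning while the migration is open
MIGRATION_ADDS = ["12.8"]                   # assumed versions added by a hypothetical 12.8 migrator


def rendered_variants(pinning, migrated, migration_adds=MIGRATION_ADDS):
    """Return the cuda_compiler_version values a feedstock would build for."""
    variants = list(pinning)
    if migrated:
        # A migrated feedstock keeps the global values and gains the new one.
        variants += [v for v in migration_adds if v not in variants]
    return variants


if __name__ == "__main__":
    unmigrated = rendered_variants(GLOBAL_PINNING, migrated=False)
    migrated = rendered_variants(GLOBAL_PINNING, migrated=True)
    print("not yet migrated:", unmigrated, "->", len(unmigrated), "builds")
    print("migrated:        ", migrated, "->", len(migrated), "builds")

    # Dropping 11.8 while the migration is open keeps migrated feedstocks at 3 builds.
    without_118 = [v for v in GLOBAL_PINNING if v != "11.8"]
    print("migrated, 11.8 dropped:", rendered_variants(without_118, migrated=True))
```

In this model an already-migrated feedstock goes from 3 to 4 baseline builds per platform for as long as the migration stays open, while dropping 11.8 keeps it at 3.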

I'll note that I saw some incompatibilities when a feedstock recently ended up pulling in some 12.8 builds (logs):

 $PREFIX/targets/x86_64-linux/include/generated_cuda_meta.h:754:5: error: 'CUmemcpyAttributes' does not name a type; did you mean 'cudaMemcpyAttributes'?
    754 |     CUmemcpyAttributes *attrs;
        |     ^~~~~~~~~~~~~~~~~~
        |     cudaMemcpyAttributes

This could have simply been due to a toolchain mismatch (CUDA 12.6 compilers in build:, unpinned cuda-cupti pulling in 12.8 in host:), but I thought I'd mention it.

carterbox · 2025-01-30T19:42:08Z

I think a migrator is only necessary if you want to include a note telling maintainers to merge the migration only once they have enabled the new CUDA archs, and if the migrator will try to automatically update any skip logic to unskip 12.8.

jakirkham · 2025-02-08T00:40:06Z

Thanks for all of the feedback so far! 🙏

Have taken an initial pass at this in PR: #7005

Commented on a few sections of note

Would really appreciate it if all of you could take a look and share your suggestions 🙂

h-vetinari · 2025-02-08T02:07:10Z

My opinion is that we should either:

1. migrate 12.8, but drop 11.8 at the same time; or
2. not migrate 12.8 and just switch like we did for 12.6 (then we can still keep 11.8).

What I want to avoid is building for 3 CUDA versions + CPU (=4 times) by default, because it blows up the CI matrices for the entire duration of the 12.8 migration (until we drop 12.6 at the end). Given how long it took to wind down the long tail of the 12.0 migration, I don't want to have several months where CUDA-enabled feedstocks have 4x the number of baseline builds.

However, if we can agree to timebox the migration (say, one month maximum, regardless of how many feedstocks have migrated by then), then I'd be OK with #7005 as proposed. Even if we close the migration before all feedstocks have caught up, this would essentially be a softer version of what we did for 12.6 (i.e. option 2 above).

jakirkham · 2025-02-08T02:13:48Z

Will reiterate what I said in the OP:

> Should add would be ok dropping CUDA 12.6 at the same time

If that's what we want, let's figure out how to do that

h-vetinari · 2025-02-08T02:23:23Z

It's not possible to drop 12.6 when we start the migrator (then all feedstocks that the migration hasn't reached yet would not have any CUDA 12.x builds anymore upon rerendering).
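A minimal sketch of why dropping 12.6 on day one would strand unmigrated feedstocks, again using a hypothetical toy model rather than conda-smithy's real behaviour. It assumes a feedstock's next rerender picks up the global pinning, and that only feedstocks the migrator has reached also get "12.8". `MIGRATION_ADDS`, `rendered_variants` and `has_cuda12` are illustrative names, not real APIs.

```python
# Hypothetical toy model (not conda-smithy's real behaviour) of why 12.6 cannot
# be removed from the global pinning on the day the 12.8 migration opens.

MIGRATION_ADDS = ["12.8"]  # assumed: what a hypothetical 12.8 migrator contributes


def rendered_variants(pinning, migrated):
    """cuda_compiler_version values a feedstock builds on its next rerender."""
    extra = MIGRATION_ADDS if migrated else []
    return list(pinning) + [v for v in extra if v not in pinning]


def has_cuda12(variants):
    """True if any rendered variant is a CUDA 12.x build ("None" is CPU-only)."""
    return any(v.startswith("12.") for v in variants)


if __name__ == "__main__":
    scenarios = {
        "keep 12.6 while open": ["None", "11.8", "12.6"],
        "drop 12.6 at open": ["None", "11.8"],
    }
    for name, pinning in scenarios.items():
        for migrated in (False, True):
            variants = rendered_variants(pinning, migrated)
            print(f"{name:22} migrated={migrated!s:5} {variants} cuda12={has_cuda12(variants)}")
```

Under these assumptions, the "drop 12.6 at open" scenario leaves every feedstock the migrator has not reached yet with only "None" and "11.8", i.e. no CUDA 12.x builds after rerendering.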

If we want to keep 11.8, then the options are to do an immediate switch (like we did for 12.0 -> 12.6), or to draw out this process slightly through the proposed migrator (at the cost of quadruple builds, and thus only for a limited amount of time) to give the most active feedstocks a chance at a more orderly update.

jakirkham · 2025-02-08T02:32:21Z

Am proposing the migrator would replace 12.6 with 12.8

h-vetinari · 2025-02-08T02:48:05Z

> Am proposing the migrator would replace 12.6 with 12.8

Yes, but that replacement can only happen when we close the migrator; in the intervening time, we would have quadruple builds. Hence my point to limit the amount of time we allow the migration to run (since in any case it is already a gentler approach than the immediate switch, which was not terribly disruptive when we did that for 12.6).

Please, think through the mechanics of how this would actually play out. I'm not saying no to a migration, but if we do migrate (rather than just do the switch directly in the pinning), my condition is the timeboxing that I've laid out.

jakirkham · 2025-02-08T02:57:23Z

No I'm saying let's capture this behavior in the migrator itself

h-vetinari · 2025-02-08T03:03:45Z

> No I'm saying let's capture this behavior in the migrator itself

That is not possible, as far as I understand how conda-build, smithy and our infrastructure work.

jakirkham · 2025-02-08T03:11:42Z

Would reframe that as: it may not currently be possible.

However based on your feedback it sounds desirable

So let's see if we can figure out a way to do it

This is likely not the last time we will want something like this

This was referenced Feb 8, 2025:

- Better checks for ffmpeg nvenc version conda-forge/ffmpeg-feedstock#288 (Open)
- Add a migrator for CUDA 12.8 #7005 (Draft)

jakirkham self-assigned this Feb 8, 2025
