Reducing the size of the cuda_compiler zip #6967

Open
h-vetinari opened this issue Jan 26, 2025 · 7 comments

h-vetinari (Member) commented Jan 26, 2025

The size of the zip involving the cuda compilers is pretty unwieldy:

zip_keys:
  -                             # [unix]
    - c_compiler_version        # [unix]
    - cxx_compiler_version      # [unix]
    - fortran_compiler_version  # [unix]
    - cuda_compiler             # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - cuda_compiler_version     # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - github_actions_labels     # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - docker_image              # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True" and os.environ.get("BUILD_PLATFORM", "").startswith("linux-")]

which makes it hard to override locally. There's a general desire to keep the zip complexity down and remove entries here, rather than adding them.
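
To make the pain concrete: conda-build requires every key in a zip group to have a matching number of entries, so a feedstock that only wants to change one of these keys in its recipe/conda_build_config.yaml ends up restating the whole group. A hypothetical local override (the version numbers, label and image below are purely illustrative) looks roughly like

cuda_compiler_version:      # [linux]
  - 12.6                    # [linux]
# ...and because the key above is zipped, all the other members of the
# zip have to be repeated with matching lengths as well:
c_compiler_version:         # [linux]
  - 13                      # [linux]
cxx_compiler_version:       # [linux]
  - 13                      # [linux]
fortran_compiler_version:   # [linux]
  - 13                      # [linux]
cuda_compiler:              # [linux]
  - cuda-nvcc               # [linux]
github_actions_labels:      # [linux]
  - cirun-openstack-gpu-2xlarge  # [linux]
docker_image:               # [linux]
  - quay.io/condaforge/linux-anvil-x86_64:alma9  # [linux]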

And indeed we recently removed cdt_name (48621d1) as well as c_stdlib_version (c7f5973), and we made it easier to override the image version without having to locally override the whole zip (#6626).

#6910 now added another component (github_actions_labels) to the zip, because there are feedstocks that require GPU agents (a very rare resource), and without linking this to the zip that determines non-CUDA vs. CUDA builds, we're stuck with only bad options:

  • duplicated builds in CI matrix
  • very long variant config names, which cause failures on Windows, see conda-forge/pytorch-cpu-feedstock#332 & conda-forge/conda-smithy#2233

The desire for simplicity in the zip is well-founded, but it doesn't trump reality. Keeping these data apart when they are closely connected causes far more problems than a single extra zip entry (aside from the fact that only 5 feedstocks across all of conda-forge are affected and the required changes are trivial).

The CUDA zip overall is IMO a reflection of irreducible complexity that exists in our packaging constraints. However, there are some nice opportunities to further simplify this zip:

  • docker_image can be removed as soon as we drop CUDA 11.8 (because then everything can just run on the alma9 images)
  • {c,cxx,fortran}_compiler_version could be dropped if we accept being limited by the maximum GCC version supported by our lowest-supported CUDA
    • this can obviously be accelerated if we keep building only for the newest CUDA 12.x, as we recently started doing (and assuming that newer nvcc versions keep extending their GCC compatibility).
    • the problem case there is CUDA on PPC, which was dropped in CUDA 12.5; CUDA 12.4 is limited to gcc 12
  • we might even consider turning cuda_compiler into a constant once we drop 11.8 (a sketch follows below)
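
To illustrate that last point: assuming the CUDA 12 compiler package keeps its current name, cuda_compiler would no longer need to live in the zip at all and could become a plain entry in the global pinning:

cuda_compiler:
  - cuda-nvcc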

So in the most realistic scenario (after we drop 11.8, but assuming we keep CUDA on PPC for the foreseeable future), things would look like

zip_keys:
  -                             # [unix]
    - c_compiler_version        # [unix]
    - cxx_compiler_version      # [unix]
    - fortran_compiler_version  # [unix]
    - cuda_compiler             # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - cuda_compiler_version     # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - github_actions_labels     # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]

which is still 2 entries shorter than it was until last November.

If we did drop CUDA support on PPC and relied only on the latest CUDA (as well as recipes guarding their {{ compiler("cuda") }} correctly with cuda_compiler_version != "None" selectors; a sketch of such a guard follows below), the zip could even be reduced to

zip_keys:
  -                             # [unix]
    - c_compiler_version        # [unix]
    - cxx_compiler_version      # [unix]
    - fortran_compiler_version  # [unix]
  # uncoupled!
  -                             # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - cuda_compiler_version     # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]
    - github_actions_labels     # [linux and os.environ.get("CF_CUDA_ENABLED", "False") == "True"]

but I'm not sure whether we have the appetite for that. IMO the longer zip is less of an issue than dropping support for a platform.
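
As an aside, the guarding mentioned above is the usual selector pattern on the build requirements, roughly (a sketch, not tied to any particular feedstock):

requirements:
  build:
    - {{ compiler("c") }}
    - {{ compiler("cuda") }}  # [cuda_compiler_version != "None"]

With that guard in place, CPU variants never request nvcc to begin with, which is what would make it safe to decouple cuda_compiler_version from the compiler-version zip.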

isuruf (Member) commented Jan 27, 2025

> duplicated builds in CI matrix

This IMO is a bug.

> very long variant config names, which cause failures on Windows, see
> conda-forge/pytorch-cpu-feedstock#332 & conda-forge/conda-smithy#2233

This is also a bug in conda-smithy. We have

C:\Users\runnerx\_work\pytorch-cpu-feedstock\pytorch-cpu-feedstock\.ci_support\linux_64_blas_implgenericc_compiler_version13channel_targetsconda-forge_maincuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13github_actions_labelscirun-openstack-cpu-2xlargeis_h2ab6b548.yaml

which has 284 characters, while Windows only allows 260.

isuruf (Member) commented Jan 27, 2025

> duplicated builds in CI matrix

I can help you with debugging this if you have a commit I can have a look at.

h-vetinari (Member, Author)

> I can help you with debugging this if you have a commit I can have a look at.

It's just regular smithy behaviour. It could probably be avoided by adding

  skip: true  # [cuda_compiler_version == "None" and github_actions_labels == "cirun-openstack-gpu-2xlarge"]
  skip: true  # [cuda_compiler_version != "None" and github_actions_labels == "cirun-openstack-cpu-2xlarge"]

to every single output, but that becomes another pointless thing to keep in sync.
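
In a multi-output recipe, that duplication would look something like the following sketch (the output names are hypothetical), with the same two lines copy-pasted into every output:

outputs:
  - name: libfoo
    build:
      skip: true  # [cuda_compiler_version == "None" and github_actions_labels == "cirun-openstack-gpu-2xlarge"]
      skip: true  # [cuda_compiler_version != "None" and github_actions_labels == "cirun-openstack-cpu-2xlarge"]
  - name: foo
    build:
      skip: true  # [cuda_compiler_version == "None" and github_actions_labels == "cirun-openstack-gpu-2xlarge"]
      skip: true  # [cuda_compiler_version != "None" and github_actions_labels == "cirun-openstack-cpu-2xlarge"]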

> This is also a bug in conda-smithy. We have

> C:\Users\runnerx\_work\pytorch-cpu-feedstock\pytorch-cpu-feedstock\.ci_support\linux_64_blas_implgenericc_compiler_version13channel_targetsconda-forge_maincuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13github_actions_labelscirun-openstack-cpu-2xlargeis_h2ab6b548.yaml

> which has 284 characters, while Windows only allows 260.

Ah, I was wondering how we got beyond 260 (conda-forge/conda-smithy#2233 uses 200), as that was not at all obvious from the error message. However, if we cut another 24 characters (to get that path <= 260), we'll lose visibility of the GHA label in the path.

To be clear, I'm sure there's some halfway workable alternative to putting github_actions_labels in the zip - but all those hacks are IMO worse than just dealing with the fact that we should be able to determine CPU vs. GPU agents together with the quantity (cuda_compiler_version) that determines whether or not we have a CUDA build.

All in all, the zip got shorter since last November, and it'll get shorter still once we drop CUDA 11.8. I don't see how a cosmetic concern outweighs all the workarounds that become necessary otherwise.

isuruf (Member) commented Jan 27, 2025

> It's just regular smithy behaviour. It could probably be avoided by adding

Here's the recipe I tried, and it skipped Windows just like I intended:

package:
  name: foo-split
  version: 1.0.0

build:
  number: 1
  skip: true   # [win]

outputs:
  - name: foo

  - name: bar
    build:
      skip: true  # [linux]

about:
  home: asd

h-vetinari (Member, Author) commented Jan 27, 2025

> Here's the recipe I tried, and it skipped Windows just like I intended

You could also make a fork of conda-forge/tensorflow-feedstock#414 and figure out what's necessary to get smithy to produce only one set of jobs.

h-vetinari (Member, Author)

@isuruf, did you have some time to look at how to achieve this (e.g. on the tensorflow feedstock)?
