Scalar indexing with CUDA 5.6.0 #4036

Open
simone-silvestri opened this issue Jan 10, 2025 · 15 comments

@simone-silvestri (Collaborator)

Reductions in Oceananigans seem not to work on GPUs with CUDA 5.6.0, which switched to GPUArrays 11.2.0.
Scalar indexing errors are now popping up. This seems to be connected to the fact that GPUArrays switched to a KernelAbstractions backend. I'll investigate a bit.

MWE:

using Oceananigans
grid = RectilinearGrid(GPU(), size=(1, 1, 1), extent=(1, 1, 1))
model = NonhydrostaticModel(; grid)
maximum(abs, model.velocities.u)
@simone-silvestri (Collaborator, Author) commented Jan 10, 2025

The issue seems to be this frame in the stack trace:

 [16] mapfirst!(f::typeof(identity), R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…})

In particular, just

maximum(model.velocities.u)

works properly. However, weirdly enough,

sum(abs, model.velocities.u)
prod(abs, model.velocities.u)

also work properly.

@simone-silvestri (Collaborator, Author) commented Jan 10, 2025

Actually, this is easily explained: sum and prod use a different initialization for reductions than maximum and minimum do:

initialize_reduced_field!(::SumReduction, f, r::ReducedAbstractField, c) = Base.initarray!(interior(r), f, Base.add_sum, true, interior(c))
initialize_reduced_field!(::ProdReduction, f, r::ReducedAbstractField, c) = Base.initarray!(interior(r), f, Base.mul_prod, true, interior(c))
initialize_reduced_field!(::AllReduction, f, r::ReducedAbstractField, c) = Base.initarray!(interior(r), f, &, true, interior(c))
initialize_reduced_field!(::AnyReduction, f, r::ReducedAbstractField, c) = Base.initarray!(interior(r), f, |, true, interior(c))
initialize_reduced_field!(::MaximumReduction, f, r::ReducedAbstractField, c) = Base.mapfirst!(f, interior(r), interior(c))
initialize_reduced_field!(::MinimumReduction, f, r::ReducedAbstractField, c) = Base.mapfirst!(f, interior(r), interior(c))
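
The difference matters: Base.initarray! just fills the output with the reduction's neutral element (a fill!, which is GPU-safe), whereas Base.mapfirst! seeds the output by map!-ing f over the first reduced slice, so only the maximum/minimum path hits the problematic map! dispatch. A rough sketch of the two paths (paraphrasing Base internals, not verbatim):

# sum/prod path: fill the output with the neutral element, then accumulate.
initarray_sketch!(r, op) = fill!(r, Base.reduce_empty(op, eltype(r)))

# maximum/minimum path: seed the output with f of the first reduced slice,
# via map!, which is the call whose dispatch changed in GPUArrays 11.2 (see below).
mapfirst_sketch!(f, r, c) = map!(f, r, view(c, ntuple(_ -> 1:1, ndims(c))...))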

@simone-silvestri (Collaborator, Author)

This seems connected to the fact that Base.mapfirst!(f, interior(r), interior(c)) with c an AbstractOperation now wants to run on the CPU instead of the GPU. Another clue:

julia> v = model.velocities.u + 5

julia> maximum(v)
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

@glwagner (Member)

See what method gets called with the previous CUDA and also with the updated one. Is interior(r) a SubArray of a CuArray?
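
One way to check is with @which (a sketch using InteractiveUtils from the standard library, continuing the MWE above; per the next comment, under GPUArrays 11.2 this reports Base's generic fallback):

using InteractiveUtils  # provides @which

dest = interior(model.velocities.u)  # a view into GPU-backed field data
src  = model.velocities.u + 5        # an AbstractOperation, not a GPU array type

# Reports which map! method dispatch selects for these argument types.
@which map!(identity, dest, src)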

@simone-silvestri (Collaborator, Author) commented Jan 10, 2025

For documentation purposes: with @glwagner we found out that the issue is that the new GPUArrays dispatches map! on AnyGPUArray also for the destination, defined here
https://github.com/JuliaGPU/GPUArrays.jl/blob/78c9ef079c28cfd4ecb7ce718da4c7490dc12404/src/host/broadcast.jl#L106-L130
whereas before the destination only needed to be an AbstractArray:
https://github.com/JuliaGPU/GPUArrays.jl/blob/d0492a29bdf18b9346cba7da16d7bcaf007d7fda/src/host/broadcast.jl#L120-L130
A generic Oceananigans object is not an AnyGPUArray, so the default method for abstract arrays, which leads to scalar indexing, gets called:

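# Base's generic fallback: an element-by-element loop, so each A[j] read and
# dest[i] write is a scalar access, which GPU arrays disallow.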
function map!(f::F, dest::AbstractArray, A::AbstractArray) where F
    for (i,j) in zip(eachindex(dest),eachindex(A))
        val = f(@inbounds A[j])
        @inbounds dest[i] = val
    end
    return dest
end
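
The failure mode can be reproduced without Oceananigans using a hypothetical wrapper type standing in for an Oceananigans object (a minimal sketch, assuming a functional GPU and GPUArrays 11.2):

using CUDA

# Hypothetical stand-in for an Oceananigans field view: an AbstractArray backed
# by GPU memory but not itself an AnyGPUArray (e.g., a doubly wrapped array).
struct LazyWrapper{T, N, A <: AbstractArray{T, N}} <: AbstractArray{T, N}
    data :: A
end

Base.size(w::LazyWrapper) = size(w.data)
Base.IndexStyle(::Type{<:LazyWrapper}) = IndexLinear()
Base.getindex(w::LazyWrapper, i::Int) = w.data[i]
Base.setindex!(w::LazyWrapper, v, i::Int) = (w.data[i] = v)

dest = LazyWrapper(CUDA.zeros(Float64, 4))  # destination is not an AnyGPUArray
src  = CUDA.ones(Float64, 4)

# With the new signature, no GPUArrays map! method matches this destination, so
# dispatch falls through to Base's loop above, which scalar-indexes the GPU data
# and throws "Scalar indexing is disallowed".
map!(identity, dest, src)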

We should pin to an earlier CUDA version until this issue is solved.

@simone-silvestri (Collaborator, Author)

This seems to be connected to JuliaGPU/GPUArrays.jl#580.

@navidcy (Collaborator) commented Jan 13, 2025

Did we try pinning CUDA to v5.5?

@liuchihl (Contributor)

After some tests, I found that pinning CUDA to v5.5.0 works fine.

@simone-silvestri (Collaborator, Author)

The problem should be GPUArrays though, not CUDA itself, so pinning GPUArrays should be enough.
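
For reference, either pin can be applied with Pkg in the active environment (a sketch; version numbers follow the discussion above, and for a package the equivalent is a Project.toml compat bound such as CUDA = "~5.5"):

using Pkg

# Install a known-good CUDA version and pin it so Pkg.update() won't move it:
Pkg.add(name = "CUDA", version = v"5.5.0")
Pkg.pin("CUDA")

# Alternatively, pin GPUArrays itself to a pre-11.2 release:
# Pkg.pin(name = "GPUArrays", version = v"11.1.0")  # illustrative version number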

@glwagner (Member)

Hm, we can close this, right? Because it's fixed in 0.95.6?

@glwagner (Member)

Or should we keep it open until we find a more permanent solution than just compat?

@ali-ramadhan (Member)

I would vote to keep this issue open as upgrading CUDA.jl may solve other issues (e.g. #4047).

But also: lagging further and further behind the latest changes in CUDA.jl could introduce additional issues in the future.

@navidcy (Collaborator) commented Jan 19, 2025

Do any of the changes introduced by CUDA v5.6.1 deal with this issue?

@navidcy (Collaborator) commented Jan 19, 2025

> Or should we keep it open until we find a more permanent solution than just compat?

I think so.

@simone-silvestri (Collaborator, Author) commented Jan 19, 2025

I think it has indeed been resolved by JuliaGPU/GPUArrays.jl#584.
We should wait for the new release of GPUArrays to lift the pin.
