Scalar indexing with CUDA 5.6.0 #4036

Open
simone-silvestri opened this issue Jan 10, 2025 · 15 comments

@simone-silvestri (Collaborator)

Reductions in Oceananigans seem not to work on GPUs with CUDA 5.6.0, which switched to GPUArrays 11.2.0.
Scalar indexing errors are now popping up. This seems to be connected to the fact that GPUArrays switched to a KernelAbstractions backend. I'll investigate a bit.

MWE:

using Oceananigans
grid = RectilinearGrid(GPU(), size=(1, 1, 1), extent=(1, 1, 1))
model = NonhydrostaticModel(; grid)
maximum(abs, model.velocities.u)
@simone-silvestri (Collaborator, Author) commented Jan 10, 2025

The issue seems to be this frame in the stack trace:

 [16] mapfirst!(f::typeof(identity), R::SubArray{…}, A::Oceananigans.AbstractOperations.ConditionalOperation{…})

In particular, just

maximum(model.velocities.u)

works properly. However, weirdly enough,

sum(abs, model.velocities.u)
prod(abs, model.velocities.u)

also work properly.

@simone-silvestri (Collaborator, Author) commented Jan 10, 2025

Actually, this is easily explained: sum and prod use a different initialization for reductions than maximum and minimum do:

initialize_reduced_field!(::SumReduction, f, r::ReducedAbstractField, c) = Base.initarray!(interior(r), f, Base.add_sum, true, interior(c))
initialize_reduced_field!(::ProdReduction, f, r::ReducedAbstractField, c) = Base.initarray!(interior(r), f, Base.mul_prod, true, interior(c))
initialize_reduced_field!(::AllReduction, f, r::ReducedAbstractField, c) = Base.initarray!(interior(r), f, &, true, interior(c))
initialize_reduced_field!(::AnyReduction, f, r::ReducedAbstractField, c) = Base.initarray!(interior(r), f, |, true, interior(c))
initialize_reduced_field!(::MaximumReduction, f, r::ReducedAbstractField, c) = Base.mapfirst!(f, interior(r), interior(c))
initialize_reduced_field!(::MinimumReduction, f, r::ReducedAbstractField, c) = Base.mapfirst!(f, interior(r), interior(c))
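
The difference matters: Base.initarray! just fills the output with the reduction's neutral element (a fill!, which is GPU-safe), whereas Base.mapfirst! seeds the output by map!-ing f over the first reduced slice, so only the maximum/minimum path hits the problematic map! dispatch. A rough sketch of the two paths (paraphrasing Base internals, not verbatim):

# sum/prod path: fill the output with the neutral element, then accumulate.
initarray_sketch!(r, op) = fill!(r, Base.reduce_empty(op, eltype(r)))

# maximum/minimum path: seed the output with f of the first reduced slice,
# via map!, which is the call whose dispatch changed in GPUArrays 11.2 (see below).
mapfirst_sketch!(f, r, c) = map!(f, r, view(c, ntuple(_ -> 1:1, ndims(c))...))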

@simone-silvestri (Collaborator, Author)

This seems connected to the fact that Base.mapfirst!(f, interior(r), interior(c)) with c an AbstractOperation now wants to run on the CPU instead of the GPU. Another clue:

julia> v = model.velocities.u + 5

julia> maximum(v)
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

@glwagner (Member)

See what method gets called with the previous CUDA and also with the updated one. Is interior(r) a SubArray of a CuArray?
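
One way to check is with @which (a sketch using InteractiveUtils from the standard library, continuing the MWE above; per the next comment, under GPUArrays 11.2 this reports Base's generic fallback):

using InteractiveUtils  # provides @which

dest = interior(model.velocities.u)  # a view into GPU-backed field data
src  = model.velocities.u + 5        # an AbstractOperation, not a GPU array type

# Reports which map! method dispatch selects for these argument types.
@which map!(identity, dest, src)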

@simone-silvestri (Collaborator, Author) commented Jan 10, 2025

For documentation purposes: with @glwagner we found out that the issue is that the new GPUArrays dispatches map! on AnyGPUArray also for the destination, defined here
https://github.com/JuliaGPU/GPUArrays.jl/blob/78c9ef079c28cfd4ecb7ce718da4c7490dc12404/src/host/broadcast.jl#L106-L130
whereas before the destination only needed to be an AbstractArray:
https://github.com/JuliaGPU/GPUArrays.jl/blob/d0492a29bdf18b9346cba7da16d7bcaf007d7fda/src/host/broadcast.jl#L120-L130
A generic Oceananigans object is not an AnyGPUArray, so the default method for abstract arrays, which leads to scalar indexing, gets called:

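# Base's generic fallback: an element-by-element loop, so each A[j] read and
# dest[i] write is a scalar access, which GPU arrays disallow.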
function map!(f::F, dest::AbstractArray, A::AbstractArray) where F
    for (i,j) in zip(eachindex(dest),eachindex(A))
        val = f(@inbounds A[j])
        @inbounds dest[i] = val
    end
    return dest
end
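
The failure mode can be reproduced without Oceananigans using a hypothetical wrapper type standing in for an Oceananigans object (a minimal sketch, assuming a functional GPU and GPUArrays 11.2):

using CUDA

# Hypothetical stand-in for an Oceananigans field view: an AbstractArray backed
# by GPU memory but not itself an AnyGPUArray (e.g., a doubly wrapped array).
struct LazyWrapper{T, N, A <: AbstractArray{T, N}} <: AbstractArray{T, N}
    data :: A
end

Base.size(w::LazyWrapper) = size(w.data)
Base.IndexStyle(::Type{<:LazyWrapper}) = IndexLinear()
Base.getindex(w::LazyWrapper, i::Int) = w.data[i]
Base.setindex!(w::LazyWrapper, v, i::Int) = (w.data[i] = v)

dest = LazyWrapper(CUDA.zeros(Float64, 4))  # destination is not an AnyGPUArray
src  = CUDA.ones(Float64, 4)

# With the new signature, no GPUArrays map! method matches this destination, so
# dispatch falls through to Base's loop above, which scalar-indexes the GPU data
# and throws "Scalar indexing is disallowed".
map!(identity, dest, src)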

We should pin to an earlier CUDA version until this issue is solved.

@simone-silvestri (Collaborator, Author)

This seems to be connected to JuliaGPU/GPUArrays.jl#580.

@navidcy (Collaborator) commented Jan 13, 2025

Did we try pinning CUDA to v5.5?

@liuchihl (Contributor)

After some tests, I found that pinning CUDA to v5.5.0 works fine.

@simone-silvestri (Collaborator, Author)

The problem should be GPUArrays though, not CUDA itself, so pinning GPUArrays should be enough.
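
For reference, either pin can be applied with Pkg in the active environment (a sketch; version numbers follow the discussion above, and for a package the equivalent is a Project.toml compat bound such as CUDA = "~5.5"):

using Pkg

# Install a known-good CUDA version and pin it so Pkg.update() won't move it:
Pkg.add(name = "CUDA", version = v"5.5.0")
Pkg.pin("CUDA")

# Alternatively, pin GPUArrays itself to a pre-11.2 release:
# Pkg.pin(name = "GPUArrays", version = v"11.1.0")  # illustrative version number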

@glwagner (Member)

Hm, we can close this, right? Because it's fixed in 0.95.6?

@glwagner (Member)

Or should we keep it open until we find a more permanent solution than just compat?

@ali-ramadhan (Member)

I would vote to keep this issue open as upgrading CUDA.jl may solve other issues (e.g. #4047).

But also: lagging further and further behind the latest changes in CUDA.jl could introduce additional issues in the future.

@navidcy (Collaborator) commented Jan 19, 2025

Do any of the changes introduced by CUDA v5.6.1 deal with this issue?

@navidcy (Collaborator) commented Jan 19, 2025

> Or should we keep it open until we find a more permanent solution than just compat?

I think so.

@simone-silvestri (Collaborator, Author) commented Jan 19, 2025

I think it has indeed been resolved by JuliaGPU/GPUArrays.jl#584.
We should wait for the new release of GPUArrays to lift the pin.
