MPI_Transpose Worker-to-Worker communication failing #278
Wow, that's been around since 2015. @amitmurthy, any idea what that should be?
I wonder why this wasn't caught in tests. @rohanmclure, what did you do to call it?
I believe this was invoked when worker processes attempted to communicate with one another directly.
Essentially, it is an issue that arises when worker processes message one another. I should note that I would have generated this minimal example with RemoteChannels; however, the commented code below appears to result in infinite recursion.

```julia
using Test
using MPI, Distributed
mgr = MPI.start_main_loop(MPI.MPI_TRANSPORT_ALL)
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
# Generating RemoteChannels on other workers also results in a crash
# c2, c3 = RemoteChannel(() -> Channel{Int}(0), 2), RemoteChannel(() -> Channel{Int}(0), 3)
@assert nprocs() >= 3
@fetchfrom 2 global c2 = Channel{Int}(0)
@fetchfrom 3 global c3 = Channel{Int}(0)
b1 = remotecall(2) do
    l = 2
    @sync @spawnat 3 begin
        put!(c3, l)
    end
    return take!(c2) == 3
end
b2 = remotecall(3) do
    correct = take!(c3) == 2
    l = 3
    @sync @spawnat 2 begin
        put!(c2, l)
    end
    return correct
end
@test fetch(b1)
@test fetch(b2)
MPI.stop_main_loop(mgr)
```

To run this script, please run it under an MPI launcher such as mpiexec with at least three processes.
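For comparison, here is a minimal sketch of how the RemoteChannel-based variant mentioned above might look. This is my own assumption about the intended pattern, not code from this issue, and it omits the MPI.start_main_loop / MPI.stop_main_loop transport setup:

```julia
# Hypothetical RemoteChannel variant (an assumption, not the author's script).
# The channels are hosted on workers 2 and 3 but can be referenced from any
# process through their RemoteChannel handles.
using Distributed

@assert nprocs() >= 3

c2 = RemoteChannel(() -> Channel{Int}(1), 2)  # buffered channel hosted on worker 2
c3 = RemoteChannel(() -> Channel{Int}(1), 3)  # buffered channel hosted on worker 3

b1 = remotecall(2) do
    put!(c3, 2)        # worker 2 sends to the channel hosted on worker 3
    take!(c2) == 3     # ...then waits for worker 3's reply
end

b2 = remotecall(3) do
    correct = take!(c3) == 2
    put!(c2, 3)        # worker 3 replies to the channel hosted on worker 2
    correct
end

@show fetch(b1) fetch(b2)
```

With a buffer size of 1, neither put! blocks before the matching take!, so the handshake between the two workers cannot deadlock regardless of scheduling.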
Fixed by #293
Use of MPI_TRANSPORT_ALL under Julia v1.0.1 on an HPC cluster arrives at line 285 of cman.jl, which runs `get` with a single integer parameter and so crashes.

MPI.jl/src/cman.jl, line 285 at commit 50d5916
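For context, a small illustration of why such a call fails. This is my own example rather than MPI.jl's actual code, and the dictionary name is hypothetical; the point is only that Base.get requires a key plus a default value (or a default-producing function), so a call that omits them raises a MethodError:

```julia
# Illustration only (hypothetical names, not MPI.jl internals): Base.get on a
# Dict needs a key *and* a default value (or a default-producing function).
connections = Dict{Int,String}(1 => "worker-1")

@show get(connections, 1, "absent")        # three-argument form: returns "worker-1"
@show get(() -> "absent", connections, 2)  # function-default form: returns "absent"

try
    get(connections, 1)    # no two-argument method for Dict: throws MethodError
catch err
    @show err
end
```

The report above describes a call that passes only a single integer, which similarly matches none of these documented signatures.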