MPI_Allreduce Segmentation fault Docker #5372
Beware that Docker by default limits shared memory to 64MB. You can try increasing that size with the --shm-size option.
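For example (the image name and command are illustrative, not from this issue):

```
docker run --shm-size=1g my-mpich-image mpiexec -n 8 ./app
```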
Hi, I increased it with --shm-size=1GB (I checked /dev/shm to be sure) and nothing changed.
I see. Thanks for checking. Do you use any special CPU affinity binding?
No, here is my docker container inspect output:
Could you try setting the environment variable MPIR_CVAR_DEVICE_COLLECTIVES?
If I set this variable, the process fails with "Abort(566543): Fatal error in PMPI_Init: Other MPI error, **cvar_val MPIR_CVAR_DEVICE_COLLECTIVES 0".
Which MPICH release are you using?
3.4.1
Core dump trace: Program terminated with signal SIGSEGV, Segmentation fault.
Could you try MPIR_CVAR_DEVICE_COLLECTIVES=none?
It seems to work; I'll do more testing to make sure. How does MPIR_CVAR_DEVICE_COLLECTIVES affect MPICH performance? I am testing multiple scientific applications and need the best possible results.
Alright, so we have identified that the issue is only related to the device collectives path (the one disabled by MPIR_CVAR_DEVICE_COLLECTIVES=none).
Yes, and it works. How can I help to find the error?
Just trying to narrow down the issue: can you run with MPIR_CVAR_ENABLE_INTRANODE_TOPOLOGY_AWARE_TREES=0 to disable the topology-aware tree in release_gather?
If I remove export MPIR_CVAR_DEVICE_COLLECTIVES=none and use export MPIR_CVAR_ENABLE_INTRANODE_TOPOLOGY_AWARE_TREES=0, the program fails again with the same error.
Just saw the stack trace above. It looks like the following call may fail, and so MPL_atomic_release_store_uint64 was called with an invalid pointer. It may be related to what @hzhou mentioned about the limit?
I increased the limit with --shm-size=1GB and the problem was the same.
Did you build MPICH from source?
Yes, "./configure --with-device=ch4:ofi --with-libfabric=embedded --enable-g=dbg,log --enable-thread-cs=per-vci --with-ch4-max-vcis=${MPICH_THREADS}" |
@cesarpomar Could you print the mpi_errno returned from the shared memory allocation, so we can confirm whether it is a shared memory allocation issue?
Also, it may help to print the value of "flags_shm_size". This is the size MPICH tried to allocate; it increases as the number of ranks increases.
mpi_errno is printed in the log. The output of MPIR_ERR_SET is: "Error created: last=0000000000 class=0x0000000f MPIDI_POSIX_mpi_release_gather_comm_init(387) **fail", so if mpi_errno_ret is last, mpi_errno_ret is 0000000000. The problem must be mapfail_flag: shm_alloc sets mapfail_flag=true even if there are no errors in the allocation.
I think it points to shared memory allocation failures. How many ranks did you try to run before the segfault happened?
It only works with 2 ranks.
I have narrowed the problem down to a base case. If I launch all processes in the same Docker container, no problem appears. If I launch the processes spread over two containers, it fails. Normal functions like send, recv, gather, scatter always work, but MPI_Allreduce fails with the shm problem. Maybe MPICH tries to use shared memory between the processes in different containers. I tried launching the containers on different hosts but it failed again.
Looking at the MPICH source code and using the previous trace, I think the error is in MPID_Allreduce (src/mpid/ch4/src/ch4_coll.h), which should call MPIR_Allreduce_impl but instead calls MPIDI_Allreduce_intra_composition_gamma. When MPIR_CVAR_DEVICE_COLLECTIVES=none is used, MPICH calls the MPIR_Allreduce_impl function in MPIR_Allreduce (src/mpi/coll/allreduce/allreduce.c), so it works.
Now I see what you are doing. Apparently, you don't want to use shared memory in this case. I guess processes launched in different containers are in different namespaces and they can't access the same shared memory anyway. So setting MPIR_CVAR_DEVICE_COLLECTIVES=none is probably what you want.
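For example, when launching (the process count and binary name are illustrative):

```
export MPIR_CVAR_DEVICE_COLLECTIVES=none
mpiexec -n 8 ./app
```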
Does your app create/free a lot of communicators? There was a known issue with leaking release_gather resources that was fixed in #4864. If you update to MPICH 3.4.2, it has the fix included. |
Processes from different MPI_COMM_WORLD cannot use shared memory (due to missing collective initialization). @cesarpomar I opened a separate issue #5376 tracking it. Please let me know if it is ok to close this issue. |
@cesarpomar Just to confirm that this is not particular to docker: if you try to do the same outside docker -- on a single node, launch processes separately and connect with open/accept/connect -- does it result in the same issue?
Yes, same issue. I will try MPICH 3.4.2.
After upgrading to MPICH 3.4.2, nothing changes. If I remove MPIR_CVAR_DEVICE_COLLECTIVES=none, the processes crash.
@cesarpomar What is the minimum number of processes needed to reproduce the issue?
@cesarpomar I still need some details to reproduce the failures. Do you think you can provide a minimal reproducer (ideally outside docker)?
Sorry, |
My source code in C++:
Process 1:
Process 2:
Process 3:
comm is the resulting intracommunicator with size=3. If you execute a Gather or Bcast it works, but an Allreduce crashes.
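The original snippets are not shown here; the following is only a minimal sketch of that kind of setup (assumptions: three singleton processes launched in role order, a file-based port exchange via "port.txt", and all file and variable names are illustrative rather than taken from the real code). Every process already in the group takes part in accepting the next joiner, and MPI_Allreduce is then called on the merged size-3 intracommunicator:

```
/*
 * Minimal sketch -- NOT the original code.
 * Three singleton MPI processes, launched separately (roles 0, 1, 2 in order,
 * possibly in different containers), join one group via accept/connect and
 * MPI_Intercomm_merge, then call MPI_Allreduce on the merged communicator.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME] = "";
    MPI_Comm comm, inter;

    MPI_Init(&argc, &argv);
    int role = atoi(argv[1]);              /* 0 joins first, then 1, then 2 */
    MPI_Comm_dup(MPI_COMM_WORLD, &comm);   /* each process starts as size 1 */

    if (role == 0) {
        /* first process opens a port and publishes it */
        MPI_Open_port(MPI_INFO_NULL, port);
        FILE *f = fopen("port.txt", "w");
        fprintf(f, "%s\n", port);
        fclose(f);
    } else {
        /* later processes read the port and join the existing group */
        FILE *f = NULL;
        while ((f = fopen("port.txt", "r")) == NULL)
            sleep(1);                      /* wait until role 0 published it */
        fscanf(f, "%255s", port);
        fclose(f);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, comm, &inter);
        MPI_Comm_free(&comm);
        MPI_Intercomm_merge(inter, 1, &comm);   /* joiner becomes "high" group */
        MPI_Comm_free(&inter);
    }

    /* everyone already in the group accepts the remaining joiners */
    int size;
    MPI_Comm_size(comm, &size);
    while (size < 3) {
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, comm, &inter);
        MPI_Comm_free(&comm);
        MPI_Intercomm_merge(inter, 0, &comm);   /* existing group stays "low" */
        MPI_Comm_free(&inter);
        MPI_Comm_size(comm, &size);
    }

    /* Gather/Bcast on the merged communicator were reported to work, while
     * MPI_Allreduce crashes in MPIDI_POSIX_mpi_release_gather_comm_init
     * unless MPIR_CVAR_DEVICE_COLLECTIVES=none is set. */
    int rank, in, out = 0;
    MPI_Comm_rank(comm, &rank);
    for (int i = 0; i < 4; i++) {          /* later in the thread: one call may
                                              succeed, repeated calls crash */
        in = rank + i;
        MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, comm);
    }
    printf("rank %d of %d: last allreduce = %d\n", rank, size, out);

    MPI_Comm_free(&comm);
    if (role == 0)
        MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}
```

Launched, for instance, as three separate `mpiexec -n 1 ./repro <role>` commands (the binary name is illustrative), only the first process opens the port; the others connect and merge.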
Thanks for your code. We'll look into it. |
@cesarpomar Because you are using the same
This code is part of my PhD thesis framework, where multiple executors run Big Data codes using MPI inside Docker containers. Executors are synchronized with RPC calls from a master, so the group is created step by step, and when P1 and P2 return from the RPC call, a new call is made with the three processes.
ok
I just tested with 3 processes on two nodes. With either process 0, 1, or 2 located on a separate node, it all works fine. I used slightly different testing code:
Launch on first node with
launch on 2nd node with
Adjust the skip-process list to test different scenarios. They all seem to run fine in my tests. Which MPICH version were you testing with?
Could it be the way the group is created? Your way is simpler and cleaner than mine. In my example code, I use three functions to create the final intercommunicator (Merge, Create Intercomm, Merge), and you only use MPI_Intercomm_merge. Moreover, I don't call accept on the processes already in the group; only process 0 calls accept and the new process connects. It could be that my implementation creates communicators that cause problems with the Allreduce and Allgather functions, even though it works with the other functions. Could this be the problem?
Once you have the intra-communicator, it should work the same, I believe. I was more worried about interference during your connections. Can you confirm that they are not interfering? But before we play the guessing game, can you try my example and confirm it is working (or not)?
OK, I'm testing it.
I just tested your code and it works. After comparing it with my code, I found a way to make your code fail: a single MPI_Allreduce works, but if we add more, the error appears.
tc.c:80 is the last MPI_Allreduce.
MPIDI_POSIX_mpi_release_gather_comm_init is the same problematic function.
Yes, I have reproduced the bug.
@cesarpomar Could you try applying the patch in #5440 to see if it fixes the issue?
Yes. Everything works perfectly. No bugs, no segfault.
I am trying to run the application https://github.com/LLNL/LULESH inside a single Ubuntu 20.04 Docker container. When I increase the number of MPI processes, the processes abort with a segmentation fault. The core dump trace says that the error is in the function MPI_Allreduce (lulesh.cc). I recompiled MPICH with "--enable-g=dbg,log" and the error is in:
.........
0 0 7f04a38bf700[35] 2 2.827156 src/mpid/ch4/netmod/ofi/ofi_progress.c 86 Leaving MPID_STATE_MPIDI_OFI_PROGRESS
0 0 7f04a38bf700[35] 1 2.827159 src/mpid/ch4/shm/src/shm_init.c 69 Entering MPID_STATE_MPIDI_SHM_PROGRESS
0 0 7f04a38bf700[35] 1 2.827162 src/mpid/ch4/shm/posix/posix_progress.c 167 Entering MPID_STATE_MPIDI_POSIX_PROGRESS
0 0 7f04a38bf700[35] 1 2.827165 src/mpid/ch4/shm/posix/posix_progress.c 42 Entering MPID_STATE_PROGRESS_RECV
0 0 7f04a38bf700[35] 1 2.827168 src/mpid/ch4/shm/posix/eager/iqueue/iqueue_recv.h 20 Entering MPID_STATE_MPIDI_POSIX_EAGER_RECV_BEGIN
0 0 7f04a38bf700[35] 1 2.827170 ./src/mpid/common/genq/mpidu_genq_shmem_queue.h 224 Entering MPID_STATE_MPIDU_GENQ_SHMEM_QUEUE_INIT
0 0 7f04a38bf700[35] 2 2.827173 ./src/mpid/common/genq/mpidu_genq_shmem_queue.h 239 Leaving MPID_STATE_MPIDU_GENQ_SHMEM_QUEUE_INIT
0 0 7f04a38bf700[35] 2 2.827176 src/mpid/ch4/shm/posix/eager/iqueue/iqueue_recv.h 44 Leaving MPID_STATE_MPIDI_POSIX_EAGER_RECV_BEGIN
0 0 7f04a38bf700[35] 2 2.827179 src/mpid/ch4/shm/posix/posix_progress.c 113 Leaving MPID_STATE_PROGRESS_RECV
0 0 7f04a38bf700[35] 1 2.827182 src/mpid/ch4/shm/posix/posix_progress.c 126 Entering MPID_STATE_PROGRESS_SEND
0 0 7f04a38bf700[35] 2 2.827185 src/mpid/ch4/shm/posix/posix_progress.c 160 Leaving MPID_STATE_PROGRESS_SEND
0 0 7f04a38bf700[35] 2 2.827188 src/mpid/ch4/shm/posix/posix_progress.c 178 Leaving MPID_STATE_MPIDI_POSIX_PROGRESS
0 0 7f04a38bf700[35] 2 2.827190 src/mpid/ch4/shm/src/shm_init.c 75 Leaving MPID_STATE_MPIDI_SHM_PROGRESS
0 0 7f04a38bf700[35] 2 2.827198 src/mpid/ch4/src/ch4_progress.c 129 Leaving MPID_STATE_PROGRESS_TEST
0 0 7f04a38bf700[35] 2 2.827209 src/mpid/ch4/src/ch4_progress.c 237 Leaving MPID_STATE_MPID_PROGRESS_WAIT
0 0 7f04a38bf700[35] 256 2.827212 src/mpi/coll/helper_fns.c 73 OUT: errflag = 0
0 0 7f04a38bf700[35] 2 2.827215 src/mpi/coll/helper_fns.c 74 Leaving MPID_STATE_MPIC_WAIT
0 0 7f04a38bf700[35] 1 2.827218 ./src/mpid/ch4/include/mpidpost.h 28 Entering MPID_STATE_MPID_REQUEST_FREE_HOOK
0 0 7f04a38bf700[35] 2 2.827221 ./src/mpid/ch4/include/mpidpost.h 39 Leaving MPID_STATE_MPID_REQUEST_FREE_HOOK
0 0 7f04a38bf700[35] 65536 2.827224 ./src/include/mpir_request.h 437 freeing request, handle=0xac000005
0 0 7f04a38bf700[35] 1 2.827227 ./src/mpid/ch4/include/mpidpost.h 46 Entering MPID_STATE_MPID_REQUEST_DESTROY_HOOK
0 0 7f04a38bf700[35] 2 2.827230 ./src/mpid/ch4/include/mpidpost.h 48 Leaving MPID_STATE_MPID_REQUEST_DESTROY_HOOK
0 0 7f04a38bf700[35] 2048 2.827233 ./src/include/mpir_handlemem.h 347 Freeing object ptr 0x7f04a6a52a48 (0xac000005 kind=REQUEST) refcount=0
0 0 7f04a38bf700[35] 256 2.827236 src/mpi/coll/helper_fns.c 351 OUT: errflag = 0
0 0 7f04a38bf700[35] 2 2.827239 src/mpi/coll/helper_fns.c 353 Leaving MPID_STATE_MPIC_SENDRECV
0 0 7f04a38bf700[35] 16384 2.827254 src/mpi/errhan/errutil.c 854 Error created: last=0000000000 class=0x0000000f MPIDI_POSIX_mpi_release_gather_comm_init(387) **fail
0 0 7f04a38bf700[35] 16384 2.827265 src/mpi/errhan/errutil.c 1038 New ErrorRing[126]
0 0 7f04a38bf700[35] 16384 2.827268 src/mpi/errhan/errutil.c 1040 id = 0x0000a50f
0 0 7f04a38bf700[35] 16384 2.827272 src/mpi/errhan/errutil.c 1042 prev_error = 0000000000
0 0 7f04a38bf700[35] 16384 2.827275 src/mpi/errhan/errutil.c 1045 user=0
For some reason, the shared memory allocation crashes the application. If I disable it, the problem disappears. Could there be any incompatibility with Docker? I need the shared memory for performance tests. Any ideas?