Teuchos: Teuchos::send/receive not handling a large message #5082
Comments
(1) is relevant because #1183 was closed a few weeks ago. Before that, it was possible to disable Teuchos' support for it.

@seheracer, if you are able to reconfigure and rebuild Trilinos, could you try setting that option? Note that MPI's interface itself takes message sizes as (32-bit) integers, so you still will not be able to send more than 2^31 - 1 items in a single message.

@trilinos/tpetra might want to be reminded of this.
I am dealing with really large data, so I may try splitting messages with more than 2^31 - 1 items into smaller messages. Thanks for the warning, @mhoemmen.

I think Teuchos might only run into this issue when the total size of the message is more than 2^31 - 1 bytes. Maybe I should split any message longer than 2^31 - 1 bytes into smaller ones to bypass this issue? Or I could just use MPI's functions directly and deal with changing the types whenever I want more data types. Any recommendation is welcome.
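[Editor's note: a minimal sketch of the splitting idea discussed above, under assumed choices. The chunk size (2^27 entries, i.e. 1 GiB of long long per message), the helper names sendInChunks/receiveInChunks, and the use of the int-templated Teuchos::send/Teuchos::receive overloads are illustrative assumptions, not code from this issue.]

```cpp
#include <Teuchos_CommHelpers.hpp>
#include <algorithm>
#include <vector>

// Send/receive a large buffer as several sub-messages, each well under 2^31 - 1 bytes,
// so the int-based byte counts used internally cannot overflow.
void sendInChunks(const Teuchos::Comm<int>& comm,
                  const std::vector<long long>& buf, const int destRank) {
  const long long maxEntriesPerMsg = (1LL << 27);  // 2^27 long longs = 1 GiB per message
  long long offset = 0;
  while (offset < static_cast<long long>(buf.size())) {
    const int thisCount = static_cast<int>(
        std::min(maxEntriesPerMsg, static_cast<long long>(buf.size()) - offset));
    Teuchos::send<int, long long>(comm, thisCount, buf.data() + offset, destRank);
    offset += thisCount;
  }
}

void receiveInChunks(const Teuchos::Comm<int>& comm,
                     std::vector<long long>& buf, const int sourceRank) {
  const long long maxEntriesPerMsg = (1LL << 27);
  long long offset = 0;
  while (offset < static_cast<long long>(buf.size())) {
    const int thisCount = static_cast<int>(
        std::min(maxEntriesPerMsg, static_cast<long long>(buf.size()) - offset));
    Teuchos::receive<int, long long>(comm, sourceRank, thisCount, buf.data() + offset);
    offset += thisCount;
  }
}
```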
I think, for now, using the raw MPI functions is the practical option. To address this in Teuchos, we could take advantage of the templated ordinal type in Teuchos::Comm. @mhoemmen, what do you think about that idea?
@bartlettroscoe I think I get what's going on. The correct work-around, for now, is to use MPI_Send and MPI_Recv directly.

@bartlettroscoe wrote:

> To address this in Teuchos, we could take advantage of the templated ordinal type in Teuchos::Comm.

We could do that perfectly well with the existing templated interface.
@seheracer wrote:

> I think Teuchos might only run into this issue when the total size of the message is more than 2^31 - 1 bytes. Maybe I should split any message longer than 2^31 - 1 bytes into smaller ones to bypass this issue? Or I could just use MPI's functions directly and deal with changing the types whenever I want more data types.

That's correct. That's really a problem with MPI's interface: the count arguments are (32-bit) ints. Either approach should work. More importantly, use the int-templated versions of Teuchos::send and Teuchos::receive.
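[Editor's note: a quick size check, sketched from the numbers in this thread rather than from Teuchos internals, showing why the item count fits in a 32-bit int while the byte count does not.]

```cpp
#include <climits>
#include <cstdio>

int main() {
  const long long numEntries = 400000000LL;  // long long values in the failing message
  const long long numBytes =
      numEntries * static_cast<long long>(sizeof(long long));  // 8 bytes each on typical platforms
  // numEntries (4.0e8) fits in a 32-bit int, but numBytes (3.2e9) exceeds
  // INT_MAX (2,147,483,647), so any byte count stored in an int overflows.
  std::printf("entries = %lld, bytes = %lld, INT_MAX = %d\n", numEntries, numBytes, INT_MAX);
  return 0;
}
```

If that reading is right, it is consistent with the report below: a raw MPI_Send of 400 million MPI_LONG_LONG values keeps its count argument under INT_MAX, while a path that counts bytes in an int does not.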
@mhoemmen and @bartlettroscoe, thanks for the comments and the warnings. From now on, I will always use the int-templated versions of Teuchos::send and Teuchos::receive. For the very large messages, I guess I will have to split them into multiple sub-messages in any case, regardless of whether I use MPI or Teuchos.
@seheracer wrote:

> For the very large messages, I guess I will have to split them into multiple sub-messages in any case, regardless of whether I use MPI or Teuchos.

That is correct. I think newer versions of the MPI standard (later than 3) may fix this. You may also try libraries like Jeff Hammond's BigMPI: https://github.com/jeffhammond/BigMPI
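[Editor's note: for completeness, the classic pre-MPI-4 workaround for counts that exceed INT_MAX is to wrap a block of elements in a contiguous derived datatype, which is roughly the technique libraries such as BigMPI automate. This is a sketch under assumed chunk sizes and variable names, not code from this issue.]

```cpp
#include <mpi.h>
#include <vector>

// Send a large buffer in one MPI call by grouping elements into a contiguous
// derived datatype, so the int 'count' argument stays small.
int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  const long long totalEntries = 400000000LL;  // example size from this issue
  const int chunk = 1000000;                   // long longs per derived-datatype element
  const int numChunks = static_cast<int>(totalEntries / chunk);  // assumes it divides evenly

  MPI_Datatype chunkType;
  MPI_Type_contiguous(chunk, MPI_LONG_LONG, &chunkType);
  MPI_Type_commit(&chunkType);

  std::vector<long long> buf(totalEntries);
  if (rank == 0) {
    MPI_Send(buf.data(), numChunks, chunkType, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {
    MPI_Recv(buf.data(), numChunks, chunkType, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }

  MPI_Type_free(&chunkType);
  MPI_Finalize();
  return 0;
}
```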
Bug Report
@trilinos/teuchos
Description
Error when communicating a very long message (400 million long long values) using Teuchos::send and Teuchos::receive. When Teuchos::send and Teuchos::receive are replaced by MPI_Send and MPI_Recv, the same message is communicated successfully, though with a warning.
Steps to Reproduce
The code to reproduce the bug (with 2 MPI ranks):
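[Editor's note: the original attached reproducer is not preserved in this excerpt. Below is a minimal sketch of the scenario it describes, with rank 0 sending 400 million long long values to rank 1; the exact overloads, variable names, and the commented-out MPI_Send/MPI_Recv lines are assumptions standing in for the original code.]

```cpp
#include <mpi.h>
#include <Teuchos_GlobalMPISession.hpp>
#include <Teuchos_DefaultComm.hpp>
#include <Teuchos_CommHelpers.hpp>
#include <Teuchos_RCP.hpp>
#include <iostream>
#include <vector>

int main(int argc, char* argv[]) {
  Teuchos::GlobalMPISession mpiSession(&argc, &argv);
  Teuchos::RCP<const Teuchos::Comm<int> > comm = Teuchos::DefaultComm<int>::getComm();

  const long long numEntries = 400000000LL;   // ~3.2 GB of long long per rank
  std::vector<long long> buffer(numEntries);

  if (comm->getRank() == 0) {
    Teuchos::send<int, long long>(*comm, static_cast<int>(numEntries), buffer.data(), 1);
    // Raw-MPI alternative (the "lines commented out" referred to below):
    // MPI_Send(buffer.data(), static_cast<int>(numEntries), MPI_LONG_LONG,
    //          1, 0, MPI_COMM_WORLD);
  } else if (comm->getRank() == 1) {
    Teuchos::receive<int, long long>(*comm, 0, static_cast<int>(numEntries), buffer.data());
    // MPI_Recv(buffer.data(), static_cast<int>(numEntries), MPI_LONG_LONG,
    //          0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::cout << "Received " << numEntries << " long long values" << std::endl;
  }
  return 0;
}
```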
The output:
When the Teuchos::send/receive calls are replaced by MPI_Send/MPI_Recv (see the lines commented out in the code), the output is:
Notes
mpicc: icc (ICC) 18.0.1 20171018
mpirun: mpirun (Open MPI) 2.1.2
An issue on the warning when MPI_Send/Recv is used: open-mpi/ompi#4829.