You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Related to #12202 but without CUDA. On our shared-memory system (2xEPYC) MPI_TYPE_INDEXED works fast as expected, but as soon as our 40GBit Infiniband gets involved performance breaks down by a factor of 2-5. This does not happen with the same OMPI and linear buffers (arrays).
Speed and response time of IB is very high and working fine as expected.
I do not see this behavior on our big HPC system that has 100G IB, even with the same OMPI. Is there something I can tune? How does OMPI transmit indexed types? Single request per block or scatter/gather into linear array first?
Thanks!
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
Tested on 3.1, 4.1 and 5.1 latest
Please describe the system on which you are running
Is performance impact of using MPI_TYPE_INDEXED on 100G IB HPC system negligible or just smaller than on 40G systems?
I'd expect it to be noticable on any system, as UCX does not use certain protocols when data is not contigious.
Only thing I can say is that on 100G IB the TYPE_INDEX has no notable impact, while on the 40G it has a major impact. Are you suggesting one should avoid non-contiguous data exchange?
Hmm, maybe it would be a nice feature to linearize into a new buffer first before the exchange? Maybe let the user control this via a threshold setting?
Related to #12202 but without CUDA. On our shared-memory system (2xEPYC) MPI_TYPE_INDEXED works fast as expected, but as soon as our 40GBit Infiniband gets involved performance breaks down by a factor of 2-5. This does not happen with the same OMPI and linear buffers (arrays).
Speed and response time of IB is very high and working fine as expected.
I do not see this behavior on our big HPC system that has 100G IB, even with the same OMPI. Is there something I can tune? How does OMPI transmit indexed types? Single request per block or scatter/gather into linear array first?
Thanks!
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
Tested on 3.1, 4.1 and 5.1 latest
Please describe the system on which you are running
See #12202
The text was updated successfully, but these errors were encountered: