MPI_TYPE_INDEXED + MPI_SEND/RECV slow with older infiniband network? #12209

Open
chhu opened this issue Jan 3, 2024 · 4 comments

chhu commented Jan 3, 2024

Related to #12202, but without CUDA. On our shared-memory system (2x EPYC) MPI_TYPE_INDEXED performs as fast as expected, but as soon as our 40 Gbit InfiniBand is involved, performance drops by a factor of 2-5. This does not happen with the same OMPI and linear buffers (plain arrays).

The bandwidth and latency of the IB fabric itself are high and behave as expected.

I do not see this behavior on our big HPC system with 100G IB, even with the same OMPI. Is there something I can tune? How does OMPI transmit indexed types: a single request per block, or does it gather the data into a linear buffer first?
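
For reference, a minimal sketch of the kind of exchange I mean (block count, block length, and stride are purely illustrative, not our actual application code):

```c
/* Sketch: exchange every other block of a double array, described by
 * MPI_Type_indexed, with plain MPI_Send/MPI_Recv. Run with >= 2 ranks. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { NBLOCKS = 1024, BLOCKLEN = 16, STRIDE = 32 };
    int blocklens[NBLOCKS], displs[NBLOCKS];
    for (int i = 0; i < NBLOCKS; ++i) {
        blocklens[i] = BLOCKLEN;       /* 16 doubles per block    */
        displs[i]    = i * STRIDE;     /* gaps between the blocks */
    }

    MPI_Datatype indexed;
    MPI_Type_indexed(NBLOCKS, blocklens, displs, MPI_DOUBLE, &indexed);
    MPI_Type_commit(&indexed);

    double *buf = calloc(NBLOCKS * STRIDE, sizeof(double));

    /* Rank 0 sends the strided data to rank 1; one send covers all blocks. */
    if (rank == 0)
        MPI_Send(buf, 1, indexed, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, 1, indexed, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Type_free(&indexed);
    free(buf);
    MPI_Finalize();
    return 0;
}
```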

Thanks!

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

Tested on 3.1, 4.1, and latest 5.1.

Please describe the system on which you are running

See #12202

brminich (Member) commented:

Is the performance impact of using MPI_TYPE_INDEXED on the 100G IB HPC system negligible, or just smaller than on the 40G system?
I'd expect it to be noticeable on any system, since UCX does not use certain protocols when the data is not contiguous.

chhu commented Jan 25, 2024

The only thing I can say is that on 100G IB, MPI_TYPE_INDEXED has no notable impact, while on 40G it has a major one. Are you suggesting one should avoid non-contiguous data exchange?

brminich (Member) commented:

Yes, using non-contiguous data may imply some limitations on the MPI/UCX/network protocols.

chhu commented Jan 26, 2024

Hmm, maybe it would be a nice feature to linearize the data into a contiguous buffer before the exchange, and perhaps let the user control this via a threshold setting?
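
Something along those lines, as a rough sketch of what an application can already do by hand with MPI_Pack/MPI_Unpack (the threshold value is made up, and this is not how Open MPI handles derived datatypes internally):

```c
/* Sketch of the "linearize first" idea: pack the indexed data into a
 * contiguous scratch buffer and send it as MPI_PACKED, falling back to
 * the plain derived-datatype send below a (hypothetical) size threshold. */
#include <mpi.h>
#include <stdlib.h>

#define LINEARIZE_THRESHOLD (64 * 1024)   /* bytes; illustrative tunable */

void send_indexed(const void *buf, MPI_Datatype indexed,
                  int dest, int tag, MPI_Comm comm)
{
    int packed_size;
    MPI_Pack_size(1, indexed, comm, &packed_size);

    if (packed_size < LINEARIZE_THRESHOLD) {
        /* Small message: let the library deal with the derived datatype. */
        MPI_Send(buf, 1, indexed, dest, tag, comm);
        return;
    }

    /* Large message: gather into a contiguous buffer first, so the
     * transport only ever sees contiguous bytes. */
    void *scratch = malloc(packed_size);
    int position = 0;
    MPI_Pack(buf, 1, indexed, scratch, packed_size, &position, comm);
    MPI_Send(scratch, position, MPI_PACKED, dest, tag, comm);
    free(scratch);
}

void recv_indexed(void *buf, MPI_Datatype indexed,
                  int src, int tag, MPI_Comm comm)
{
    int packed_size;
    MPI_Pack_size(1, indexed, comm, &packed_size);

    if (packed_size < LINEARIZE_THRESHOLD) {
        MPI_Recv(buf, 1, indexed, src, tag, comm, MPI_STATUS_IGNORE);
        return;
    }

    /* Mirror the sender: receive the byte stream, then scatter it back
     * into the strided layout with MPI_Unpack. */
    void *scratch = malloc(packed_size);
    MPI_Recv(scratch, packed_size, MPI_PACKED, src, tag, comm,
             MPI_STATUS_IGNORE);
    int position = 0;
    MPI_Unpack(scratch, packed_size, &position, buf, 1, indexed, comm);
    free(scratch);
}
```

As long as both sides construct the same datatype, MPI_Pack_size should return the same value on both ends of a homogeneous system, so sender and receiver make the same threshold decision.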
