Significant degradation in message rates observed on Master. #1831
Comments
@jladd-mlnx You need to confirm this affects 2.0.0 before setting a blocker there.
One of the biggest differences between 2.0.0 and master is that master always has MPI_THREAD_MULTIPLE support. This will impact performance, and we intend to fix as much of the performance regression as possible before the next branch from master.
We also need to confirm that the performance drop was not already visible before the request rework made it in.
Yalla looks better on 2.x, but still degraded by ~5%. OpenIB still has performance issues, and a bug that prevents the test from completing.
[chart: PML - Yalla]
[chart: PML - OB1]
Looks like the issue is that the tcp btl is being used. Add an MCA parameter to exclude it.
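A minimal sketch of excluding the tcp BTL on the command line, assuming the standard MCA selection syntax and the placeholder host names already used in this thread:

```shell
# Exclude the tcp BTL so only the self and verbs-backed BTLs are eligible
mpirun -np 2 -host host-1,host-2 --mca btl ^tcp ./osu_mbw_mr
```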
@hjelmn I see that. That's odd. Violates the law of least surprise. Still some degradation.
Yeah, very odd that the tcp btl is active. Is this system similar to the one running Jenkins? I understand that one has a two-port card with one IB and one Ethernet port. By default both ports will be used for large messages; not sure why it is affecting small ones. I put together a patch to adjust all the latency and bandwidth numbers for the btls, which might help.
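As a reference point, the latency/bandwidth values the BTLs advertise can be inspected per component; a sketch assuming the usual ompi_info parameter levels and parameter naming:

```shell
# Dump the latency/bandwidth MCA parameters reported by the tcp and openib BTLs
ompi_info --param btl tcp --level 9 | grep -E 'latency|bandwidth'
ompi_info --param btl openib --level 9 | grep -E 'latency|bandwidth'
```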
BTW, ~5% is not a blocker. I have seen larger variation due to changes in icache miss rates :-/
Did we lose the exclusivity?
The law of least surprise is respected on master: the same command line that triggered the TCP BTL loads OpenIB on master. The nightly build is from an internal MLNX nightly build. I'll try with master from Git when I get a chance.
@bosilca The exclusivity check looks ok to me on master. Will check 2.x.
v2.x looks ok too. We sort the btls by decreasing exclusivity. Then we add the proc to each btl in that order and only add them to btl_send if there is no send btl endpoint for the proc or the exclusivity is equal.
If preventing the TCP BTL from being used improves the latency, then the exclusivity might not work the way we expect.
Agreed. It should be preventing the tcp btl from running with that proc. Will have to run through the code to see why this could be happening.
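One way to see which BTLs actually get attached to the peer is to raise the framework verbosity; a sketch, assuming the standard *_base_verbose MCA parameters:

```shell
# Print BTL/BML selection decisions so an unexpected tcp endpoint shows up in the output
mpirun -np 2 -host host-1,host-2 \
    --mca btl_base_verbose 100 --mca bml_base_verbose 100 \
    ./osu_mbw_mr
```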
Tests with the master head from GitHub. BIG degradation in Yalla. Significant degradation in OpenIB.
[chart: OMPI - Master Head f18d660]
Another observation: OpenIB message rates are quite erratic from run to run. This is just a single trial that came back particularly degraded.
@hppritcha Please see this issue for tracking and comments.
I need more info on how to reproduce this problem. Was osu_mbw_mr used to generate the message rates, for example? And were any particular PSM2-related env variables set?
Hi all,
mpirun -np 2 -host host-1,host-2 ./osu_mbw_mr
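To compare the two code paths discussed above directly, the PML can be pinned explicitly; a sketch, assuming both yalla and ob1 are built on the test nodes:

```shell
# Message rate through the yalla PML
mpirun -np 2 -host host-1,host-2 --mca pml yalla ./osu_mbw_mr

# Message rate through ob1 over the openib BTL, with tcp excluded
mpirun -np 2 -host host-1,host-2 --mca pml ob1 --mca btl self,openib ./osu_mbw_mr
```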
I collected some osu_mbw_mr numbers off of one of the LANL omnipath systems and put them in a gist. It appears there was a significant performance degradation going from the 1.10 release stream to master. It looks like the 2.0.x release stream is the worst; then something was done on master to patch things up a bit, with PRs back to 2.x after the 2.0.x branch was created. @jsquyres might want to take a look.
I got data using the GNI provider. It also shows a significant performance degradation for shorter messages: https://gist.github.com/hppritcha/200bc7a2d4dfde709245d6ffa7b2b971. Not as bad as for the PSM2 MTL, though.
I don't see the 1.10 data in that gist, but the 2.0.x data is clearly impacted relative to master and v2.x. So it looks like there is something in the code path above the libraries. @matcabral @hjelmn Can someone take a look and see if something is missing in the CM PML, or related code?
pml/cm looks up to date to me.
Given that the problem is in master as well, my comment was more to the point that perhaps some change is required in that code path - something that was done to resolve the performance issue in the pml/ob1 path, but didn't get done in the pml/cm path.
@rhc54 I don't think @hppritcha plans to get the Cray code to work with v1.10. @rhc54 @matcabral Any progress on this issue?
Still under investigation. I am removing the blocker tag from it as our folks are engaged in some other things right now, and this shouldn't hold up v2.0.2. @matcabral I spoke with @hjelmn about this at SC, and he suggested looking at the instruction cache for "misses". We have previously seen cases where slight changes to the code path, even when reducing instruction count, would result in performance degradation due to a sudden spike in cache misses. This could be what is happening here.
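If someone wants to check the icache theory, a sketch of one way to do it under Linux perf (event names vary by CPU; these are the generic hardware-cache events):

```shell
# Run each rank under perf stat so the miss counts come from the benchmark processes,
# not the launcher; compare the 1.10 and master builds side by side
mpirun -np 2 -host host-1,host-2 \
    perf stat -e instructions,cycles,L1-icache-load-misses ./osu_mbw_mr
```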
How are we doing on this issue on the v2.x branch these days? Is this still an open issue, or should we close it? |
I'm not sure if this is fixed. I'm observing performance degradation on my test with builtin atomics. This is the injection rate in msg/s from a multithreaded benchmark.
[table: injection rate (msg/s), builtin vs. non-builtin atomics]
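For anyone reproducing the builtin-atomics comparison, a sketch of the two builds being contrasted, assuming the standard configure switch; prefixes are placeholders:

```shell
# Build using the compiler's builtin atomics (the configuration showing the degradation)
./configure --prefix=$HOME/ompi-builtin-atomics --enable-builtin-atomics && make -j install

# Build using Open MPI's own assembly atomics instead
./configure --prefix=$HOME/ompi-asm-atomics --disable-builtin-atomics && make -j install
```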
Just to be clear: that was on v2.x, right? Your table implies that we should switch the default to use the no-builtin-atomics...? If we're seeing an 82% performance degradation, this feels like a blocker for v2.1.0. @hppritcha? |
Could you repeat the test on v2.x, please?
@thananon what BTL are you using when you see the 82 per cent degradation?
@jsquyres I will re-run the test on 2.x as soon as I can.
Per the OMPI webex this morning, we're deferring this to v2.1.1.
Has this actually been fixed in 2.1.1? Otherwise I'd like to move this to a future milestone.
The problems I found were addressed in #3748.
Per the last comment from @matcabral, closing this issue.
Opening this issue for tracking purposes. Measured with a master nightly build against 1.10.3. Possible fix on master.
@hjelmn or @bosilca please comment.