Initial symbol versioning #1955
Test FAILed.
Interesting: I'm not claiming to grok all of this, but one thing it definitely shows is that we are exposing a lot of internal symbols that shouldn't be exposed. I'd say we have a lot of cleanup to do! This is good timing, as we have a developers' meeting next week, so I'll put this on the agenda. Thanks!
Test FAILed.
What is the motivation for this PR?
Maintainability in distributions, specifically Debian. See the notes attached to duplicate issue #1956. As a library grows in usage and importance, the practice of "rebuild everything whenever the SO version number is bumped" becomes untenable, which is why symbol versioning was invented. The best example of this is GNU glibc, whose SONAME has stayed at version 6 (libc.so.6) for decades despite ABI changes. Current practice effectively requires a "recompile and test everything" approach, which for a downstream distribution such as Debian or Fedora translates to "nothing works until everything works": moving from openmpi 1.10.3 to openmpi 2.0 (without symbol versioning) means removing all existing, working MPI software from the integration suite (the distribution) and re-including it over time as it is rebuilt and tested against openmpi 2. This breaks hundreds (soon thousands?) of packages for 4-8 weeks while everyone drops everything to upgrade and test against the new openmpi, and it makes simultaneous development of those packages impossible.
@amckinstry Thank you for putting together this patch (and the supporting documentation, wow!). This makes it much easier to discuss the issue and understand what is being proposed. A few notes:
I note two minor problems with this PR:
@jsquyres On (2), the primary benefit for us (Debian and Ubuntu in particular, distro maintainers in general) is that the SOVERSION of the MPI and OSHMEM libraries does not change. In moving from OpenMPI 2.x.y to 2.a.b, the version number of the main libraries remains 20. For internal libraries (open-rte, etc.) this is less important, and I expect their versions to change over the 2.x series. This matters because otherwise integrating a new OpenMPI release requires rebuilding and testing all linked libraries and applications as a single piece. (An OpenMPI upgrade that changes libmpi.so.20 -> libmpi.so.21 means that version 20 drops out of the integration suite, breaking every binary that depends on it.) That requires a freeze, typically lasting weeks, of all unrelated development on the dozens of affected packages, which is something we dearly wish to avoid.

On the selection of symbols: this needs review. I started by whitelisting all MPI_*, mpi_*, ompi_* and PMPI_* symbols as assumed OK, then added others such as the comm_* and netpatterns_* namespaces as necessary. Some of these seem to be "leaks": roundup_to_power_radix, for example, almost certainly shouldn't be public, but OpenMPI fails to build its tests (make check) unless it is visible. As you note, the tests use hidden internal symbols; maybe a refactor is needed.

On C++ / Fortran names: the mangling scheme is compiler-dependent, while symbol versioning is done at the linker level, so I don't think it's possible. It may be necessary to include the mangled names for each compiler :-(

I copied the files by hand from my local git tree to a GitHub branch; did I miss something? I do still need to run ./autogen.pl to pick up the auto* changes.
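The whitelisting approach described above maps naturally onto the glob patterns of a GNU ld version script, where the patterns go under `global:` and a trailing `local: *;` hides everything else. The following is a minimal Python sketch of that classification, assuming the prefix list quoted in the comment; `EXPORT_PATTERNS`, `is_exported` and `split_symbols` are hypothetical names for illustration and are not taken from the patch.

```python
from fnmatch import fnmatchcase

# Patterns quoted in the comment; in a GNU ld version script these would sit
# under "global:", followed by "local: *;" to hide every other symbol.
EXPORT_PATTERNS = ("MPI_*", "mpi_*", "ompi_*", "PMPI_*", "comm_*", "netpatterns_*")


def is_exported(symbol: str) -> bool:
    """Return True if the symbol matches any global pattern (case-sensitive, like ld)."""
    return any(fnmatchcase(symbol, pattern) for pattern in EXPORT_PATTERNS)


def split_symbols(symbols):
    """Partition symbol names into (global, local) lists, preserving order."""
    exported = [s for s in symbols if is_exported(s)]
    hidden = [s for s in symbols if not is_exported(s)]
    return exported, hidden
```

Under these patterns `roundup_to_power_radix` lands in the local list, which is why anything in the test suite that calls it fails to link once the version script is applied, unless the symbol is listed explicitly.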
@amckinstry 2 issues:

Intent of this PR

Gotcha. I understand that the intent is that you can upgrade Open MPI in Debian/Ubuntu/etc. from 2.0.x to 2.a.b without having to recompile all packages that depend on it. This is definitely a highly desirable goal. Although I'll say: you should still test the apps that depend on Open MPI when Open MPI is upgraded. 😄

(...after thinking about this for a minute...) I'm still not sure I understand; can you clarify for me? The symbol versioning in this PR is orthogonal to the shared library version number, right? E.g., if we bump the Libtool c:r:a to 3:0:0, the symbol versioning in this PR won't save you from having to recompile / relink all packages that depend on Open MPI, right?

Auditing of symbols / make check / etc.

I didn't look closely; I'll start a build of this PR and have a look.
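The link between Libtool's `current:revision:age` triple and the SONAME that dependent binaries record can be worked through directly. On ELF platforms such as Linux, Libtool names the library `lib<name>.so.<current-age>.<age>.<revision>` and sets the SONAME to `lib<name>.so.<current-age>`, so dependents only need relinking when `current - age` changes. A small Python sketch of that arithmetic follows; the `LibtoolVersion` class and the version triples used with it are hypothetical examples, not Open MPI's actual values.

```python
from typing import NamedTuple


class LibtoolVersion(NamedTuple):
    current: int
    revision: int
    age: int

    def soname_major(self) -> int:
        # On ELF platforms Libtool derives the SONAME major number as current - age.
        if not 0 <= self.age <= self.current:
            raise ValueError("age must satisfy 0 <= age <= current")
        return self.current - self.age

    def soname(self, lib: str) -> str:
        # The name recorded in DT_NEEDED of every dependent binary.
        return f"{lib}.so.{self.soname_major()}"

    def real_name(self, lib: str) -> str:
        # The file actually installed: lib.so.<current-age>.<age>.<revision>
        return f"{lib}.so.{self.soname_major()}.{self.age}.{self.revision}"


def requires_relink(old: LibtoolVersion, new: LibtoolVersion) -> bool:
    """Dependents record the SONAME, so they break only when the major number changes."""
    return old.soname_major() != new.soname_major()
```

Under these rules, adding interfaces (bumping both current and age, e.g. 20:0:0 to 21:0:1) keeps `libmpi.so.20`, while removing or changing an interface (bumping current and resetting age, e.g. 20:0:0 to 21:0:0) produces `libmpi.so.21`, which is the transition the distro maintainers want to avoid.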
The key point is that on upgrade, the link from foo to libmpi.so.20 should not break. While MPI / OpenMPI has a better upgrade path and track record than most (and an expectation that the API will not change incompatibly), note that if an "opaque" struct like MPI_Comm, MPI_Win, etc. were to change internally, the ABI would break and the major version number would need to change, unless symbol versioning were used. So versioning is the "answer" to the problem rather than orthogonal to it. On testing: yes :-) we do regular recompilation and test runs of the whole archive.
bot:lanl:retest |
Test FAILed.
@amckinstry We're actually together for a face-to-face Open MPI engineering meeting this week (https://github.com/open-mpi/ompi/wiki/Meeting-2016-08), and a bunch of us talked about this in detail yesterday. We think there are 3 issues getting conflated here:
The first two bullet points under 3 are probably subjective and could be argued. E.g., past behavior from the MPI / OSHMEM standards bodies is not necessarily indicative of future behavior. But the last point -- that it interferes with the MPI profiling interface -- is probably the clincher, and the definitive reason why we can't use symbol versioning for the MPI / OSHMEM APIs. All this being said, let's bring @opoplawski into the conversation -- he's the Fedora packager for Open MPI. Orion: do you have any thoughts on this?
OK, backing up to (2), and leaving aside symbol versioning: I understand that the MPI and OSHMEM APIs are tightly regulated and won't change in 2.*, but will the ABIs stay stable too? For the .so version number to remain constant, the internals of opaque data types that are passed across the library boundary also need to remain constant. Things like ompi_status_public_t make me assume not.
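One way to frame the question above: a struct that callers allocate themselves (as with a public status object) stays ABI-compatible across releases only if its total size and the offsets of its existing fields do not change. Reserved padding lets a later release add fields without growing the struct. The following is a minimal `ctypes` sketch of such a check; `StatusV1`, `StatusV2`, `StatusV3` and `layout_compatible` are hypothetical layouts and helpers, not Open MPI's actual `ompi_status_public_t` definition.

```python
import ctypes


class StatusV1(ctypes.Structure):
    # Hypothetical release 1: three public fields plus reserved space.
    _fields_ = [
        ("source", ctypes.c_int),
        ("tag", ctypes.c_int),
        ("error", ctypes.c_int),
        ("_reserved", ctypes.c_int * 5),
    ]


class StatusV2(ctypes.Structure):
    # Hypothetical release 2: a new field carved out of the reserved space.
    _fields_ = [
        ("source", ctypes.c_int),
        ("tag", ctypes.c_int),
        ("error", ctypes.c_int),
        ("count", ctypes.c_int),
        ("_reserved", ctypes.c_int * 4),
    ]


class StatusV3(ctypes.Structure):
    # Hypothetical release 3: grows the struct, so old callers allocate too little.
    _fields_ = [
        ("source", ctypes.c_int),
        ("tag", ctypes.c_int),
        ("error", ctypes.c_int),
        ("_reserved", ctypes.c_int * 5),
        ("count", ctypes.c_size_t),
    ]


def layout_compatible(old, new) -> bool:
    """True if `new` can be written into storage that was allocated as `old`."""
    if ctypes.sizeof(old) != ctypes.sizeof(new):
        return False
    for name, _ctype in old._fields_:
        if name.startswith("_"):
            continue  # reserved padding may legitimately be repurposed
        if not hasattr(new, name):
            return False
        old_field, new_field = getattr(old, name), getattr(new, name)
        if (old_field.offset, old_field.size) != (new_field.offset, new_field.size):
            return False
    return True
```

Dedicated tools perform far more thorough comparisons than this on real headers and binaries. The sketch only illustrates the two invariants (size and field offsets) that matter for caller-allocated structs.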
You are correct -- the respective standards bodies do not govern the ABIs of the MPI and OSHMEM APIs. It's our job as a community to ensure that we maintain our backwards compatibility promises per our version numbering scheme. And just to be clear: the
(Forgive me for being a bit pedantic here; I just want to make sure that we all agree on precisely what we're talking about.)
Some comments:
@jsquyres Thanks for being pedantic. I expect the soversion to change between major releases, e.g. 2.* -> 3.*. The libtool guidelines spell out when it is necessary to increment the version number; the point of symbol versioning is to make that unnecessary (as shown with glibc). ABI stability is not required by the MPI standard, as you point out; it's a decision for the OpenMPI developers, hence my question. Are there agreed policies or expected changes to opaque objects over the 2.* timeframe?

@opoplawski: Not quite true on opaque objects: if they change in size, pre-existing binaries that expect the old objects will crash when a new (e.g. larger) object arrives on the stack (or when a copy of the object is kept locally in a user struct, etc.). This can be avoided by setting a policy that they won't change size, or by adding padding to structs for future expansion. Tools are useful for checking this; thanks for the pointers.

On symbol versioning, I think you're confusing library versioning with symbol versioning. The aim is to ensure that binaries built against old library versions continue to work when a new version is put in place, even if the behaviour of a method or the size of a struct has changed, because multiple versions of a symbol (method) are shipped in the same library. This enables, e.g., backports, where we can drop a new version of a library onto a running system, providing new functionality or bug fixes without breaking existing binaries (including user binaries as well as software shipped by Debian, Fedora, etc.).
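The mechanism described above can be modeled in a few lines. The library ships several versioned definitions of one symbol. At link time, a new binary is bound to the default version (written `foo@@VERSION` in GNU notation). At run time, the dynamic linker resolves the version the binary recorded rather than whatever is newest. This is a simplified Python model, not how ld.so is implemented; `VersionedLibrary`, the symbol `foo`, and the version nodes `LIB_1.0`/`LIB_2.0` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

SymbolRef = Tuple[str, str]  # (symbol name, version node)


@dataclass
class VersionedLibrary:
    # Every shipped definition, keyed by (symbol, version node).
    definitions: Dict[SymbolRef, Callable[[], str]]
    # The default version ("foo@@LIB_2.0") used when linking new binaries.
    defaults: Dict[str, str]

    def link(self, symbol: str) -> SymbolRef:
        """Link time: the binary records the symbol together with its default version."""
        return symbol, self.defaults[symbol]

    def resolve(self, ref: SymbolRef) -> Callable[[], str]:
        """Run time: look up exactly the version the binary recorded."""
        if ref not in self.definitions:
            raise LookupError(f"{ref[0]}@{ref[1]} not found")
        return self.definitions[ref]


def foo_v1() -> str:
    return "v1 semantics"


def foo_v2() -> str:
    return "v2 semantics"
```

For example, a binary linked against a library that only provides `foo@@LIB_1.0` records `("foo", "LIB_1.0")`. If the upgraded library keeps that definition alongside a new default `LIB_2.0`, the old binary still gets the v1 behaviour, while freshly linked binaries get v2, all without changing the SONAME.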
@jsquyres This matters for a potential transition (20 -> 21) on the timetable for Debian stretch (9.0). For this next release, the deadline for completing transitions is November 2016, so if there is a 2.1 release before then with symbol cleanup, it might not make it in.
Please add a Signed-off-by line to this PR's commit.
(we just voted this policy in today as a community, so I'm adding this request for changes to all pending PRs)
Build Failed with XL compiler! Please review the log, and get in touch if you have questions. Gist: https://gist.github.com/6e3fe3e4b03936cb4a2d1c479625e1ac
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/89876d57a2677efc2f124b3c0f47076f
I believe this PR is stale: IBM contributed a test for non-hidden variables, and we have scrubbed the code base to the extent possible (we cannot hide everything, as our plugins need to see core library symbols). So I'm going to close this one now; please reopen if you feel something more needs to be done.
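A test of the kind mentioned above can be approximated by scanning a library's dynamic symbol table and flagging global symbols that fall outside the project's namespaces. One way to get that table is the output of `nm -D --defined-only libmpi.so`, which prints `address type name` per line. The following is a minimal Python sketch; `ALLOWED_PREFIXES`, `exported_symbols` and `unexpected_exports` are hypothetical, and the prefix list is illustrative rather than the list the actual test uses.

```python
# Hypothetical allowed namespaces; plugin-facing core symbols need prefixes too.
ALLOWED_PREFIXES = ("MPI_", "PMPI_", "mpi_", "pmpi_", "ompi_", "opal_", "orte_", "mca_")


def exported_symbols(nm_output: str):
    """Yield names of defined global symbols from `nm -D --defined-only`-style output."""
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address column)
        _address, sym_type, name = parts
        # Upper-case type letters denote global symbols; 'U' marks undefined ones.
        if sym_type.isupper() and sym_type != "U":
            yield name


def unexpected_exports(nm_output: str):
    """Return, sorted, the global symbols that fall outside every allowed namespace."""
    return sorted(
        name for name in exported_symbols(nm_output)
        if not name.startswith(ALLOWED_PREFIXES)
    )
```

Checking prefixes rather than hiding everything reflects the constraint in the comment: symbols that plugins must see stay exported, but they have to live in a recognized namespace.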
Attached is an initial patch implementing symbol versioning.