-
Notifications
You must be signed in to change notification settings - Fork 885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change defaults to prefer external libevent/hwloc #5395
Conversation
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/934424b4952d6c1d149e2c11c26b9df9 |
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/952001875ca2f1598321dd7b41edcb0d |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/fe5652436261f3f0374ac153fc7a82a4 |
@jsquyres i fixed this PR so it does not abort if |
7d0aaea
to
27e112a
Compare
@jsquyres as far as I am concerned, these commits can be squashed and the PR can be merged. |
@jsquyres can you please review this PR ? If you cannot do that before the merge, may I suggest you squash and merge these commits, and we will fix/clean this later if needed ? |
Yep -- am doing so now. Sorry for the giant delay. We'll get this in before the v4.0.x branch. |
27e112a
to
bcebb07
Compare
@ggouaillardet I squashed, rebased, and pushed. Passes tests for me, and I'm heading offline. If it passes tests for you and CI, can you merge? Thanks for the major assist! |
Hmm. Actually, this m4 doesn't implement all the logic described in #5031. We argued over that logic long and hard and finally concluded on the outline that is given in #5031. I'm not sure we want to merge this yet -- yes, it does the first step of preferring external, but it doesn't do any of the version checking, the fallback behavior, ...etc. |
bcebb07
to
331fd6c
Compare
@jsquyres I added some extra tests so by default, we do not use the external component version is less than the one of the internal version. |
:bot:mellanox:retest |
bot:mellanox:retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this still needs some work.
opal/mca/hwloc/external/configure.m4
Outdated
[AC_MSG_RESULT([no]) | ||
AC_MSG_WARN([external hwloc version is less than internal version (2.0)]) | ||
AC_MSG_WARN([using internal hwloc]) | ||
opal_hwloc_external_support=no])]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on why all this code moved up to the previous block. The code was previously split into two distinct parts:
- Do we have external hwloc at all.
- If we have it, does that external hwloc have all the things we need?
This is not a deal breaker -- it's just odd that you moved some of the "does external hwloc have the things we need?" (i.e., the version check) but left other things in the lower block (i.e., the XML check).
opal/mca/hwloc/external/configure.m4
Outdated
[AC_LANG_PROGRAM([[#include <hwloc.h>]], | ||
[[ | ||
#if HWLOC_API_VERSION < 0x00020000 | ||
#error "hwloc API version is less than 0x00020000" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to have this version by reference instead of by value? I.e., I imagine that this will be one more place to forget to update if/when we update the embedded version of hwloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know ... strictly speaking, this is not even the library version (e.g. 2.0.1) but the API version (e.g. 2.0.0).
@bgoglin is there any way to test the __library__version instead of the API version at compile time ?
for example, libevent
defines _EVENT_NUMERIC_VERSION
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggouaillardet Don't you also need to test that $with_hwloc
wasn't given without any path (and so has a yes
value)? I thought that test -z
just means it wasn't given at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhc54 you are right, and I just pushed a commit that fixes that.
[AC_MSG_RESULT([yes])], | ||
[AC_MSG_RESULT([no]) | ||
AC_MSG_ERROR([Cannot continue])]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per #5031, if external support is not available, we need to emit a message at the bottom of this configure.m4:
- If the user explicitly asked for external support, print why the external hwloc is not suitable and then abort.
- If the user did not explicitly ask for external support, print why the external hwloc is not suitable and then allow processing to continue (which will end up selecting the internal hwloc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixed in the latest commit
AC_MSG_ERROR([Cannot continue])])])])], | ||
[AC_MSG_RESULT([(default search paths)])]) | ||
AS_IF([test ! -z "$with_libevent_libdir" && test "$with_libevent_libdir" != "yes"], | ||
[opal_event_libdir="$with_libevent_libdir"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't all of the above be similar to hwloc? I.e., shouldn't the external/configure.m4
between hwloc and libevent be pretty similar in structure? The exact tests will be different, of course (for various libraries, functions, header files, versions, ...etc.), but the structure should be similar -- especially when checking for lib vs. lib64, external-vs-non-external, ...etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at first glance, my answer is yes.
But this is a revamp that is not in the scope of this PR.
Should hwloc
or libevent
be the reference ?
opal/mca/event/external/configure.m4
Outdated
[[ | ||
#if _EVENT_NUMERIC_VERSION < 0x02001500 | ||
#error "libevent API version is less than 0x02001500" | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs another version check to ensure it is >= the internal version of libevent (just like hwloc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the purpose of this test is to ensure this.
(and unlike in hwloc
we do not explicitly require a minimal external version for libevent
)
bot:mellanox:retest |
Looks good to me - @jsquyres is on vacation. I'd go ahead and merge it - the revamp is definitely outside scope and IMO not worth the trouble. |
@ggouaillardet (replying to a comment that might have been lost between patch updates) There's hwloc_get_api_version() which returns the HWLOC_API_VERSION built inside the lib. |
Signed-off-by: Gilles Gouaillardet <[email protected]> Signed-off-by: Jeff Squyres <[email protected]>
Switch from #-style to dnl-style. Signed-off-by: Jeff Squyres <[email protected]>
Signed-off-by: Jeff Squyres <[email protected]>
6b828bd
to
a70ecf5
Compare
@bgoglin I want to test the hwloc version at configure time, so the My main concern is For example, Open MPI |
We've just hit some build issues which I think are probably the result of this pull request, our environment is that we want to test pmix so are using upstream pmix along with ompi and therefore by necessity are also using external libevent, but I think we're using the system provided libevent rather than a specific version or build. Our configure like is this: ./configure --with-platform=optimized --enable-orterun-prefix-by-default --prefix=/testbin/ompi --with-pmix=/testbin/pmix --disable-mpi-fortran --enable-contrib-no-build=vt --with-libevent=external --with-hwloc=/testbin/hwloc And the build is then failing shortly after with this: 13:58:23 --- MCA component pmix:ext2x (m4 configuration macro) Please let me know if I should file this as a ticket directly. |
Thanks for the report and sorry for the trouble. I will have a look at it, this is something that will be hopefully quick to fix. |
Fix is in the oven |
This is an internal commit I made on the way towards #5031. It is very much a WIP kind of thing; I honestly don't even remember what works / what doesn't work / what still needs to be done.
I'm pushing it to github in case someone else has time to advance this work.