Build overhaul2 #359
Please review, @EricMier and/or @pdamme. I'll assign the PR numbers as issue numbers in the commit message when I merge it in. Issues and pull requests seem to share a common counter on GitHub, so this should be fine. If I missed anything in the commit message for #328, please mention it here, @EricMier. While testing this in a container, I noticed that libtinfo.so (provided by ncurses) is now required. Maybe we should find out what pulled in this new dependency before we add it to the documentation (it can't be the changes in this PR; maybe the file readers?).
@corepointer I'll have a look at your changes today. Please wait before you merge it in.
Awesome, thx :)
Hi @corepointer,
Yes, please do! 👍 I don't know why you would get this library-not-found error. That's what I found out about it:
So that library should be present on all systems that have the required dependencies for Daphne installed.
OK, that's odd. I built Daphne a dozen times before on this system. I will double-check on another system.
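One way to narrow down which binary pulls in libtinfo.so is to list the direct dependencies (the NEEDED entries) of each built artifact. The sketch below assumes a Linux system with `readelf` from binutils; the function names `needed_libs` and `directly_needs` and the example paths are hypothetical and not part of the build script.

```shell
#!/bin/sh
# Hypothetical helper: print the direct (NEEDED) shared-library dependencies
# found in `readelf -d` output read from stdin, one library name per line.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Hypothetical helper: succeed if the ELF file "$1" directly needs a library
# whose name starts with "$2" (for example "libtinfo.so").
directly_needs() {
    readelf -d "$1" 2>/dev/null | needed_libs | grep -q "^$2"
}

# Example usage (the paths are assumptions about the build layout):
#   for f in bin/daphne lib/*.so; do
#       directly_needs "$f" libtinfo.so && echo "$f"
#   done
```

Unlike `ldd`, which also lists transitive dependencies, this only reports what a given file links against directly, so it may point more precisely at the component that introduced the dependency.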
LGTM, with a few comments :) I also tested the changes on another system with no errors. 👍
Only the test for `.git` inside llvm should stay a file check (since it's a file for submodules).
Thanks a lot for these additional improvements of the build workflow, @corepointer. It looks very good to me, and the new features are quite useful.
Required before we merge it:
- `.gitignore`: A line `!thirdparty/patches/` should be added, otherwise the patches will be excluded.
- The test cases have become extremely slow on my system. They used to finish in about 20s. Now they seem to take forever. `./test.sh -d yes` (which prints the durations of the individual test cases) reveals that very many test cases are super slow now. E.g., the algorithms (among the first test cases) like kmeans etc. take around 2s now, while they used to take tens of ms. I can still get the good old runtimes when switching back to main. Can you reproduce this problem?

Optional, if they make sense:
- Shouldn't `--clean` also clear the install dir?
- Shouldn't `--cleanAll` also clear the download cache?
- Update the doc in `doc/development/BuildingDaphne.md`, which @EricMier has already written.
- Building MLIR: the install target (`cmake --build "$buildPrefix/$llvmName" --target install`), which you added, seems to compile additional things that we didn't require before and that take a significant amount of time (it even builds executables). Is that intended? Can we omit this?
- Patches: Will we always need them, or do we need to re-evaluate them with version upgrades of the dependencies? It would be good to leave some comments in the build script.
Thank you for your feedback!
Force-pushed from d290e9e to 58f1b9b.
This commit adds several improvements to the build.sh script:
* indicator files for download/install dependencies
* improved clean/cleanAll parameters with and without interactive remove
* colored output
* documentation about the build script

Closes #328

* quicker build
* quicker downloads (git cloning fewer deps)
* patches against warnings and compile failures
* central build and install dirs

Closes #359
Include issues and a name clash of USE_CUDA caused troubles. Closes #380
Force-pushed from 58f1b9b to e197f6f.
I addressed all but one of the issues with this PR.

fixed:

still open: The increase in run time for the test cases remains a mystery to me. Afaict it concerns the script-based test cases. I tested a few things like omitting the patches, not compiling abseil separately, and not using the llvm install target. I also ran on a few systems.

new: Not an issue but another (convenience) feature: To quickly change the build/cache/source/install prefixes to another directory, I introduced another intermediate variable. With all the creativity I could muster I called it `myPrefix` (but left a comment on what it does). If that bothers anybody, I'm open to suggestions ;-)

Sorry for the force-pushing. I rebased onto the latest main and already started to squash together some properly formatted commits. You can avoid conflicts in your local version of this branch by either removing it or, if you're on that branch, running `git fetch` and `git reset --hard origin/build_overhaul2` (essentially separating the fetch+merge that a normal pull would do and replacing the latter with a reset). This will remove local modifications, so if you have any, stash them or move your commits to a temp branch first.
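The intermediate prefix variable described above can be sketched roughly as follows. Only the names `myPrefix`, `buildPrefix` and `llvmName` appear in this conversation; the other variable names, the directory names and the default values are assumptions made for illustration and may differ from the actual build.sh.

```shell
#!/bin/sh
# Sketch only: apart from myPrefix, buildPrefix and llvmName, the variable
# names, directory names and defaults below are assumptions, not taken
# from build.sh.
derive_prefixes() {
    # One root for everything third-party; override it by setting myPrefix.
    myPrefix="${myPrefix:-$PWD/thirdparty}"
    cachePrefix="$myPrefix/download-cache"   # downloaded archives
    sourcePrefix="$myPrefix/sources"         # unpacked or cloned sources
    buildPrefix="$myPrefix/build"            # per-dependency build trees
    installPrefix="$myPrefix/installed"      # headers, libraries, JARs
}

derive_prefixes
# llvm-project is a placeholder default for this sketch.
echo "LLVM would be built in: $buildPrefix/${llvmName:-llvm-project}"
```

Deriving every location from a single variable means that relocating all third-party artifacts only requires changing one assignment.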
Thanks @corepointer. Your recent changes look good to me and it builds successfully. I didn't test carefully, though, since that takes some time and we must anyway still fix the problem with the long-running test cases. I wouldn't merge it in before that is fixed, because it makes development hard.
I tried to find out why the test cases take so much time now.
- The problem does not exist on the `build_overhaul` branch, so it must be something added in `build_overhaul2`.
- By inserting a few prints, I found out that a perceivable delay comes from `daphne.cpp` line 283: `auto engine = executor.createExecutionEngine(moduleOp);` and within that from `DaphneIrExecutor.cpp` line 180ff: `auto maybeEngine = mlir::ExecutionEngine::create(module, nullptr, optPipeline, llvm::CodeGenOpt::Level::Default, sharedLibRefs, true, true, true);` When leaving `sharedLibRefs` empty (i.e., no `build/src/runtime/local/kernels/libAllKernels.so`), it (a) obviously crashes later because this lib is required, but (b) is fast again. Maybe the dynamic linking of `libAllKernels.so` is what takes so long. It seems like the only difference from the `main` branch is the way OpenBLAS/LAPACK is used. I tried building OpenBLAS the old way (using `make`), but no success there... But maybe this information is useful to you.
Fixed the test time issue. Changing back from the cmake openblas build to make did it. For testers: make sure to run `build.sh --cleanAll` first.
Great! I can confirm that the test cases are back to normal execution time again. I will test a few scenarios (which always takes some time in the background) and report back in the course of the day.
- Deleting also the DAPHNE build directory on --clean and --cleanAll (not only the dependencies).
- Ensuring that `./build.sh --clean && ./build.sh` (build after clean) works:
  - catch2 install-success token is now also deleted on --clean (not only on --cleanAll).
  - Copying the ANTLR JAR from the download cache to the install dir is guarded by the install-success token (not by the download-success token).
- Fixed a typo.
Perfect, it's ready to be merged from my point of view.
While testing it, I still encountered the following minor issues, which I quickly fixed myself:
- The DAPHNE build directory was not deleted on `--clean`/`--cleanAll`. I added that.
- Build after clean (`./build.sh --clean && ./build.sh`) failed. I fixed it by:
  - deleting the catch2 install-success token on `--clean` (otherwise `catch2.hpp` is missing in the second build)
  - guarding the copying of the ANTLR JAR from the download cache to the install directory by the install-success token (not the download-success token) (otherwise, the JAR is missing in the second build)
Furthermore, I slightly updated and corrected `BuildingDaphne.md`.
Feel free to squash my additions in any way you like :) .
I happily noticed that the updated build workflow completes successfully on build/thirdparty directories created by the old build workflow. This means no manual action is required after pulling these changes once they're on main. (Nevertheless, the thirdparty directory will contain outdated directories of the individual dependencies, which could be confusing, but that's not a real issue.)
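The interplay between success tokens and `--clean` discussed above can be illustrated with a small sketch. It is not the actual build.sh code: the function names, the token file naming, the directory layout and the choice to store install tokens inside the install directory are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: function names, token file names and the directory layout
# are assumptions for illustration, not the actual build.sh implementation.

# Token helpers: an empty marker file records that a step succeeded.
token_path() { echo "$1/.$2.success"; }          # $1 = dir, $2 = step name
has_token()  { [ -f "$(token_path "$1" "$2")" ]; }
set_token()  { mkdir -p "$1" && : > "$(token_path "$1" "$2")"; }

# Run the command "$@" for step $2 unless its token already exists in
# dir $1. The token is only written if the command succeeds.
run_once() {
    dir="$1"; step="$2"; shift 2
    if has_token "$dir" "$step"; then
        echo "skip $step"
        return 0
    fi
    "$@" || return 1
    set_token "$dir" "$step" && echo "done $step"
}

# Mode "clean": drop build trees and install results (and with them the
# install tokens), but keep downloads. Mode "all": also drop the cache.
clean() {
    root="$1"; mode="$2"
    rm -rf "$root/build" "$root/installed"
    if [ "$mode" = "all" ]; then
        rm -rf "$root/download-cache"
    fi
}
```

With this layout, a build after a plain clean skips the download step but repeats the install step, which is why copying an artifact such as the ANTLR JAR into the install directory needs to be guarded by the install token rather than the download token.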
Awesome 👍 I'll bluntly go ahead and merge it in, squashing your changes to a separate commit to preserve some credit ;-) @pdamme
Great that we finally got this merged in, it's a nice improvement. Thanks again @EricMier and @corepointer for pushing this forward!
This PR contains the changes from the first build overhaul #328 PR (already squashed) plus some changes I had queued up for some time now.