
Build overhaul2 #359

Closed
wants to merge 7 commits

Conversation

@corepointer
Collaborator

This PR contains the changes from the first build-overhaul PR #328 (already squashed), plus some changes I had queued up for some time.

  • quicker build (using ninja for more components)
  • quicker download (avoiding a full repo checkout of some dependencies)
  • a few patches that silence some warnings and fix some compilation issues
  • central build/install directory
  • script parameters --cuda and --debug, so the script no longer needs to be edited for these options
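The --cuda/--debug parameters could be handled with a simple option loop. A minimal sketch (function and variable names here are illustrative, not the actual ones in build.sh):

```shell
# Hypothetical sketch of command-line flag handling for a build script.
parse_build_flags() {
  BUILD_CUDA=0
  BUILD_DEBUG=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --cuda)  BUILD_CUDA=1 ;;
      --debug) BUILD_DEBUG=1 ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
    shift
  done
  echo "cuda=$BUILD_CUDA debug=$BUILD_DEBUG"
}

parse_build_flags --cuda --debug   # prints: cuda=1 debug=1
```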

@corepointer
Collaborator Author

Please review, @EricMier and/or @pdamme. I'll use the PR numbers as issue numbers in the commit message when I merge it in. Issues and pull requests seem to share a common counter on GitHub, so this should be fine. If I missed anything in the commit message for #328, please mention it here, @EricMier.

While testing this in a container, I noticed that libtinfo.so (provided by ncurses) is now required. Maybe we should find out what pulled in this new dependency before we add it to the documentation (it can't be the changes in this PR; maybe the file readers?).

@corepointer corepointer marked this pull request as ready for review May 9, 2022 08:11
@corepointer
Collaborator Author

@pdamme, @EricMier, if there are no further comments on this, I'll merge it in later.

@pdamme
Collaborator

pdamme commented May 18, 2022

@corepointer I'll have a look at your changes today. Please wait before you merge it in.

@corepointer
Collaborator Author

> @corepointer I'll have a look at your changes today. Please wait before you merge it in.

Awesome, thx :)

@pdamme pdamme self-requested a review May 18, 2022 13:45
@EricMier
Collaborator

Hi @corepointer,
I also wanted to take a look at your changes. But when building after cloning, I get the following error: "unable to find library -lpfm". Is this the new dependency you mentioned?

@corepointer
Collaborator Author

> Hi @corepointer, I also wanted to take a look at your changes. But when building after cloning, I get the following error: "unable to find library -lpfm". Is this the new dependency you mentioned?

Yes, please do! 👍 I don't know why you would get this library-not-found error. Here's what I found out about it:

```shell
$ dpkg -l | grep -i pfm
ii  libpfm4:amd64    4.11.1+git32-gd0b85fb-1    amd64    Library to program the performance monitoring events

$ apt-cache rdepends --installed libpfm4
libpfm4
Reverse Depends:
  llvm-10
```
So that library should be present on all systems that have the required dependencies for Daphne installed.
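To diagnose such a "cannot find -lpfm" linker error, one could check whether the dynamic linker knows the library at all. A small sketch (the helper name have_lib is illustrative):

```shell
# Hypothetical helper: ask ldconfig whether a library is registered with
# the dynamic linker; useful when the linker reports "-lpfm" not found.
have_lib() {
  if ldconfig -p 2>/dev/null | grep -q "lib$1\.so"; then
    echo "lib$1 found"
  else
    echo "lib$1 missing"
  fi
}

have_lib pfm   # result depends on whether libpfm4 is installed
```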

@EricMier
Collaborator

Ok, that's odd. I've built Daphne a dozen times before on this system. I will double-check on another system.


@EricMier EricMier left a comment


LGTM, with a few comments :) I also tested the changes on another system with no errors. 👍
Just the check for .git inside the llvm directory should stay a file check (since .git is a file in submodules).
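The distinction mentioned above can be demonstrated: in a submodule checkout, .git is a plain file pointing at the real git dir rather than a directory. A throwaway sketch (the temp dir and gitdir path are illustrative, not DAPHNE's actual layout):

```shell
# In a submodule checkout, .git is a file containing a "gitdir:" pointer.
d=$(mktemp -d)
printf 'gitdir: ../.git/modules/llvm-project\n' > "$d/.git"   # simulate a submodule

if [ -f "$d/.git" ]; then
    echo ".git is a file (submodule checkout)"
elif [ -d "$d/.git" ]; then
    echo ".git is a directory (regular clone)"
fi
# prints: .git is a file (submodule checkout)
```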

@EricMier EricMier mentioned this pull request May 18, 2022

@pdamme pdamme left a comment


Thanks a lot for these additional improvements of the build workflow, @corepointer. It looks very good to me, and the new features are quite useful.

Required before we merge it:

  • .gitignore: A line !thirdparty/patches/ should be added, otherwise the patches will be excluded.
  • The test cases have become extremely slow on my system. They used to finish in about 20s; now they seem to take forever. ./test.sh -d yes (which prints the durations of the individual test cases) reveals that very many test cases are super slow now. E.g., the algorithms (among the first test cases) like kmeans etc. take around 2s now, while they used to take tens of ms. I can still get the good old runtimes when switching back to main. Can you reproduce this problem?

Optional, if they make sense:

  • Shouldn’t --clean also clear the install dir?
  • Shouldn’t --cleanAll also clear the download cache?
  • Update the doc in doc/development/BuildingDaphne.md which @EricMier has already written.
  • Building MLIR: the install target (cmake --build "$buildPrefix/$llvmName" --target install) which you added seems to compile additional things that we didn't require before, and this takes a significant amount of time (it even builds executables). Is that intended? Can we omit it?
  • Patches: Will we always need them or do we need to re-evaluate them with version upgrades of the dependencies? Would be good to leave some comments in the build script.

@corepointer
Copy link
Collaborator Author

Thank you for your feedback!
I can confirm the long test runs. I saw a lot of CPU wait while the tests were executing; this needs more investigation.
The necessity of applying the patches needs to be reviewed when we change/upgrade dependencies, of course.
As far as I know there is no install-mlir target (yet?), so this needs to be installed via the global install target [1].
I added the clean/cleanAll suggestions.
Extending the documentation: thanks for the reminder, will do.


EricMier and others added 3 commits May 24, 2022 00:48
This commit adds several improvements to the build.sh script:
* indicator files for download/install dependencies
* improved clean/cleanAll parameters with and without interactive remove
* colored output
* documentation about the build script

Closes #328
* quicker build
* quicker downloads (git cloning less deps)
* patches against warnings and compile failures
* central build and install dirs

Closes #359
Include issues and a name clash of USE_CUDA caused trouble.

Closes #380
@corepointer
Collaborator Author

I addressed all but one of the issues with this PR:

fixed:

  • Removed using the install target for the llvm compile and added the necessary flags to the daphne cmake command again. There were indeed ~700 extra files compiled (50s/5.5min more compile time on vega/laptop). We can come back to this once there's a standalone install-mlir cmake target upstream (the issue is known (see the link in the other comment) but has not been addressed so far).
  • I updated the documentation
  • I incorporated @pdamme 's suggestion about the clean/cleanAll behavior
  • updated .gitignore
  • fixed a few other minor things that came up while testing

still open:

The increase in run time for the test cases remains a mystery to me. As far as I can tell, it concerns the script-based test cases. I tested a few things, like omitting the patches, not compiling abseil separately, and not using the llvm install target. I also ran on a few systems.

new:

Not an issue but another (convenience) feature: to quickly change build/cache/source/install prefixes to another directory, I introduced another intermediate variable. With all the creativity I could muster, I called it myPrefix (but left a comment on what it does). If that bothers anybody, I'm open to suggestions ;-)

Sorry for the force-pushing. I rebased onto the latest main and already started to squash together some properly formatted commits. You can avoid conflicts in your local version of this branch by either removing it or, if you're on that branch, doing git fetch followed by git reset --hard origin/build_overhaul2 (essentially separating the fetch+merge that a normal pull would do and replacing the latter with a reset). This will remove local modifications, so if you have any, stash them or move your commits to a temp branch first.
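The recovery sequence above can be demonstrated end to end. A self-contained sketch using throwaway temp repos (the branch name build_overhaul2 is as in the PR; everything else is illustrative):

```shell
# Demo of git fetch + git reset --hard after a force-push:
# set up a fake "upstream" with the rebased branch, clone it,
# add a stale local commit, then recover by resetting to origin.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
git -C "$tmp/upstream" checkout -q -b build_overhaul2
git -C "$tmp/upstream" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "rebased"

git clone -q "$tmp/upstream" "$tmp/local"
cd "$tmp/local"
# Simulate local work that the force-pushed branch no longer contains:
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "stale local work"

git fetch -q origin                            # fetch without merging
git reset -q --hard origin/build_overhaul2     # discard local divergence
git log -1 --pretty=%s                         # prints: rebased
```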

@corepointer corepointer requested review from pdamme and EricMier May 24, 2022 07:47

@pdamme pdamme left a comment


Thanks @corepointer. Your recent changes look good to me and it builds successfully. I didn't test carefully, though, since that takes some time and we still have to fix the problem with the long-running test cases anyway. I wouldn't merge it in before that is fixed, because it makes development hard.

I tried to find out why the test cases take so much time now.

  • The problem does not exist on the build_overhaul branch, so it must be something added in build_overhaul2.
  • By inserting a few prints, I found out that a perceptible delay comes from daphne.cpp line 283:

    ```cpp
    auto engine = executor.createExecutionEngine(moduleOp);
    ```

    and within that from DaphneIrExecutor.cpp line 180ff:

    ```cpp
    auto maybeEngine = mlir::ExecutionEngine::create(
        module, nullptr, optPipeline, llvm::CodeGenOpt::Level::Default,
        sharedLibRefs, true, true, true);
    ```

    When leaving sharedLibRefs empty (i.e., no build/src/runtime/local/kernels/libAllKernels.so), it (a) obviously crashes later because this lib is required, but (b) is fast again. Maybe the dynamic linking of libAllKernels.so is what takes so long. The only difference from the main branch seems to be the way OpenBLAS/LAPACK is used. I tried building OpenBLAS the old way (using make), but no success there... But maybe this information is useful to you.

@corepointer
Collaborator Author

Fixed the test-time issue: changing the OpenBLAS build back from CMake to make did it. For testers: make sure to run build.sh --cleanAll first.

@corepointer corepointer requested a review from pdamme May 24, 2022 21:54
@corepointer
Collaborator Author

I'd squash and merge tomorrow if everything's ok, also closing #120 and #241 along the way =)

@pdamme
Collaborator

pdamme commented May 25, 2022

Great! I can confirm that the test cases are back to normal execution time again. I will test a few scenarios (which always takes some time in the background) and report back in the course of the day.

- Deleting also the DAPHNE build directory on --clean and --cleanAll (not only the dependencies).
- Ensuring that `./build.sh --clean && ./build.sh` (build after clean) works:
  - catch2 install-success token is now also deleted on --clean (not only on --cleanAll).
  - Copying the ANTLR JAR from the download cache to the install dir is guarded by the install-success token (not by the download-success token)
- Fixed a typo.
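The install-success token pattern described in the commit above can be sketched as follows (paths, file names, and the function install_catch2 are illustrative, not build.sh's actual ones):

```shell
# Hypothetical sketch of a success-token guard: the token file is written
# only after the install step completes, so a repeated run can skip it,
# and deleting the token on --clean forces the step to run again.
install_dir=$(mktemp -d)
token="$install_dir/catch2.install.success"

install_catch2() {
  if [ -f "$token" ]; then
    echo "catch2 already installed, skipping"
  else
    touch "$install_dir/catch.hpp"   # stand-in for the real install step
    touch "$token"                   # written only after the step succeeded
    echo "catch2 installed"
  fi
}

install_catch2   # first run: performs the install
install_catch2   # second run: token present, step skipped
```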

@pdamme pdamme left a comment


Perfect, it's ready to be merged from my point of view.

While testing it, I still encountered the following minor issues, which I quickly fixed myself:

  • The DAPHNE build directory was not deleted on --clean/--cleanAll. I added that.
  • Build after clean (./build.sh --clean && ./build.sh) failed. I fixed it by:
    • deleting the catch2 install-success token on --clean (otherwise catch2.hpp is missing in the second build)
    • guarding the copying of the ANTLR JAR from the download cache to the install directory by the install-success token (not the download-success token) (otherwise, the JAR is missing in the second build)

Furthermore, I slightly updated and corrected BuildingDaphne.md.

Feel free to squash my additions in any way you like :) .

I happily noticed that the updated build workflow successfully completes on build/thirdparty directories created by the old build workflow. This means no manual action is required after pulling these changes once they're on main. (Nevertheless, the thirdparty directory will contain outdated directories of the individual dependencies, which could be confusing, but that's not a real issue.)

@corepointer
Collaborator Author

Awesome 👍 I'll bluntly go ahead and merge it in, squashing your changes into a separate commit to preserve some credit ;-) @pdamme

@pdamme
Collaborator

pdamme commented May 26, 2022

Great that we finally got this merged in, it's a nice improvement. Thanks again @EricMier and @corepointer for pushing this forward!
