Skip to content

Add note about makeflags and ninja parallelism #39128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 21, 2025

Conversation

user202729
Copy link
Contributor

@user202729 user202729 commented Dec 14, 2024

MAKEFLAGS is an alternative to MAKE.

The note about ninja can stay there until #38950 is fixed to warn the user e.g. to not run too many other things in parallel. It would be rather surprising if you pass make -j3 there and have 7 processes running in parallel (which might eat up a lot of RAM and possibly have the OOM killer kill other processes along the way).

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

Copy link

github-actions bot commented Dec 14, 2024

Documentation preview for this PR (built with commit 2af2e49; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

README.md Outdated
Comment on lines 344 to 346
Alternatively, the `MAKEFLAGS` environment variable can be used.
In this case, only provide the flag itself, for example
`export MAKEFLAGS="-j4"`.
Copy link
Member

@saraedum saraedum Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should recommend only one thing here.

(SageMath's build system is not great to say it politely. I personally don't think we should invest much to try to fix that but rather move towards removing it eventually. end rant.)

The problem with MAKE=make -jX is that leads to exponential growth of processes (and actually seems to still cause trouble on macOS cc @fchapoton)

The problem with make -jX is that this parameter is not passed on to submakes if they are invoked by a shell script, e.g., when we invoke a Makefile in the build of a subpackage. (I don't think we have any magic in place for this when we have the invocation chain Makefile → shell script → Makefile.) Then again, that can be beneficial, if the top-level make decides to build 16 SPKGs in parallel. If each of them were to use 16 parallel builds, then you might end up getting yourself into trouble.

I think the sane option here is to set MAKEFLAGS="-j$(nproc) -l$(nproc)" on Linux and MAKEFLAGS="-j$(sysctl -n hw.ncpu) -l$(sysctl -n hw.ncpu)" on macOS. This stops excessive amounts of subprocesses but still gives you all the parallelity possible.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually submake is a solved problem, make can limit total number of processes even in sub-make https://www.gnu.org/software/make/manual/html_node/Job-Slots.html .

The problem ensues when you mix make with ninja which is what this pull request notes ninja-build/ninja#1139

The only case where number of processes may explode as far as I know is #38753 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(that said, I already moved to meson/ninja, which has much better incremental compilation time. The observation above is back when I was still compiling with make)

I'm not entirely sure what's the difference between MAKE="make -j4" and MAKEFLAGS="-j4", but I think most sources I can find recommend the latter e.g. https://stackoverflow.com/a/4778419/ . I personally think MAKEFLAGS is cleaner than MAKE. Of course MAKE="-j4" does not work.

I didn't know about -l N , but it might solve the problem of make/ninja non-cooperation… probably by making make yield to ninja entirely. (or maybe it will occasionally happen that number of parallel jobs exceed the limit anyway, I haven't really tested.)

I don't think we need to tell the user how to figure out how many CPU their computer has, and it's easy enough to look up this information anyway (it's more complicated than just nproc, for example my computer has 4 logical CPU but only 2 physical CPU, so for CPU-intensive tasks effectively you can only run 2 in parallel, more will not speed it up). But having the extra information for convenience can't hurt either, up to you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually submake is a solved problem, make can limit total number of processes even in sub-make https://www.gnu.org/software/make/manual/html_node/Job-Slots.html .

Ok, nowadays, make puts the jobserver into MAKEFLAGS so that works correctly even when in make → shell → make situations. That's great :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to tell the user how to figure out how many CPU their computer has

In principle I agree. However, having probably held hands with a hundred or so people installing SageMath from source, much of our user base tends to be such that they don't actually know. I believe that the recommended commands are essentially always available and make people's lives easier. I don't insist on them, but I think the readers of the README would appreciate that information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is… sad.

I suppose ideally only a developers should need to compile from source, users should be able to just download conveniently-packed executable. But sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure what's the difference between MAKE="make -j4" and MAKEFLAGS="-j4"

I just did a truly unscientific experiment with

A: B C

B:
	ps aux | grep make | wc -l
	@sleep 1
	@${MAKE}

C:
	@sleep 1
	@${MAKE}

When invoked with make -j4 or MAKEFLAGS=-j4 make, the number of make processes grows linearly, so there's only ever 4 "running" (i.e., in the sleep.) With MAKE="make -j4" make this grows much faster. So that's calling for trouble.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose ideally only a developers should need to compile from source

Yes, the state of SageMath installers and such is a different story (though there's some recent progress here, can provide links if you care.) Note that many of the "developers" here are full time mathematicians that are not savvy when it comes to development. But most of them would probably know how to google the number of cores their machine has ;) Then again, if we can lower the bar, I would usually go for that. I've seen too many people give up along the way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird. In fact if you also ps aux|grep sleep then you see it exceeds 4 by a large factor.

Okay yes, MAKEFLAGS is better than MAKE.

README.md Outdated
Comment on lines 348 to 356
Note that the compilation may nonetheless uses a different number of
threads, because sometimes `ninja` is used.
Unfortunately, [there is no way to control number of jobs `ninja` uses
from environment variables](https://github.com/ninja-build/ninja/issues/1482).
See also https://github.com/sagemath/sage/issues/38950.

If the [Meson build system](https://doc-release--sagemath.netlify.app/html/en/installation/meson)
is used, the number of jobs running in parallel passed to `meson compile` will be respected,
because everything are managed by `ninja`.
Copy link
Member

@saraedum saraedum Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only place that uses ninja is meson. Ninja does a good job at figuring out how parallel to build for most users. I am not sure it's necessary to comment on the details here. Maybe it's enough to say that the above only affects some parts of the build system and that it is not universally applied everywhere yet?

Maybe we could recommend to set SAGE_NUM_THREADS as well? I think it's used by things such as the docbuild but I have not double checked that this is (still) the case.

Copy link
Contributor Author

@user202729 user202729 Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding SAGE_NUM_THREADS, I think there's some magic that automatically figures that out by parsing MAKEFLAGS. (sage-num-threads.py script) So should be unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some magic

Hehe, ok. That could well be. It's all a bit confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sage-num-threads.py

Confusingly, this does not parse MAKEFLAGS (though it claims to) but sage-build-num-threads.py does.

Many corners of the build system are in a very unfortunate state. I hope to sit down for a month at some point, push a pixi with meson setup and reduce things to trivial instructions that should make 99% of users and developers and packagers happy. But that's certainly way beyond the scope of this PR :)

@saraedum
Copy link
Member

It would be rather surprising if you pass make -j3 there and have 7 processes running in parallel (which might eat up a lot of RAM and possibly have the OOM killer kill other processes along the way).

Does the makefile invoke more than one ninja process?

@saraedum
Copy link
Member

@user202729 I'm on sagemath.zulipchat.com if you want to discuss this directly. I'm Julian Rüth over there.

@user202729
Copy link
Contributor Author

user202729 commented Feb 13, 2025

Does the makefile invoke more than one ninja process?

I no longer use make, but I can look at the process tree in #38950 (focus on first column which is %CPU usage)

in summary the process tree looks something like

make 
  make scipy
    ninja
      g++
      g++
      g++
      g++
  make sagelib
    g++
    g++
    g++

tl;dr make invokes one ninja and three g++ (correctly handling submake), but the ninja invokes four g++.

I don't know if by some unfortunate chance/timing the make can invoke two ninja, but in principle there doesn't seem to be anything that rules out that chance (if the top make invokes two sub-make to build two spkg, and these two spkg happens to both use ninja). (I don't think I want to figure out anyway.)

@saraedum
Copy link
Member

Does the makefile invoke more than one ninja process?

I no longer use make, but I can look at the process tree in #38950 (focus on first column which is %CPU usage)
[...]

Thanks for the analysis. I think that we cannot achieve to fully solve this problem here. As long as ninja (or some other build system) does not communicate with make's jobserver. I think we want the build to run reasonably fast for most people without totally overcommitting (and then having things crash because of OOM situations and such.) I think most users will survive if we overcommit once or twice so make -j$nproc should be safe. What do you think? (Yes, in some situations you don't get a performance benefit as opposed to using half as many, but it also doesn't hurt typically.)

@saraedum
Copy link
Member

I'm happy to propose a modified text once we agree on what should be written in principle. (But also happy to look at your proposal of course.)

@user202729
Copy link
Contributor Author

If you can write the documentation that's good too.

Main point for me is just to tell the user basically, if you specify MAKEFLAGS=-jX, then there may be greater than X processes because of these reasons. (In the example above, if there's X-1 g++ processes and 1 ninja processes, net result is X+nproc-1.) The rest isn't as important.

@saraedum
Copy link
Member

Main point for me is just to tell the user basically, if you specify MAKEFLAGS=-jX, then there may be greater than X processes because of these reasons. (In the example above, if there's X-1 g++ processes and 1 ninja processes, net result is X+nproc-1.) The rest isn't as important.

True that's certainly not what you would expect so it makes sense to point that out explicitly.

I'll push a revised version to this branch. Feel free to throw it away if you don't like it. I'll send you a ping when it's ready.

@saraedum
Copy link
Member

Unrelated to this pull request (and not sure how to message you otherwise.) If you'd like to have triage rights so you can change labels on pull request, please let me know. Similarly, if you'd like to join SageDays at some point (we're currently at SageDays close to Bordeaux, in two weeks in Vienna, in August in Japan) feel free to reach out (my email is in my commits.)

Setting MAKE=make -jX can lead to an exponential growth of build
processes (as noted during SD128.) It's better to rely on make's
jobserver that is available on macOS and all relevant Linux
distributions.

We also recommend a sane default for MAKEFLAGS that people can just copy
& paste. Unfortunately, a lot of non-developers of SageMath still have
to build from source and they appreciate not having to think about these
things and just being able to copy a good default value.
@saraedum
Copy link
Member

@user202729 what do you think about the current version? It's not perfect but I think it's an improvement over what was there before.

@user202729
Copy link
Contributor Author

user202729 commented Feb 13, 2025

Looks less technical than my original version, but since many end users see README that's also good for me.

(About triage rights, I think that's also good for me.)

@saraedum
Copy link
Member

Done. Feel free to put the positive review label on if you are happy with the changes here.

vbraun pushed a commit to vbraun/sage that referenced this pull request Feb 18, 2025
sagemathgh-39128: Add note about makeflags and ninja parallelism
    
`MAKEFLAGS` is an alternative to `MAKE`.

The note about `ninja` can stay there until
sagemath#38950 is fixed to warn the user
e.g. to not run too many other things in parallel. It would be rather
surprising if you pass `make -j3` there and have 7 processes running in
parallel (which might eat up a lot of RAM and possibly have the OOM
killer kill other processes along the way).


### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [x] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#39128
Reported by: user202729
Reviewer(s): Julian Rüth, user202729
@vbraun vbraun merged commit d0e72a6 into sagemath:develop Feb 21, 2025
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants