-
-
Notifications
You must be signed in to change notification settings - Fork 599
Add note about makeflags and ninja parallelism #39128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Documentation preview for this PR (built with commit 2af2e49; changes) is ready! 🎉 |
README.md
Outdated
Alternatively, the `MAKEFLAGS` environment variable can be used. | ||
In this case, only provide the flag itself, for example | ||
`export MAKEFLAGS="-j4"`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should recommend only one thing here.
(SageMath's build system is not great to say it politely. I personally don't think we should invest much to try to fix that but rather move towards removing it eventually. end rant.)
The problem with MAKE=make -jX
is that leads to exponential growth of processes (and actually seems to still cause trouble on macOS cc @fchapoton)
The problem with make -jX
is that this parameter is not passed on to submakes if they are invoked by a shell script, e.g., when we invoke a Makefile in the build of a subpackage. (I don't think we have any magic in place for this when we have the invocation chain Makefile → shell script → Makefile.) Then again, that can be beneficial, if the top-level make decides to build 16 SPKGs in parallel. If each of them were to use 16 parallel builds, then you might end up getting yourself into trouble.
I think the sane option here is to set MAKEFLAGS="-j$(nproc) -l$(nproc)"
on Linux and MAKEFLAGS="-j$(sysctl -n hw.ncpu) -l$(sysctl -n hw.ncpu)"
on macOS. This stops excessive amounts of subprocesses but still gives you all the parallelity possible.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually submake is a solved problem, make
can limit total number of processes even in sub-make https://www.gnu.org/software/make/manual/html_node/Job-Slots.html .
The problem ensues when you mix make with ninja which is what this pull request notes ninja-build/ninja#1139
The only case where number of processes may explode as far as I know is #38753 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(that said, I already moved to meson/ninja, which has much better incremental compilation time. The observation above is back when I was still compiling with make)
I'm not entirely sure what's the difference between MAKE="make -j4"
and MAKEFLAGS="-j4"
, but I think most sources I can find recommend the latter e.g. https://stackoverflow.com/a/4778419/ . I personally think MAKEFLAGS
is cleaner than MAKE
. Of course MAKE="-j4"
does not work.
I didn't know about -l N
, but it might solve the problem of make/ninja non-cooperation… probably by making make
yield to ninja
entirely. (or maybe it will occasionally happen that number of parallel jobs exceed the limit anyway, I haven't really tested.)
I don't think we need to tell the user how to figure out how many CPU their computer has, and it's easy enough to look up this information anyway (it's more complicated than just nproc
, for example my computer has 4 logical CPU but only 2 physical CPU, so for CPU-intensive tasks effectively you can only run 2 in parallel, more will not speed it up). But having the extra information for convenience can't hurt either, up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually submake is a solved problem, make can limit total number of processes even in sub-make https://www.gnu.org/software/make/manual/html_node/Job-Slots.html .
Ok, nowadays, make puts the jobserver into MAKEFLAGS so that works correctly even when in make → shell → make situations. That's great :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to tell the user how to figure out how many CPU their computer has
In principle I agree. However, having probably held hands with a hundred or so people installing SageMath from source, much of our user base tends to be such that they don't actually know. I believe that the recommended commands are essentially always available and make people's lives easier. I don't insist on them, but I think the readers of the README would appreciate that information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is… sad.
I suppose ideally only a developers should need to compile from source, users should be able to just download conveniently-packed executable. But sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure what's the difference between MAKE="make -j4" and MAKEFLAGS="-j4"
I just did a truly unscientific experiment with
A: B C
B:
ps aux | grep make | wc -l
@sleep 1
@${MAKE}
C:
@sleep 1
@${MAKE}
When invoked with make -j4
or MAKEFLAGS=-j4 make
, the number of make processes grows linearly, so there's only ever 4 "running" (i.e., in the sleep.) With MAKE="make -j4" make
this grows much faster. So that's calling for trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose ideally only a developers should need to compile from source
Yes, the state of SageMath installers and such is a different story (though there's some recent progress here, can provide links if you care.) Note that many of the "developers" here are full time mathematicians that are not savvy when it comes to development. But most of them would probably know how to google the number of cores their machine has ;) Then again, if we can lower the bar, I would usually go for that. I've seen too many people give up along the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weird. In fact if you also ps aux|grep sleep
then you see it exceeds 4 by a large factor.
Okay yes, MAKEFLAGS is better than MAKE.
README.md
Outdated
Note that the compilation may nonetheless uses a different number of | ||
threads, because sometimes `ninja` is used. | ||
Unfortunately, [there is no way to control number of jobs `ninja` uses | ||
from environment variables](https://github.com/ninja-build/ninja/issues/1482). | ||
See also https://github.com/sagemath/sage/issues/38950. | ||
|
||
If the [Meson build system](https://doc-release--sagemath.netlify.app/html/en/installation/meson) | ||
is used, the number of jobs running in parallel passed to `meson compile` will be respected, | ||
because everything are managed by `ninja`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only place that uses ninja is meson. Ninja does a good job at figuring out how parallel to build for most users. I am not sure it's necessary to comment on the details here. Maybe it's enough to say that the above only affects some parts of the build system and that it is not universally applied everywhere yet?
Maybe we could recommend to set SAGE_NUM_THREADS
as well? I think it's used by things such as the docbuild but I have not double checked that this is (still) the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding SAGE_NUM_THREADS
, I think there's some magic that automatically figures that out by parsing MAKEFLAGS
. (sage-num-threads.py
script) So should be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's some magic
Hehe, ok. That could well be. It's all a bit confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sage-num-threads.py
Confusingly, this does not parse MAKEFLAGS
(though it claims to) but sage-build-num-threads.py
does.
Many corners of the build system are in a very unfortunate state. I hope to sit down for a month at some point, push a pixi with meson setup and reduce things to trivial instructions that should make 99% of users and developers and packagers happy. But that's certainly way beyond the scope of this PR :)
Does the makefile invoke more than one ninja process? |
@user202729 I'm on sagemath.zulipchat.com if you want to discuss this directly. I'm Julian Rüth over there. |
I no longer use in summary the process tree looks something like
tl;dr I don't know if by some unfortunate chance/timing the |
Thanks for the analysis. I think that we cannot achieve to fully solve this problem here. As long as ninja (or some other build system) does not communicate with make's jobserver. I think we want the build to run reasonably fast for most people without totally overcommitting (and then having things crash because of OOM situations and such.) I think most users will survive if we overcommit once or twice so |
I'm happy to propose a modified text once we agree on what should be written in principle. (But also happy to look at your proposal of course.) |
If you can write the documentation that's good too. Main point for me is just to tell the user basically, if you specify |
True that's certainly not what you would expect so it makes sense to point that out explicitly. I'll push a revised version to this branch. Feel free to throw it away if you don't like it. I'll send you a ping when it's ready. |
Unrelated to this pull request (and not sure how to message you otherwise.) If you'd like to have triage rights so you can change labels on pull request, please let me know. Similarly, if you'd like to join SageDays at some point (we're currently at SageDays close to Bordeaux, in two weeks in Vienna, in August in Japan) feel free to reach out (my email is in my commits.) |
Setting MAKE=make -jX can lead to an exponential growth of build processes (as noted during SD128.) It's better to rely on make's jobserver that is available on macOS and all relevant Linux distributions. We also recommend a sane default for MAKEFLAGS that people can just copy & paste. Unfortunately, a lot of non-developers of SageMath still have to build from source and they appreciate not having to think about these things and just being able to copy a good default value.
846bb4d
to
d94fd89
Compare
@user202729 what do you think about the current version? It's not perfect but I think it's an improvement over what was there before. |
Looks less technical than my original version, but since many end users see README that's also good for me. (About triage rights, I think that's also good for me.) |
Done. Feel free to put the positive review label on if you are happy with the changes here. |
sagemathgh-39128: Add note about makeflags and ninja parallelism `MAKEFLAGS` is an alternative to `MAKE`. The note about `ninja` can stay there until sagemath#38950 is fixed to warn the user e.g. to not run too many other things in parallel. It would be rather surprising if you pass `make -j3` there and have 7 processes running in parallel (which might eat up a lot of RAM and possibly have the OOM killer kill other processes along the way). ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [x] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#39128 Reported by: user202729 Reviewer(s): Julian Rüth, user202729
MAKEFLAGS
is an alternative toMAKE
.The note about
ninja
can stay there until #38950 is fixed to warn the user e.g. to not run too many other things in parallel. It would be rather surprising if you passmake -j3
there and have 7 processes running in parallel (which might eat up a lot of RAM and possibly have the OOM killer kill other processes along the way).📝 Checklist
⌛ Dependencies