Scaling with MAKEOPTS -jN -lM? #53

Closed
APN-Pucky opened this issue Oct 22, 2024 · 7 comments

Comments

@APN-Pucky

APN-Pucky commented Oct 22, 2024

Is it possible to read the MAKEOPTS from the log and scale the predictions by the number of cores? Sometimes I use -j1 -l1 in the background and sometimes I use -j113 -l16 with distcc, resulting in different run times per build. Is it possible to separate them? The same goes for builds that have a binpkg and are therefore very quick.

emlop a shows many >1000% due to different install modes on my system.

@vincentdephily
Owner

I wish I could say yes, and I'm always looking for ways to improve the prediction, but there are a lot of hurdles:

  • /var/log/emerge.log (which emlop uses to know past compilation times) only records the package name and version. No USE, MAKEOPTS, or other things affecting compile time.
  • The speedup you get from parallel compiles is anything but linear, it's unclear how emlop would use that info if it was available.
  • A lot of build systems don't care about MAKEOPTS, and there are many other sources of build time differences (parallel emerge, system load, USE flags, new versions...).

The only thing I can think of in your case is that you could configure a different emerge.log when you use distcc, and then tell emlop to use the best log for each case. If you try something like that, I'd love to hear about your experience, and maybe see some stats from emlop accuracy.

One thing that emlop could infer, is whether parallel merges (from a single emerge command, or multiple) are ongoing, by looking if events for different ebuilds are interleaved. Again, I'm not sure how usable that data would be for predicting build times, but gathering the data is the first step.
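A minimal Python sketch of that interleaving check, not emlop's actual implementation. The log line shapes in the comment and the sample data are assumptions for illustration and should be checked against a real emerge.log; `overlapping_merges` is a hypothetical helper name.

```python
import re

# Assumed emerge.log line shapes (illustrative; verify against your own log):
#   1729600000:  >>> emerge (1 of 2) dev-libs/foo-1.0 to /
#   1729600300:  ::: completed emerge (1 of 2) dev-libs/foo-1.0 to /
EVENT = re.compile(
    r"^(\d+):\s+(>>> emerge|::: completed emerge) \(\d+ of \d+\) (\S+) to "
)

def overlapping_merges(lines):
    """Return the set of packages whose merge overlapped another merge."""
    running = set()      # packages started but not yet completed
    overlapped = set()
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue
        _timestamp, kind, pkg = m.groups()
        if kind == ">>> emerge":
            running.add(pkg)
            if len(running) > 1:
                # Every package currently running is part of a parallel merge.
                overlapped |= running
        else:
            running.discard(pkg)
    return overlapped

# Hypothetical sample: foo and bar are interleaved, baz runs alone.
sample = [
    "1729600000:  >>> emerge (1 of 3) dev-libs/foo-1.0 to /",
    "1729600010:  >>> emerge (2 of 3) dev-libs/bar-2.0 to /",
    "1729600200:  ::: completed emerge (2 of 3) dev-libs/bar-2.0 to /",
    "1729600300:  ::: completed emerge (1 of 3) dev-libs/foo-1.0 to /",
    "1729600310:  >>> emerge (3 of 3) app-misc/baz-3.0 to /",
    "1729600400:  ::: completed emerge (3 of 3) app-misc/baz-3.0 to /",
]
print(sorted(overlapping_merges(sample)))
```

The sketch only flags which merges overlapped. An interrupted emerge leaves a start event without a completion, which this simple version does not handle.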

@APN-Pucky
Author

APN-Pucky commented Nov 7, 2024

Thanks for the detailed answer. I have some ideas for getting that information, but very little time. emlop predict shows the current line of the output log, and there is sometimes something like gcc -jN in it; maybe that could be used? I haven't checked whether binary packages look different in the logs in a way that would let emlop infer that.

> The speedup you get from parallel compiles is anything but linear, it's unclear how emlop would use that info if it was available.

The simplest approach would be to track build times separately per parallelization level and use the closest match for predictions. I guess most people have either a single-thread background compile or a fast all-core compile.
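A minimal Python sketch of the closest-match idea, assuming the -j level of each past build were known. As noted earlier in the thread, emerge.log does not record it, so the `history` mapping, its data, and the `predict` helper are all hypothetical.

```python
from statistics import median

def predict(history, package, jobs):
    """Predict a build time from the history bucket whose -j level is closest to `jobs`.

    `history` maps (package, jobs) -> list of past durations in seconds.
    Returns None when the package has no history at all.
    """
    levels = [j for (p, j) in history if p == package]
    if not levels:
        return None
    closest = min(levels, key=lambda j: abs(j - jobs))
    return median(history[(package, closest)])

# Hypothetical data: one bucket per parallelization level.
history = {
    ("sys-devel/gcc", 1): [9000, 9400, 9200],
    ("sys-devel/gcc", 16): [1500, 1450, 1600],
}
print(predict(history, "sys-devel/gcc", 2))    # uses the -j1 bucket
print(predict(history, "sys-devel/gcc", 113))  # uses the -j16 bucket
```

Keeping one bucket per level avoids any scaling model: a prediction is only ever based on builds that ran under similar settings.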

@kakra

kakra commented Nov 7, 2024

I don't think that should be tried; it just won't scale correctly, ever. You cannot expect a small package to be able to run 113 gcc processes in parallel, and even with big packages, it is unlikely to scale to that number due to inter-file dependencies. What should a scale factor be? Just measure a -j113 build and multiply by 113 for a -j1 build? That's not how it works. Additionally, you're using -l16, which makes things even more difficult.
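One way to see why a factor of 113 is unrealistic is Amdahl's law, which is brought in here as an illustration and is not something the thread proposes as a model. The sketch assumes a hypothetical package where 90% of the -j1 build time parallelizes perfectly, and it ignores -l load limits and distcc overhead.

```python
def amdahl_speedup(parallel_fraction, jobs):
    """Speedup predicted by Amdahl's law when only `parallel_fraction`
    of the single-job build time can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / jobs)

# Hypothetical package: 90% of the -j1 build time is parallelizable.
for jobs in (1, 4, 16, 113):
    print(f"-j{jobs}: {amdahl_speedup(0.9, jobs):.1f}x")
```

With a 10% serial part (configure, linking, install), the speedup can never reach 10x, no matter how many jobs are added.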

Also, some compiler processes do actually use multithreading, e.g. lto phases. This completely works against MAKEOPTS trying to run processes in parallel.

You should probably also look into something like EMERGE_DEFAULT_OPTS="--jobs=5 --load-average 8" so you can do parallel packages. Due to how many build systems work (long configure phase, low parallelism due to inter-file deps), there's great potential to save overall build time. I usually keep those numbers a little lower than my MAKEOPTS. It can overwhelm RAM usage if used wrongly.

You probably get better results by using different emerge.log files per MAKEOPTS configuration.

But I'd also prefer if emlop could tell binpkg merges and full merges apart. I'm using binpkg to cache packages so if a downgrade is needed, it simply re-uses the previous build. That leads to vastly shorter "build times". But luckily, emlop uses a median filter so this extreme "noise" is likely to be filtered away from predictions.
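A small Python illustration of why a median is robust against a single binpkg merge. The durations are made up, and this does not reproduce emlop's actual averaging code.

```python
from statistics import mean, median

# Hypothetical merge durations in seconds for one package:
# four full compiles and one binpkg merge (12 s).
durations = [600, 610, 12, 620, 605]

print(median(durations))  # 605
print(mean(durations))    # 489.4
```

The median ignores the lone 12-second outlier, while the mean is pulled down by it. If binpkg merges became the majority of the history, the median would follow them instead.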

@APN-Pucky
Author

scaling != linear scaling, and if I understand it right, it can be predicted per package: some can do -j100, others cannot. Nonetheless, I think tracking each -j1, -j2, ..., -jN separately would be nice and probably more precise.
Sorry for my short answer.

@vincentdephily
Owner

Getting the level of parallelism for an ongoing merge isn't too hard: emlop could look at the CPU utilization of the emerge processes (let's ignore the distcc use case for now). It's also easy enough to get lots of info (USE, CFLAGS...) about the currently installed package by looking into /var/db/pkg/, and with some work we could find the same info about an ongoing merge.
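A minimal Python sketch of the /var/db/pkg/ lookup, assuming the usual layout of one directory per installed package (category/name-version) containing plain-text files such as USE and CFLAGS. `read_vdb_info` is a hypothetical helper, not part of emlop.

```python
from pathlib import Path

def read_vdb_info(category, pkg_version, keys=("USE", "CFLAGS"), root="/var/db/pkg"):
    """Read selected build-info files for an installed package.

    Returns a dict mapping each key to the file's stripped contents,
    or None when that file is missing.
    """
    pkg_dir = Path(root) / category / pkg_version
    info = {}
    for key in keys:
        path = pkg_dir / key
        info[key] = path.read_text().strip() if path.is_file() else None
    return info

# Example call; the package name is hypothetical and must be installed:
# read_vdb_info("dev-lang", "python-3.12.6")
```

This only describes the currently installed build, which is exactly the limitation discussed next: it says nothing about how past merges were done.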

The hard/impossible part is getting historical information, which is what predictions are based on. Was the python emerge from a month ago done with distcc? With parallel emerges? With binpkg? With USE=pgo? I don't know. I'd be happy to be proven wrong: can you find more useful historical info in your emerge.log or other places?

Have a look at 5220495 and def6d42. I was hoping to figure out portage-level parallelism, but I've given up on that for now because the error rate was too high. Again, I'd be happy to be proven wrong, so feel free to pick up that branch and make it work.

Thank you both for this discussion. Even if it turns out we can't implement them, it's good to brainstorm ideas.

@vincentdephily
Owner

I've merged #57, which doesn't fix this issue but is the fruit of my exploration. I don't think emlop will ever have access to the info required for this issue, so I'm closing it as wontfix.

vincentdephily closed this as not planned on Nov 26, 2024
@APN-Pucky
Author

APN-Pucky commented Feb 4, 2025

> The hard/impossible part is getting historical information, which is what predictions are based on. Was the python emerge from a month ago done with distcc? With parallel emerges? With binpkg? With USE=pgo? I don't know. I'd be happy to be proven wrong, can you find more useful historical info in your emerge.log or other places?

@vincentdephily Apparently, the logs cannot be extended (https://bugs.gentoo.org/949281), but one could add a new FEATURE to portage that stores the required information in a separate file.
