Hangs forever after a parallel job gets killed #3899
pylint hits 2 GiB per process pretty often [1], thus still causing OOM failures [2]. Bump the divisor. [1] pylint-dev/pylint#1495 [2] pylint-dev/pylint#3899
pylint hits 2 GiB per process pretty often [1], thus still causing OOM failures [2]. Bump the divisor. [1] pylint-dev/pylint#1495 [2] pylint-dev/pylint#3899 Cherry-picked from master PR rhinstaller#2923 Related: rhbz#1885635
max() was bogus -- we want *both* RAM and number of CPUs to limit the number of jobs, not take whichever is highest. [1] pylint-dev/pylint#1495 [2] pylint-dev/pylint#3899 Cherry-picked from master PR rhinstaller#2923 Related: rhbz#1885635
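The downstream workaround referenced in the commits above caps pylint's worker count by both the CPU count and the total RAM, budgeting roughly 2 GiB per worker. A minimal sketch of that calculation (the helper name and the Linux-specific RAM probe are illustrative, not taken from those commits):

```python
# Sketch of the RAM-aware job cap described in the commit messages above.
# Assumes Linux (os.sysconf keys); the 2 GiB-per-worker budget reflects the
# observation that pylint often reaches about 2 GiB per process.
import os


def pylint_job_count(ram_per_job=2 * 1024**3):
    cpus = os.cpu_count() or 1
    total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # Both limits must hold (min), but never drop below one job (max).
    return max(1, min(cpus, total_ram // ram_per_job))


print(pylint_job_count())  # e.g. pass this value to pylint's -j option
```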
@martinpitt thanks for this report. Probably the situation will be better if pylint-dev/astroid#837 gets merged.
@hippo91 : Indeed, fixing the memory leak is most appreciated, of course. I filed this separately though as handling crashed subthreads is not something that's related to the memory leak itself. Thanks!
@martinpitt could you check if you still face this issue after #3869 has been merged into master?
First, this is a quicker reproducer, especially after the huge memory issue got addressed (#1495). After following the original instructions from the description, one can explicitly kill one of the workers and see how pylint reacts:
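A rough, hypothetical way to do that kill from Python, assuming a Linux system with pgrep on the PATH and pylint already running with -j4:

```python
# Hypothetical helper: SIGKILL one pylint worker to imitate the OOM killer.
import os
import signal
import subprocess

# List the PIDs of all processes whose command line mentions pylint.
pids = subprocess.run(
    ["pgrep", "-f", "pylint"], capture_output=True, text=True, check=True
).stdout.split()

if len(pids) >= 2:
    # The highest PID was forked last, so it is almost certainly a worker
    # rather than the controlling process.
    os.kill(int(pids[-1]), signal.SIGKILL)
```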
And it indeed hangs at the end, so still the same situation as a month ago. I applied the patch:
(There's some fuzz and some rejections due to the ChangeLog and such, but these can be ignored.) Re-running the test and the kill makes no difference, though: it still hangs forever at the end. So this is not fixed yet.
The default timeout is 6 hours, which just causes unnecessary hangs, delays, and blocked self-hosted runners if anything goes wrong (such as pylint-dev/pylint#3899). A normal run takes about 10 minutes, so set a timeout of 30 minutes. Use one hour for the full container rebuild+validation+upload to be on the safe side (it usually takes between 15 and 18 minutes).
@martinpitt thanks for your feedback. So the issue remains open.
I'm also bumping into this issue when using
If anyone is facing the same issue with
However, #7834 might have fixed this.
This is similar to issue #1495 -- there is apparently some giant memory leak in pylint which makes it eat ginormous amounts of RAM. This issue is about mishandling parallel workers with `-j` -- once one worker dies, pylint runs into a deadlock.

Steps to reproduce
podman run -it --rm --memory=2G fedora:33
This issue is not specific to podman -- you can use docker, or a VM, or anything where you can control the amount of RAM.
`nproc` is 4, thus `-j0` selects 4, but let's select `-j4` explicitly:

Current behavior
As per issue #1495, the 5 pylint processes pile up more and more RAM usage, until one gets killed because of OOM:
I still get some remaining output, then pylint hangs forever:
The other processes are still around:
The first three and the fifth wait in a futex:
and the fourth is apparently the controller, which waits in a read():
After pressing Control-C, I get
Expected behavior
pylint should notice that one worker died, and immediately abort, instead of hanging forever (which also blocks CI systems until their timeout, which is usually quite long).
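pylint's `-j` mode runs its workers through Python's multiprocessing module, and the stacks above are consistent with the classic `multiprocessing.Pool` failure mode: when a worker is SIGKILLed mid-task, its pending result never arrives, so the parent blocks forever reading from the result queue. A standalone sketch of that behavior (deliberately unrelated to pylint's own code):

```python
# Standalone sketch (not pylint code): a multiprocessing.Pool parent hangs forever
# when one worker is SIGKILLed mid-task, because the killed worker's pending task
# never produces a result and the Pool neither resubmits nor fails it.
import multiprocessing
import os
import signal
import time


def lint_one(n):
    time.sleep(60)  # stand-in for linting one module
    return n


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    result = pool.map_async(lint_one, range(8))
    time.sleep(1)                                  # let the workers pick up tasks
    victim = multiprocessing.active_children()[0]  # any worker will do
    os.kill(victim.pid, signal.SIGKILL)            # imitate the OOM killer
    result.get()  # never returns -- this mirrors the hang described above
```

By contrast, a `concurrent.futures.ProcessPoolExecutor` in the same situation raises `BrokenProcessPool` as soon as it notices the dead worker, which is roughly the abort-instead-of-hang behavior requested here.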
pylint --version output
I tried `pip install pylint astroid --pre -U` but that doesn't install anything new.