All cmd.run invocations under Python 2 glacially slow on machines with high ‘ulimit -n’ #26255
@dneades, thanks for the report.
@dneades The code in question may be the grains code referenced above. I am also going to cc: the original author of that code to see if he has any ideas. @kev009
@cachedout Thank you for the pointer. I’m looking at the code now and I can immediately see one reason why grains collection is slow: the device list built from the `kern.disks` sysctl ends up including the sysctl name itself, so `kern.disks` is probed as if it were a disk. Changing the code so that only the actual device names are checked shaves about half a second off the execution time. I’ll continue to have a look at this and see what else I can find.
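For illustration only, here is a minimal sketch of the kind of parsing involved; the helper name `_freebsd_disk_list` is hypothetical and this is not the actual Salt grains code, just the shape of a fix that keeps the sysctl name out of the device list:

```python
import subprocess

def _freebsd_disk_list():
    # Hypothetical helper: return the disk device names reported by the
    # kern.disks sysctl, e.g. ['da0', 'da1', 'da2', 'da3', 'da4'].
    # Using 'sysctl -n' prints only the value, so the name 'kern.disks:'
    # never ends up in the list and is never probed as if it were a disk.
    out = subprocess.check_output(['sysctl', '-n', 'kern.disks'])
    return out.decode().split()
```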
@cachedout Some further information. Having fixed the `kern.disks` problem, grains collection is still very slow; nearly all of the remaining time is spent in the per-disk SSD checks. It should be noted that, running directly from the command-line, the equivalent commands complete almost instantly, as do the related sysctl queries.
@cachedout Here’s some detailed logging. I’ve modified the grains code to log timestamps around the disk checks. Here is the output from a run with that logging in place. You can see from this that the delay is incurred every time an external command is run, rather than in any one particular check. I think that probably wraps up all I can do on the grains side for the moment.
(Heads-up: I have edited the previous three comments since I first made them.)
One further piece of information. I have a FreeBSD VM, in addition to the physical host to which all of the above information pertains. I am not seeing the same extreme slowness on that VM:

Note, however, that the grains collection on the VM is still slower than it ought to be. It’s curious that the VM, which has far less hardware behind it, performs so much better here than the physical host.
Some more digging has isolated the problem to a `subprocess.Popen` call. I therefore added some print statements to the ctor of class `Popen` to see where the time is being spent.

Running the test again with those print statements in place shows the delay occurring inside the `Popen` constructor itself. So, surprisingly, it’s actually the act of spawning the subprocess that is slow, not the command being executed.

Finally, note that I have upgraded my machine to FreeBSD 10.2. The timings seem a little worse with this, but otherwise the problem is as previously characterized.
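(For anyone wanting to reproduce that kind of instrumentation without editing the standard library, a rough sketch along the same lines is below; it wraps the constructor rather than modifying it, and is not the exact instrumentation used above.)

```python
import subprocess
import time

_real_init = subprocess.Popen.__init__

def _timed_init(self, *args, **kwargs):
    # Wrap Popen.__init__ so every subprocess creation reports how long
    # the constructor itself took, independent of the command being run.
    start = time.time()
    _real_init(self, *args, **kwargs)
    print('Popen.__init__ took %.3fs (close_fds=%r)'
          % (time.time() - start, kwargs.get('close_fds')))

subprocess.Popen.__init__ = _timed_init

# Example: the constructor time is printed before the command's output.
subprocess.Popen(['echo', 'hello'], close_fds=True).wait()
```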
I wrote a small stand-alone Python test program that does nothing but spawn a trivial subprocess the same way Salt does. This reproduces the problem. A little experimentation shows that the delay is entirely due to the presence of the `close_fds=True` argument; without it, the subprocess call is essentially instant.
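A minimal reproduction of the same shape (not the exact test program referred to above; the command and values here are just examples):

```python
import subprocess
import time

def timed_run(close_fds):
    # Time a trivial command; on Python 2 with a high 'ulimit -n', the
    # close_fds=True case is dramatically slower because the interpreter
    # attempts to close every possible file descriptor before exec().
    start = time.time()
    p = subprocess.Popen(['/bin/echo', 'hello'],
                         stdout=subprocess.PIPE,
                         close_fds=close_fds)
    p.communicate()
    return time.time() - start

print('close_fds=False: %.3fs' % timed_run(False))
print('close_fds=True:  %.3fs' % timed_run(True))
```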
I’m not sure why the file descriptors need to be closed by the various subprocess calls Salt makes, but running the same test program on my FreeBSD 10.1 VM shows again that, although the problem is not as severe there, closing the file descriptors still takes a significant amount of time. First, here is a run on my VM with `close_fds=True`:

And here is a run with that argument removed:
Back on my physical host, I wished to double-check my findings. I therefore added the same instrumentation to the code path Salt itself uses to spawn commands.

Here is normal execution without the change, followed by a run with it applied:
A Google search for ‘subprocess close_fds slow’ turns up a number of interesting results… Based on some of the discussions listed in those search results about Python’s handling of `close_fds`, the cost is proportional to the maximum number of file descriptors a process may open: when `close_fds=True`, Python 2 attempts to close every possible descriptor up to that limit, one at a time, for every subprocess it spawns. On a machine with a very high ‘ulimit -n’, that is an enormous number of system calls per command.
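Roughly what that amounts to (a simplified approximation of the Python 2 behaviour, not the actual CPython source) is shown below; the upper bound comes from the system’s open-files limit, which is exactly what ‘ulimit -n’ controls:

```python
import os

# The descriptor limit as seen by the process; this is what a high
# 'ulimit -n' inflates.
MAXFD = os.sysconf('SC_OPEN_MAX')

def close_all_fds_py2_style():
    # Illustrative only (do not run in a process whose descriptors you
    # care about): one close() attempt per possible descriptor, open or
    # not. With the limit in the hundreds of thousands, this loop alone
    # makes every subprocess creation take seconds.
    for fd in range(3, MAXFD):
        try:
            os.close(fd)
        except OSError:
            pass
```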
Just to make really sure that we are definitely controlling the problem, let’s set a much lower ‘ulimit -n’ and re-run the test:
Ouch! This nicely explains why my VM was performing so much better than the physical host. Here are the relevant open-files limits on the VM:

And on my physical host:

Conclusions: the primary problem here is that, because of the very high open-files limit on this host, Python 2’s handling of `close_fds` makes every subprocess creation glacially slow. Since the slow Python `subprocess` machinery underlies every `cmd.run` invocation, every Salt operation that shells out to an external command is affected, not just grains collection. (Incidentally, the issue labels probably need tweaking, since this turns out not actually to be a grains-specific problem.) Finally, there is also the more minor secondary issue of `kern.disks` being treated as a disk device by the FreeBSD grains code.
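(As an illustration of the stop-gap options, the descriptor limit can also be lowered from inside a Python 2 process before `subprocess` is imported; the sketch below is a hypothetical example — the 2048 value is arbitrary — and not something Salt currently does.)

```python
import resource

# Lower this process's soft limit on open files before 'subprocess' is
# imported (Python 2 reads the limit once, when the module is loaded).
# The hard limit is left untouched; 2048 is just an example value.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (min(2048, soft), hard))

import subprocess  # imported only after the limit has been lowered

# From here on, Popen(close_fds=True) has at most ~2048 descriptors to
# iterate over, instead of the machine-wide maximum.
subprocess.Popen(['/bin/echo', 'hello'], close_fds=True).wait()
```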
That's a nice analysis @dneades! Here's a PR to fix the kern.disks issue you pointed out: #26700 I don't have any insight into the general issue with close_fds. I haven't noticed it yet because most of our interactive salt use is in Vagrant VMs and the big boxes are schedule-driven.
Thank you for the PR @kev009, much appreciated. With respect to the more general `close_fds` problem, I hope the analysis above helps in deciding how Salt should deal with it.
I've got a very similar-sounding issue on 2015.5.5 on a masterless setup on Fedora, where grain collection is taking ages, and sure enough, most of that time is spent checking whether my drives are SSDs. This is also with a single-state SLS whose only mission in life is to install vim. The difference is, I don't have a high ulimit - it's 1024, on pretty weak hardware.
And the trace log:
Sorry for the delay here. This is a really interesting issue. After some thought, I think the recommendation to adjust ulimits on the minion is the right one, with a few caveats:
Sound good?
That sounds good to me, @cachedout, thank you!
@dneades Sounds good. Take a look at #30218 please.
Thank you – I’ve provided a number of comments and test results on the PR!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
I’m not sure that this issue was ever fixed, was it? If not, since it is rather a major problem, closing it would seem premature …
Thank you for updating this issue. It is no longer marked as stale.
To clarify, is the current workaround to lower the open file limit to 2048?
FYI: I have had slowness issues on multiple FreeBSD 11.2 machines with salt 2019.2.0. I found some good workarounds in the linked issue #48773:

Adding disabled_grains & disabled_modules gave a great performance boost when running pretty simple states, with total execution time dropping from ~15-30 s down to ~3 s for 2 local hosts.
@stromnet that's got nothing to do with this issue.
Is this issue still relevant, or can this be closed? The code related to SSDs seems to be non-existent now, as it's based on checking the output of a different command these days. I realize FreeBSD is kind of a non-popular target, but I'm still a fan. So if this issue still exists, I can try to track it down...
@arizvisa The primary problem is unrelated to SSDs, but is due to this Python 2 bug. See my earlier comment for a full analysis of the cause, and this comment for a suggested workaround. Here is the issue still occurring on a FreeBSD 12.0 machine with a high ulimit:
Note that the underlying Python bug is fixed in Python 3, so the problem does not occur with Salt when using that:
Finally, note that the upstream Python 2 bug report appears to have been abandoned, so a fix from that side looks unlikely.
Ah. Ok, my bad. I guess I totally skimmed through the comments between the version reports. I realize the reason I'm not seeing it on my system is that I'm using a fork of Python2 with backports of some Python3 features. But since this is a problem in a core library, and judging by how it's been abandoned, this isn't something that salt can really fix, right? This'll go away at the end of the year, as I "believe" that's when salt will drop support for Python2 :(. Until that happens though, there's https://github.com/google/python-subprocess32/. You might be able to drop it in place of Python2's subprocess, as it has the Python3 patch that you mentioned integrated. Back when I scoped out merging Python3's subprocess, I remember resolving that the libraries are essentially backwards compatible. Maybe the title can be renamed to reference that it only occurs on Python2. They really need some Python3 tag for bugs like these for tracking.
@arizvisa No problem; I appreciate the further comments. The workaround we deployed was to ensure that the FreeBSD user account used by Salt runs with a much lower open-files limit, so that Python 2 has far fewer descriptors to close on every subprocess call. As you say, this should fall out of the wash when Salt has fully migrated to Python 3. We’re not able to switch to Salt on Python 3 ourselves yet, as we are still being bitten by some of the outstanding bugs.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Thank you for updating this issue. It is no longer marked as stale.
Upstream has now merged the correct fix! So this issue should no longer be a problem if you use a recent build of Python on FreeBSD. https://svnweb.freebsd.org/ports?view=revision&revision=518640 applied the patch to FreeBSD's Python packages. If you use FreeBSD's Python packages, the minimum versions containing the fix are 3.8.1_2, 3.7.5_1, 3.6.9_1, 3.5.9_3, or 2.7.17_1.
@waynew @sagetherage I think you should close this issue now, since it's been fixed upstream by Python.
Note: This issue was originally titled ‘Grains collection glacially slow on FreeBSD 10.1 machine with multiple SSDs’. It has been renamed to better match the problem determined by the course of investigation outlined below.
A simple `salt-ssh` `test.ping` call was taking over 16 seconds against a single physical FreeBSD host with 128 GB RAM and 4 SSDs. After some hunting, I was able to locate the problem as being in the collection of grains data.

The issue is easily demonstrated locally on the host using `salt-call --local`: that took nearly 12 seconds! The same call, but skipping grains, completes in under a second. Grains collection is thus taking over 11 seconds on this host.

Running `salt-call` with logging set to `all` shows a delay of several seconds before the `Device kern.disks: does not report itself as an SSD` line is output, and there is also a perceptible delay before each of the subsequent disk-related lines is output.

Observations:

- The delay makes `salt-ssh` extremely unpleasant to use against this host.
- `kern.disks` is being queried as to whether it is an SSD, since it is not a disk device. This looks like a bug.
- Repeatedly attempting the `ip` command (which does not exist on FreeBSD) would seem sub-optimal.

Here is the versions report for the host, which has the `py27-salt` pkg installed:

The four SSDs da0 - da3 are Intel 3500 series 800 GB drives connected via an LSI 2308 controller. da4 is a USB flash drive. Here’s the relevant `dmesg` output:

The operating system is a fresh stock FreeBSD 10.1 install with all updates applied: