
Documentation on skip_grains #53603

Closed

wants to merge 18 commits into from

Conversation

@marbx (Contributor) commented Jun 26, 2019

Introduction

Our master became overloaded when we deployed a new minion configuration in which minions regularly “ping” the salt-master and the salt-master reacts to the minion_ping event (with a runner).

The load exceeded 0.5 after approx. 10% of the minions were pinging.
The salt-master receives a minion_ping every 2 seconds, each from a different minion.

Therefore we stopped the deployment and profiled the salt-master.

Profiling showed that loading grains (green) creates 33.61% of the load, called by MasterMinion (yellow), which in turn is called by RunnerClient (orange).

[Figure: profiling graph of the salt-master]

We found this description of MasterMinion in salt/minion.py:

A MasterMinion is “a fully loaded minion function object for generic use on the master. What makes this class different is that the pillar is omitted, otherwise everything else is loaded cleanly.”

A RunnerClient creates a MasterMinion but additionally omits states and renderers, in salt/client/mixins.py:

self._mminion = salt.minion.MasterMinion(self.opts, states=False, rend=False)

Our RunnerClients don’t need grains. Therefore, we omitted them with the undocumented skip_grains setting.

We set skip_grains: True at 06-18 15:45 and observed a significant (16-fold) and necessary load drop:

  • load = 0.5 with skip_grains: False
  • load = 0.03 with skip_grains: True

[Figure: load graph before and after enabling skip_grains]

We later also observed a significant improvement when restarting the master:

  • at 13:29 with skip_grains: False
  • at 13:56 with skip_grains: False
  • at 15:06 with skip_grains: True

[Figure: master restart load with and without skip_grains]
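
For illustration, a minimal sketch of how the effect can be checked from Python, assuming salt.loader.grains short-circuits and returns an empty dict when skip_grains is set (exact internals may differ between Salt versions):

    import salt.config
    import salt.minion

    # Sketch only: check that a MasterMinion built with skip_grains ends up
    # with empty grains. Assumes salt.loader.grains returns {} when
    # opts['skip_grains'] is True; internals may differ between Salt versions.
    opts = salt.config.master_config('/etc/salt/master')
    opts['skip_grains'] = True  # same effect as skip_grains: True in the master config

    mminion = salt.minion.MasterMinion(opts, states=False, rend=False)
    print(mminion.opts['grains'])  # expected: {} when grains are skipped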

Content

Created documentation on skip_grains

History

This PR first contained new config items to omit grains in the MasterMinion, along the lines of:

    if grains and self.opts.get('masterminion_grains', True):
        # Load grains as before when the proposed config item is unset or True
        self.opts['grains'] = salt.loader.grains(self.opts)
    else:
        # Proposed: skip the expensive grains load entirely
        self.opts['grains'] = {}

Adding new config items received constructive negative feedback; see below.
As we found that the existing config item skip_grains serves our purpose when set to True on the master, we removed the new config items from this PR.

@marbx marbx requested a review from a team as a code owner June 26, 2019 11:33
@waynew (Contributor) commented Jun 26, 2019

Nice! I'm not familiar with the implications here, but it certainly seems like a reasonable improvement! 👍

@marbx (Contributor, Author) commented Jun 26, 2019

@dwoz, I would kindly ask for guidance on how to write tests for this PR.

I think testing must show that all current tests of RunnerClient and MasterMinion, as long as they do not use grains, continue to pass when the proposed options are set to False.

I assume the setUp() procedure can reconfigure a master.

My understanding is that runners are tested in https://github.com/saltstack/salt/blob/develop/tests/integration/client/test_runner.py

Can the setUp() routine get two boolean parameters and be invoked 4 times?

setUp(runnerclient_grains=True, masterminion_grains=True)
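
A hypothetical sketch of such a parametrized setup (runnerclient_grains and masterminion_grains are the names proposed in this PR, not existing Salt options; configure_master is a placeholder):

    import itertools

    # Hypothetical sketch: exercise the runner tests once per combination of
    # the two proposed options. configure_master is a placeholder for however
    # the test suite writes options into the test master's configuration.
    def configure_master(runnerclient_grains, masterminion_grains):
        return {
            'runnerclient_grains': runnerclient_grains,
            'masterminion_grains': masterminion_grains,
        }

    for combo in itertools.product([True, False], repeat=2):
        opts = configure_master(*combo)
        print('would run the runner tests with', opts)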

The test of EventReturn must be modified accordingly, because it invokes MasterMinion, too.

@mattp- (Contributor) commented Jun 27, 2019

@markuskramerIgitt I'm not sure it makes sense for this change to add Yet Another Config Item (or two) for a special case. What you can do instead to effectively mitigate this is something like:

  • enable grains_cache: True / grains_cache_ttl: N seconds
  • set up a grains refresh that is <= the TTL; this will put the blocking grains refresh into the scheduler's path

This should essentially give you the same performance gain (see the sketch below). Tbh I think grains in general should be configured by default to work this way, so that no one ever gets hit with this blocking performance loss. What are your thoughts on that approach?
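
The caching idea in general terms, as a minimal sketch (this is the plain TTL pattern, not Salt's actual grains_cache implementation):

    import time

    GRAINS_CACHE_TTL = 300  # analogous to grains_cache_ttl, in seconds

    _cache = {'grains': None, 'loaded_at': 0.0}

    def cached_grains(load_grains):
        """Return cached grains, re-running the expensive load only after the TTL expires."""
        now = time.time()
        if _cache['grains'] is None or now - _cache['loaded_at'] > GRAINS_CACHE_TTL:
            _cache['grains'] = load_grains()  # the blocking refresh
            _cache['loaded_at'] = now
        return _cache['grains']

    # The expensive loader runs once; repeat calls within the TTL hit the cache.
    print(cached_grains(lambda: {'id': 'minion1'}))
    print(cached_grains(lambda: {'id': 'minion1'}))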

@max-arnold (Contributor) commented Jun 27, 2019

Good point!

What bothers me most is that nobody knows what else will break when the grains are disabled. Existing unit tests can't uncover potential problems because they do not use these options. Someone WILL attempt to run other code paths in this mode, and it will lead to more stale issues and more complaints that Salt is complex and buggy.

This performance problem is important, but the resulting complexity and combinatorial explosion of possible code paths are worse and much harder to fix.

It is not the first time a grains-related optimization hack has been submitted: #49481

UPD: I recently spent hours debugging the grains caching issue #53536

@marbx (Contributor, Author) commented Jun 28, 2019

Hi @mattp-, I have to admit that I too dislike the addition of these two new config items.

You introduced LazyLoading for functions in PR #53553, which I noticed a few days ago, only now realizing that you are the author.

If grains could be lazy-loaded, it could deliver the performance improvement we need without adding config items, while those who use grains should not notice a difference.

Regarding grains_cache:

In our setup, the only grain we need is the id of the minion, which comes with the event. So essentially we don't use grains. Therefore, I would not profit from the content of a grains_cache, while it still must be loaded once per grains_cache_ttl. We have 20K minions. For performance (CPU and network), we would tend to increase the TTL, which would reduce the validity of some of the grains, which we don't need in the first place. We also have had very mixed results from trying to use schedulers, so I would wait for a use case before I try these techniques (again).

Combining grains_cache and “lazy loaded” grains would result in “lazy loaded cached” grains.

Therefore I would like to continue with “lazy loaded grains”.
What do you think?

@marbx (Contributor, Author) commented Jun 28, 2019

Hi @max-arnold, thank you for pointing to the existing grains performance discussions!
What a long story…

I completely agree with you that config items complicate testing (“combinatorial explosion of possible code paths”). I now realize that my proposed new config items were very naïve.

Please allow me to replace them with a better solution - what do you think of “lazy loaded” grains?

@max-arnold (Contributor) commented Jun 28, 2019

@markuskramerIgitt

Please allow me to replace them with a better solution - what do you think of “lazy loaded” grains?

I do not know the codebase well enough, but at first glance it looks like a good idea that could be explored. Do you want to lazy-load grains on a minion? How could this affect accessing grains on a master (for example, in a pillar file)?

Another option is to achieve your goal in a different way:

  1. What do you want to achieve by storing minion ping events in a Redis DB?
  2. What does the word "runner" mean to you? I guess it is not a runner module?
  3. Could minions access your Redis instance directly? Or use a returner?
  4. Could a custom engine be used instead? (example1, example2)
  5. Why is it necessary to instantiate a new MasterMinion on each event?

And (out of curiosity), how did you make this nice profiling graph on a master?

@mattp- (Contributor) commented Jun 28, 2019

@markuskramerIgitt that's an interesting idea! But let me revisit your response to the grains cache first: this PR only relates to runners instantiating grains and getting hit, not the minion side. So you would only need to turn on the cache on your masters; you could leave it off on the minion fleet. That would short-circuit the grains load on any RunnerClient to the equivalent of loading the serialized msgpack data.

Wrt lazy-loading the grains, that's probably not what I'd call it. It's a bit confusing, but salt.loader.grains() returns a plain dict, with salt.loader.grain_funcs() returning the actual LazyLoader object (equivalent to minion_mods, or any other, etc). Due to the behavior of current grains, there is no 1-to-1 mapping from a root key in grains to a given grains module/function; they can actually overlap, such that a grains function can return any data to any key it wants.

What would make sense is deferring this call, such that the dict returned by salt.loader.grains is actually a wrapper: when __getitem__ is hit, it will then call salt.loader.grains(). So similar to LazyLoader, but not quite 🙂 In the default case like above, I believe a runner would actually never trigger it unless you have code that reaches into __grains__ to do a thing.
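
A minimal sketch of such a deferring wrapper (illustrative only; DeferredGrains is not part of Salt):

    from collections.abc import Mapping

    class DeferredGrains(Mapping):
        """Behaves like the dict returned by salt.loader.grains(), but only
        performs the expensive load on first access (illustrative, not Salt code)."""

        def __init__(self, loader):
            self._loader = loader  # e.g. functools.partial(salt.loader.grains, opts)
            self._data = None

        def _load(self):
            if self._data is None:
                self._data = self._loader()  # deferred: runs once, on first access
            return self._data

        def __getitem__(self, key):
            return self._load()[key]

        def __iter__(self):
            return iter(self._load())

        def __len__(self):
            return len(self._load())

    # A runner that never reaches into __grains__ never triggers the load:
    grains = DeferredGrains(lambda: {'id': 'minion1'})  # stand-in loader
    print(grains['id'])  # first access triggers the (stand-in) load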

@mattp- (Contributor) commented Jun 28, 2019

@markuskramerIgitt also echoing @max-arnold, I'm curious what you used to create that profiling output; it seems very useful and easy on the eyes 🙂

@marbx marbx changed the title Option to disable Runners or MasterMinions grains to enhance performance Adding more test cases for runners and discussion on grains Jun 28, 2019
@marbx (Contributor, Author) commented Jun 28, 2019

Hi @max-arnold,

Do you want to lazy-load grains on a minion?

My focus is on enhancing the performance of the master, but I guess lazy loading (if feasible) should not differentiate between master and minion.

How could this affect accessing grains on a master (for example, in a pillar file)?

My understanding is that “lazy loading” grains would happen “as late as possible”, in contrast to the “as soon as possible” of today, but otherwise it must be completely transparent. I am not familiar with pillars, but which entity loads the grains should also not matter.
We should create test cases in this PR to have proof.

What do you want to achieve by storing minion ping events in a Redis DB?

If the runner that reacts to the minion_ping finds no Redis flag all_task_done:MINION, it calls the local module all_tasks.main on that minion. Another runner, which reacts to all_tasks.main, writes the Redis flag all_task_done:MINION.
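
A hypothetical sketch of that flow (all_tasks.main and the all_task_done:<minion> key come from the description above; the Redis connection details and the wiring of these functions to events are assumptions):

    import redis
    import salt.client

    # Hypothetical runner sketch of the flow described above; the connection
    # details and how these functions react to events are assumed.
    r = redis.Redis(host='localhost')
    local = salt.client.LocalClient()

    def on_minion_ping(minion_id):
        """React to minion_ping: start all_tasks.main unless the flag is set."""
        if not r.exists('all_task_done:{}'.format(minion_id)):
            local.cmd(minion_id, 'all_tasks.main')

    def on_all_tasks_done(minion_id):
        """React to the all_tasks.main return: write the done flag."""
        r.set('all_task_done:{}'.format(minion_id), 1)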

What does the word "runner" mean to you? I guess it is not a runner module?

I actually mean Python modules of this type (runners).

Could minions access your Redis instance directly?

I must restrict access to Redis (or maybe memcache, which seems to be installed with Salt) to the master.

Or use a returner? Could a custom engine be used instead?

Thank you, I need to assess these concepts and will come back to you.

Why is it necessary to instantiate a new MasterMinion on each event?

I would not know. I don't intentionally call them; the RunnerClient does. I only observe them in the profile output.

how did you make this nice profiling graph on a master?

Thank you, but I made them on Windows; see How to create master profiling graphs on Windows. I guess the same tools could run on a master (I only have tty access to the master).
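
For reference, a generic cProfile run along those lines (workload is a stand-in for the code under investigation; the actual tools used for the graphs in this thread are described in the linked document):

    import cProfile
    import pstats

    def workload():
        # Stand-in for the code under investigation
        sum(i * i for i in range(10 ** 6))

    cProfile.run('workload()', 'master.prof')
    pstats.Stats('master.prof').sort_stats('cumulative').print_stats(10)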

@marbx (Contributor, Author) commented Jul 2, 2019

Hi @mattp-

[...] you would only need to turn on the cache on your masters; you could leave it off on the minion fleet. That would short-circuit the grains load on any RunnerClient […]

I will try that later. The short circuit is only effective on a repetition within grains_cache_ttl seconds. I expect the master to still overload at restart and, because we have several thousand minions, to yield only a small benefit.

[…] a wrapper: when __getitem__ is hit, it will then call salt.loader.grains().

I am starting to understand that “deferring” and “lazy loading” are separate effects.
I would benefit from a “deferred” call. How would you implement that?

I fail to access the grains that the MasterMinion in the RunnerClient loads.
How do you access them? I would like to write a test case.

@waynew waynew added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Jul 24, 2019
@marbx marbx closed this Nov 15, 2019
@marbx marbx deleted the option_omit_grains_on_master branch November 15, 2019 13:22
@marbx marbx restored the option_omit_grains_on_master branch November 15, 2019 13:39
@marbx marbx reopened this Nov 15, 2019
@marbx marbx changed the title Adding more test cases for runners and discussion on grains Documentation on skip_grains Feb 8, 2020
@marbx (Contributor, Author) commented Feb 8, 2020

@waynew, I removed the test cases and turned this PR into pure documentation. Could you please review?

@marbx marbx mentioned this pull request Feb 8, 2020
@waynew (Contributor) commented Feb 27, 2020

re-run all

@DmitryKuzmenko (Contributor) commented:

@markuskramerIgitt Could you please rebase it onto the master branch or re-create the PR targeted on master?

@marbx (Contributor, Author) commented Apr 6, 2020

Hi @DmitryKuzmenko
I re-created the PR targeted on master in #56098

@DmitryKuzmenko (Contributor) commented:

@markuskramerIgitt thank you!

dwoz pushed a commit that referenced this pull request Apr 11, 2020
@marbx marbx deleted the option_omit_grains_on_master branch April 19, 2020 21:00