
[BUG] 3006.5 "grains" extmods folder cleaned in unexpected situations #65692

Closed
4 of 9 tasks
lomeroe opened this issue Dec 13, 2023 · 30 comments
Labels
Bug (broken, incorrect, or confusing behavior), Confirmed (Salt engineer has confirmed bug/feature - often including a MCVE)

Comments

@lomeroe
Contributor

lomeroe commented Dec 13, 2023

Description
In 3006.5, the extmods grains folder is removed in many situations:

  • when the --local option is passed to a salt-call command and the minion has previously sync'd grains from a master, losing all previously synchronized grain modules
  • when the minion cannot connect to the master, i.e. a mastered minion whose master is down will clean its grains folder and lose all sync'd grains until the master is "alive" again and they can be resync'd
  • on other commands, even when the master is alive; for example, salt-call test.ping will remove the grains extmods folder and the minion will lose sync'd grains

Setup
(Please provide relevant configs and/or SLS files; be sure to remove sensitive info. There is no general set-up of Salt.)

Please be as specific as possible and give set-up details.

  • on-prem machine
  • VM (Virtualbox, KVM, etc. please specify)
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD
  • classic packaging
  • onedir packaging
  • used bootstrap to install

Steps to Reproduce the behavior

Situation 1:
"--local" option

  1. Configure a minion with a master that provides additional grain modules via the _grains folder.
  2. Sync the grains from the master with salt-call saltutil.sync_grains or salt-call saltutil.sync_all; /var/cache/salt/minion/grains should now have the grain modules from the _grains folder on the master.
  3. Use salt-call --local <anything> and the /var/cache/salt/minion/grains folder will be removed - other extmods folders (like modules, states, etc.) do not seem to be affected.

Situation 2:
module commands (master alive):

  1. salt-call saltutil.sync_all - extmods grains folder exists
  2. salt-call test.ping - grains extmods folder is removed (other modules tested where the extmods grains folder is removed: grains.get, grains.items)

Situation 3:
master is down

  1. salt-call saltutil.sync_all - extmods grains folder exists
  2. ...take the master offline...
  3. salt-call grains.items - extmods grains folder is removed
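
A condensed shell sketch of Situations 2 and 3 (a sketch only, assuming the cache path /var/cache/salt/minion/extmods/grains discussed later in this thread):

# Situation 2: master reachable
salt-call saltutil.sync_all                  # custom grains are synced
ls /var/cache/salt/minion/extmods/grains     # folder exists
salt-call test.ping                          # ordinary module call
ls /var/cache/salt/minion/extmods/grains     # folder has been removed (the bug)

# Situation 3: master unreachable
salt-call saltutil.sync_all                  # folder exists again
# ...stop the salt-master or block the connection...
salt-call grains.items                       # minion cannot reach the master
ls /var/cache/salt/minion/extmods/grains     # folder has been removed (the bug)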

Expected behavior

The grains extmods folder is not cleaned in any of the specified scenarios.


Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3006.5

Python Version:
        Python: 3.10.13 (main, Nov 15 2023, 04:34:27) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.14.2
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: rhel 9.3 Plow
        locale: utf-8
       machine: x86_64
       release: 5.14.0-362.13.1.el9_3.x86_64
        system: Linux
       version: Red Hat Enterprise Linux 9.3 Plow

Additional context
Based on 3006.5 release notes alone, I would suspect this is a side effect of #65186

A minion that is explicitly configured as "local" does not seem to exhibit any of these symptoms.
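
For reference, "explicitly configured as local" here means something like the following sketch (the drop-in file name and file_roots path are illustrative, not from this report):

# masterless minion configuration sketch
cat << EOF > /etc/salt/minion.d/local.conf
file_client: local
file_roots:
  base:
    - /srv/salt
EOF

salt-call --local saltutil.sync_grains   # syncs from the local file_roots
salt-call --local grains.items           # custom grains remain available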

@lomeroe lomeroe added the Bug (broken, incorrect, or confusing behavior) and needs-triage labels Dec 13, 2023
@lkubb
Contributor

lkubb commented Dec 13, 2023

There's a test failure that can be traced back to the mentioned PR as well: #65574

I haven't had the motivation to grok the tested module or the test yet, but it might be related to this as well.

@dmurphy18 dmurphy18 self-assigned this Dec 14, 2023
@dmurphy18
Contributor

dmurphy18 commented Dec 15, 2023

@lomeroe Can you include an example of the custom grain? I want to duplicate exactly the issue you are seeing; that is, the custom grain is part of the steps to reproduce it.

@dmurphy18 dmurphy18 added this to the Sulfur v3006.6 milestone Dec 15, 2023
@lomeroe
Contributor Author

lomeroe commented Dec 15, 2023

The EC2 info grain is one of many we distribute, but it is publicly available; all the others are pretty custom/specific to our environment...

https://github.com/vmware-archive/salt-contrib/blob/master/grains/ec2_info.py

@dmurphy18
Contributor

Looking for simple contents of /srv/salt/_grains that show the issue.

@dmurphy18
Contributor

btw: with /srv/salt/_grains/test_grains.py, the directory is not /var/cache/salt/minion/grains but

root@david-VirtualBox:/var/cache/salt/minion/extmods/grains# l
total 16K
-rw------- 1 root root  104 Dec 15 13:50 test_grains.py
drwxr-xr-x 3 root root 4.0K Dec 15 13:50 ..
drwxr-xr-x 2 root root 4.0K Dec 15 13:50 __pycache__
drwx------ 3 root root 4.0K Dec 15 13:50 .
root@david-VirtualBox:/var/cache/salt/minion/extmods/grains#

And this is with Salt 3005.4, but I confirmed that saltutil.sync_grains on Salt 3006.5 wipes it out when run from the salt-master.
Also, salt-call --local sets the directory correctly, but without '--local' it gets wiped out; that is, communicating with the salt-master (salt-call without --local) causes the directory to be erased.

@dmurphy18 dmurphy18 added the Confirmed (Salt engineer has confirmed bug/feature - often including a MCVE) label Dec 15, 2023
@dmurphy18
Contributor

dmurphy18 commented Dec 16, 2023

Suspect the issue is due to bae0321 and not #65574, which didn't change sync_grains, just ensured it was called.

Note: the issue exists in Salt 3006.4, which, being a CVE release, implies it exists in 3006.3 too.
Tested back to Salt 3006.0 and the problem occurs, hence I'm going to try with a different Salt Master version (other than 3006.5) next week.

The only difference between salt-call --local and salt-call is that '--local' does not contact the salt-master.

Investigating further

@higels

higels commented Dec 16, 2023

We are also seeing this (well, we were; we've since rolled back to 3006.4).

For better or worse, we sometimes make decisions in salt/top.sls about what states to include for a minion based on a custom grain being set, and these states were not being included under 3006.5.

I didn't have too much time to debug, but some observations:

  1. the output of calling salt-call saltutil.sync_all implied that the grains were being re-synced on every single invocation (modules et al. were not)
  2. removing the call to _sync_grains around L947 in salt/minion.py fixed the errant behaviour.
  3. I don't really understand the meaning of force_local, but if it has the same effect as running salt-call --local saltutil.sync_grains, that completely removes the directory /var/cache/salt/minion/extmods/grains under 3006.4 as well.

If there's other information that is useful to repro, please let me know.

@dmurphy18
Contributor

dmurphy18 commented Dec 18, 2023

Duplicated the issue with salt-master and salt-minion using Salt 3006.3. This rules out #65186 as the problem, since that fix was introduced in 3006.5.

Also, Salt 3006.1 and Salt 3006.0 used for salt-master and salt-minion exhibit the issue.

Using Salt 3005.4 for salt-master and salt-minion, the same issue is exhibited; this inclines me to believe it is either a long-standing issue unrelated to Salt 3006.x, or functioning as designed.

So Salt 3005.4 tiamat-backed salt-master and salt-minion exhibit the issue, but classic-package Salt 3005.4 salt-master and salt-minion do not exhibit the problem if on the same machine; if the salt-master is on a different machine, then the issue is exhibited (this is probably more a matter of local vs. remote salt-master).

Have to determine if this is normal operation (operating as designed) or an undiscovered issue.

@dmurphy18 dmurphy18 added and then removed the Regression (The issue is a bug that breaks functionality known to work in previous releases.) label Dec 19, 2023
@CapBananoid

@dmurphy18 I may be mistaken about the tests you are doing, but I don't see this behavior in 3006.4,
and @higels says he rolled back to 3006.4 as well.

@frebib
Contributor

frebib commented Dec 19, 2023

3006.5 is entirely unusable for us due to this. We manually run saltutil.sync_all before running a highstate (to solve this exact problem), but as of this change our highstate fails due to missing extmods. Moreover, the refresh=true causes unnecessary pillar renders that are particularly noticeable when pillar takes a non-negligible amount of time to generate. Based on the symptoms, I'm going to attempt reverting #65186, as I think that will solve the issue. (Side note: it's impossible to work with PRs that make a change over 14 (!!!!!) commits. Please be kind to maintainers and squash commits before merge.)

@Oloremo
Contributor

Oloremo commented Dec 19, 2023

(Side note: it's impossible to work with PRs that make a change over 14 (!!!!!) commits. Please be kind to maintainers and squash commits before merge)

+100

@Oloremo
Contributor

Oloremo commented Dec 19, 2023

The whole pillar and grain sync/refresh is way overcomplicated in Salt. In some places you can control it; in some, you can't (hello, highstate!).

It would be so much simpler if Salt would:

  1. Only sync or refresh pillars and grains if it is told to. Never do it if not asked to.
  2. Have dedicated and non-coupled API calls to refresh grains and pillars individually. Both must have sync and async ways of refreshing.
  3. Not fetch new pillars if not told to. IIRC each state.apply fetches pillar each time - why?

With that implemented, we could start each rollout with sync, refresh, apply and guarantee that it would happen only once.
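
For comparison, the closest explicit sequence available today looks roughly like the sketch below (the state name is hypothetical, and this does not guarantee a single pillar render, which is exactly the complaint above):

salt-call saltutil.sync_all refresh=False   # sync custom modules/grains without the implicit refresh
salt-call saltutil.refresh_grains           # explicit grains refresh
salt-call saltutil.refresh_pillar           # explicit pillar refresh
salt-call state.apply webserver             # apply; state.apply may still fetch pillar again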

@dmurphy18
Contributor

@Oloremo I cannot see the way sync/refresh works in Salt changing, certainly not at this point with the reduced team size.

@CapBananoid Here is the test file I am using to test with

root@david-VirtualBox:/srv/salt/_grains# cat test_grains.py 

def test_grains():
    grains = {}
    grains["test_dgm_custom"] = "custom_value_1"
    return grains

root@david-VirtualBox:/srv/salt/_grains#

Very simple, and it occurs on all versions of Salt I have tried, on Ubuntu 22.04 (master), CentOS 7 (master) and Ubuntu 20.04 (minion), with all flavors of 3006.x, and with 3005.4 tiamat-backed and classic, using these simple commands:

  • salt-call --local saltutil.sync_grains (keeps the cached value /var/cache/salt/minion/extmods/grains/)
  • salt-call saltutil.sync_grains (removes the cached value /var/cache/salt/minion/extmods/grains/)

Interestingly, if the salt-master is local (on the same machine), then the cached value is kept (the master not being remote is the reason, per the code).
Admittedly, it would appear to me that you would want the cached value kept regardless of whether the command is salt-call or from the salt-master, based on the testing that I am doing. I understand where the removal is occurring, and it is old code, see https://github.com/saltstack/salt/blame/53c7e6be4f046dac3834126ea65fbffdcfeda769/salt/modules/saltutil.py#L111-L125; the os.remove deletes the cached value, and that code has been unchanged for years.

So I may have found a different issue, hence @lomeroe I need a simple example of usage and repeatability to ensure I am addressing the issue you found, in case it is different from what I have found.

@dmurphy18
Contributor

@frebib Telling me it is entirely broken without providing examples of the issue and ways to demonstrate it is not helpful in getting the issue fixed. It is open source, therefore please be helpful in assisting with fixing issues; just telling me it doesn't work does not help. I get that you are having an issue, otherwise you would not have commented, but I need details in order to fix the issue you are having, rather than second-guessing whether it might be one thing or another.

Resources on the Salt Core team were never plentiful, and they have been sorely reduced recently, so any help/assistance is needed to resolve issues.

Not intending to single you out, but this applies to all issues going forward.

@lkubb
Contributor

lkubb commented Dec 19, 2023

From what I understand, the new issue is that custom grain modules sometimes disappear from /var/cache/salt/minion/extmods/grains even though the minion should be connected to the master. I think it is due to these lines:

salt/salt/minion.py

Lines 133 to 136 in 53c7e6b

if opts.get("file_client", "remote") == "remote" and not opts.get(
"master_uri", None
):
salt.utils.extmods.sync(opts, "grains", force_local=True)

Syncing modules with salt-call --local should wipe custom modules since they are usually not located on the local node. There is an exception for the minion on the master node since, if you have the files in the standard paths, they will still be found by the masterless minion's fileserver.

Just my 5ct, I haven't upgraded to 3006.5 yet.

Edit (clarification): salt-call --local calls other than saltutil.sync_(grains|all) should of course not wipe custom grains, but I suspect this might be happening with the mentioned patch in addition to the above.

@dmurphy18
Contributor

@lkubb I am making it happen even on 3005.4, yet that code was added for 3006.5, so I may have found a different issue in my testing; hence the need for the originator's example, or for others to show exactly what they are doing to exhibit the problem.

@dmurphy18
Contributor

Egg on face or PBKAC - I had the custom grain on the minion and not on the master 🤦
Will redo with the correct custom grain setting on the master.

All the more reason to get examples of the issue from originators, since core team members can screw it up, esp. after the last couple of weeks 😆

@lkubb
Contributor

lkubb commented Dec 19, 2023

@dmurphy18 I cannot reproduce this on 3006.2 or 3006.4, but I can reproduce it by including the _sync_grains behavior in salt.minion.SMinion.

$ cat /srv/salt/_grains/foo.py
def foo():
    return {"foo": "bar"}
$ salt testminion saltutil.sync_all
testminion:
    ----------
    beacons:
    clouds:
    engines:
    executors:
    grains:
        - grains.foo
    log_handlers:
    matchers:
    modules:
    output:
    proxymodules:
    renderers:
    returners:
    sdb:
    serializers:
    states:
    thorium:
    utils:
$ salt testminion grains.get foo
testminion:
    bar

Then switch to testminion:

$ salt-call grains.get foo
local:
$ salt-call saltutil.sync_all
local:
    ----------
    beacons:
    clouds:
    engines:
    executors:
    grains:
        - grains.foo
    log_handlers:
    matchers:
    modules:
    output:
    proxymodules:
    renderers:
    returners:
    sdb:
    serializers:
    states:
    thorium:
    utils:
$ salt-call saltutil.sync_all
local:
    ----------
    beacons:
    clouds:
    engines:
    executors:
    grains:
        - grains.foo
    log_handlers:
    matchers:
    modules:
    output:
    proxymodules:
    renderers:
    returners:
    sdb:
    serializers:
    states:
    thorium:
    utils:
# etc.

It keeps resyncing the module when salt-call is used, but works when called from the master. Note that this is on 3006.2 with just the force_local behavior and _sync_grains in SMinion.

Maybe someone experiencing this issue in 3006.5 can correct me if this is beside the point; just trying to help with communication.

Edit: If this is actually the issue, it might be possible to work around it by setting clean_dynamic_modules to false in the minion configuration.
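
If that workaround applies, a minimal sketch (untested; the drop-in file name is arbitrary):

# keep previously synced dynamic modules instead of cleaning them on sync
cat << EOF > /etc/salt/minion.d/no_clean_extmods.conf
clean_dynamic_modules: False
EOF
systemctl restart salt-minion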

@dmurphy18
Contributor

dmurphy18 commented Dec 19, 2023

So with the custom grain now on the salt-master, I noticed that using '--local' on a Salt 3006.4 minion also removes the cached value:

root@david-VirtualBox:/srv/salt# l /var/cache/salt/minion/extmods/grains/
total 16K
drwxr-xr-x 3 root root 4.0K Dec 19 11:19 ..
-rw------- 1 root root  104 Dec 19 11:19 test_grains.py
drwxr-xr-x 2 root root 4.0K Dec 19 11:34 __pycache__
drwx------ 3 root root 4.0K Dec 19 11:34 .
root@david-VirtualBox:/srv/salt# salt-call saltutil.sync_grains
local:
root@david-VirtualBox:/srv/salt# l /var/cache/salt/minion/extmods/grains/
total 16K
drwxr-xr-x 3 root root 4.0K Dec 19 11:19 ..
-rw------- 1 root root  104 Dec 19 11:19 test_grains.py
drwxr-xr-x 2 root root 4.0K Dec 19 11:42 __pycache__
drwx------ 3 root root 4.0K Dec 19 11:42 .
root@david-VirtualBox:/srv/salt# salt-call --local saltutil.sync_grains
local:
root@david-VirtualBox:/srv/salt# l /var/cache/salt/minion/extmods/grains/
ls: cannot access '/var/cache/salt/minion/extmods/grains/': No such file or directory
root@david-VirtualBox:/srv/salt# salt-call --local test.versions
local:
    Salt Version:
              Salt: 3006.4

I also noticed that it was the upgrade from 3006.4 to 3006.5 that wiped out the cached value; this did not occur with an upgrade from 3006.3 to 3006.4 on the salt-minion.

The behavior of salt-call --local saltutil.sync_grains removing the cached custom grain is correct, since the custom grain doesn't exist on the salt-minion and the salt-master is not contacted due to the '--local'.

Will look at why the upgrade to Salt 3006.5 on the salt-minion is wiping the cached value.
In normal operation, the cached value is there; that is, salt tu20 saltutil.sync_grains generates the cached value on the salt-minion tu20.

The fix in #65186 needs to be adjusted or removed; investigating how to address this incomplete fix.

@lkubb
Contributor

lkubb commented Dec 19, 2023

Imho a simple fix for the issue referenced by the PR would be to document prominently and explicitly that you always need to sync custom modules (once after a change) if you want to use them (including during pillar compilation; this is true when using a master as well), the exception being custom grain modules in a regular Salt setup (but not in masterless/standalone mode).

In essence, the referenced issue is about saving a single salt-call saltutil.sync_all on masterless minions.

@lomeroe
Contributor Author

lomeroe commented Dec 20, 2023

Apologies for taking a few days to get back to this...

Here's a "scenario 1" test script. Instead of using a master, this uses an alternative "local" configuration so that it only requires one system and you don't have to configure/set up a master, but the result is the same if you were to set up a master instead...

salt-call --version

# clearing cache to start...
salt-call --local saltutil.clear_cache

# /var/cache/salt/minion/extmods/grains should be empty to start

# create an alternative minion config/fileserver path
mkdir -p /tmp/test/etc/salt
mkdir -p /tmp/test/srv/salt/_grains
mkdir -p /tmp/test/srv/salt/_modules

# setup the alternative minion
cat << EOF > /tmp/test/etc/salt/minion
state_top: top.sls

file_client: local

file_roots:
 base:
   - /tmp/test/srv/salt

EOF

# create a fake grain file
cat << EOF > /tmp/test/srv/salt/_grains/lomeroe.py
def __virtual__():
    """
    Fake grain to return some junk data
    """
    return True


def lomeroe():
    """
    returns the fake grain data
    """
    return {"lomeroe": "has the fake grain"}
EOF


# create a fake module file
cat << EOF > /tmp/test/srv/salt/_modules/mod65692.py
def __virtual__():
    """
    Fake module
    """
    return True


def test():
    """
    a fake salt module method
    """
    return "the custom execution module is there"
EOF

printf "synchronizing grains from /tmp/test/srv/salt/_grains\nthe lomeroe grain should be in the output of sync'd grains\nand mod65692 in the list of sync'd modules"
salt-call --config-dir=/tmp/test/etc/salt saltutil.sync_all
ls -alrt /var/cache/salt/minion/extmods/grains | grep lomeroe.py
if [ $? -ne 0 ]
then
    printf "Expected grain file not found from initial sync, something is horribly wrong\n"
fi

ls -alrt /var/cache/salt/minion/extmods/modules | grep mod65692.py
if [ $? -ne 0 ]
then
    printf "Expected module file not found from initial sync, something is horribly wrong\n"
fi

printf "running grains.get with --local flag\nthis should not delete previously sync'd grains and we should not get any ERROR messages\n"
salt-call --local grains.get lomeroe --output=raw | grep "has the fake grain" > /dev/null
if [ $? -ne 0 ]
then
    printf "ERROR: Expected grain data not returned, this is issue 65692\n"
fi

# secondary check of physical lomeroe.py grain file
ls -alrt /var/cache/salt/minion/extmods/grains | grep lomeroe.py
if [ $? -ne 0 ]
then
    printf "ERROR: Expected grain file not found, this is issue 65692\n"
fi

# show that the extmod modules is not removed
printf "running the custom module still works, as the extmods/modules is not wiped out like grains is\n"
salt-call --local mod65692.test --output=raw

printf "calling 'saltutil.sync_all' with --local is totally expected to wipe out all the previously sync'd custom grains/modules/etc\n"
salt-call --local saltutil.sync_all
printf "all the grains/modules previously sync'd are now gone\n"
ls -alrt /var/cache/salt/minion/extmods | grep grains
if [ $? -eq 0 ]
then
    printf "ERROR: a local sync_all with no configured fileserver path should clear out our extmods\n"
fi
ls -alrt /var/cache/salt/minion/extmods | grep modules
if [ $? -eq 0 ]
then
    printf "ERROR: a local sync_all with no configured fileserver path should clear out our extmods\n"
fi

my 3006.4 output from the test script:

salt-call 3006.4 (Sulfur)
local:
    True
synchronizing grains from /tmp/test/srv/salt/_grains
the lomeroe grain should be in the output of sync'd grains
and mod65692 in the list of sync'd moduleslocal:
    ----------
    beacons:
    clouds:
    engines:
    executors:
    grains:
        - grains.lomeroe
    log_handlers:
    matchers:
    modules:
        - modules.mod65692
    output:
    pillar:
    proxymodules:
    renderers:
    returners:
    sdb:
    serializers:
    states:
    thorium:
    utils:
-rw------- 1 root root  201 Dec 20 13:34 lomeroe.py
-rw------- 1 root root  177 Dec 20 13:34 mod65692.py
running grains.get with --local flag
this should not delete previously sync'd grains and we should not get any ERROR messages
-rw------- 1 root root  201 Dec 20 13:34 lomeroe.py
running the custom module still works, as the extmods/modules is not wiped out like grains is
{'local': 'the custom execution module is there'}
calling 'saltutil.sync_all' with --local is totally expected to wipe out all the previously sync'd custom grains/modules/etc
local:
    ----------
    beacons:
    clouds:
    engines:
    executors:
    grains:
    log_handlers:
    matchers:
    modules:
    output:
    pillar:
    proxymodules:
    renderers:
    returners:
    sdb:
    serializers:
    states:
    thorium:
    utils:
all the grains/modules previously sync'd are now gone

my output from the test script with 3006.5

salt-call 3006.5 (Sulfur)
local:
    True
synchronizing grains from /tmp/test/srv/salt/_grains
the lomeroe grain should be in the output of sync'd grains
and mod65692 in the list of sync'd moduleslocal:
    ----------
    beacons:
    clouds:
    engines:
    executors:
    grains:
    log_handlers:
    matchers:
    modules:
        - modules.mod65692
    output:
    pillar:
    proxymodules:
    renderers:
    returners:
    sdb:
    serializers:
    states:
    thorium:
    utils:
-rw-------. 1 root root 201 Dec 20 19:34 lomeroe.py
-rw-------. 1 root root 177 Dec 20 19:34 mod65692.py
running grains.get with --local flag
this should not delete previously sync'd grains and we should not get any ERROR messages
ERROR: Expected grain data not returned, this is issue 65692
ls: cannot access '/var/cache/salt/minion/extmods/grains': No such file or directory
ERROR: Expected grain file not found, this is issue 65692
running the custom module still works, as the extmods/modules is not wiped out like grains is
{'local': 'the custom execution module is there'}
calling 'saltutil.sync_all' with --local is totally expected to wipe out all the previously sync'd custom grains/modules/etc
local:
    ----------
    beacons:
    clouds:
    engines:
    executors:
    grains:
    log_handlers:
    matchers:
    modules:
    output:
    pillar:
    proxymodules:
    renderers:
    returners:
    sdb:
    serializers:
    states:
    thorium:
    utils:
all the grains/modules previously sync'd are now gone

EDIT TO ADD: also notice that the output of 3006.5 does not show the grain as being sync'd, even though the "ls" of the /var/cache/salt/minion/extmods/grains folder shows the file exists after "sync_all" is called.

@dmurphy18
Contributor

Reverting the changes in PR #65186 since they are an incomplete solution; will redo in another PR later, but we want to quickly fix this issue, and will address the original issue #65027 with better testing etc.

@jdelic
Contributor

jdelic commented Dec 27, 2023

@dmurphy18
Interestingly, I see this issue on a setup with a master enabled and available. My custom grains module is copied from /var/cache/salt/minion/files/base/_grains/ to /var/cache/salt/minion/extmods/grains/ on both the master and the minion on a salt-call saltutil.sync_grains refresh=True. However, while the master's local minion behaves as expected, the second minion will copy the file on the sync, but even a simple salt-call grains.get roles on the remote minion will cause it to remove .../extmods/grains/ completely and report no custom grains.

I don't quite follow if your above analysis would cover such behavior for a minion which can talk to its master successfully, so I thought I'd add my experience here.

jdelic added a commit to jdelic/saltshaker that referenced this issue Dec 31, 2023
@dmurphy18
Contributor

Closing this since the custom grains change in Salt 3006.5 has been reverted in #65738, which should be available in 3006.6 when it is released.
The original custom grains issue needs a better fix with more authoritative testing.

@gregorg

gregorg commented Jan 22, 2024

Facing this issue too, we rolled back to 3006.4

@darkpixel
Contributor

3006.6 landed a little bit ago.
My custom grains are hosed up. I have ~15 of them. None of them return anything. As far as I can tell, the minions aren't running them even though the files appear to exist.

They occasionally return data if I run salt '*' saltutil.sync_all from the master, but if I attempt to force them to update via salt '*' saltutil.refresh_grains, I get nothing back from grains.get.

If I run salt-call saltutil.sync_all from a minion it shows every single custom grain being synced. Previous versions only showed changes being synced. I can repeatedly run sync_all and it will repeatedly 'install' every single grain.

@lkubb
Contributor

lkubb commented Feb 1, 2024

@darkpixel 3006.6 is a CVE release and does not contain the fix for this issue.

@max-arnold
Contributor

A patch for 3006.5/6:

{% if grains.saltversion in ["3006.5", "3006.6"] %}
patch:
  pkg.installed

grains_sync_patch:
  file.patch:
    - name: '{{ grains.saltpath }}'
    - source: salt://65738.diff
    - strip: 2
    - require:
        - pkg: patch

restart_salt_minion:
  cmd.run:
    - name: 'salt-call service.restart salt-minion'
    - bg: true
    - onchanges:
      - file: grains_sync_patch
{%- endif %}
65738.diff
diff --git a/salt/fileclient.py b/salt/fileclient.py
index 42e7120aab18..443861cc03fd 100644
--- a/salt/fileclient.py
+++ b/salt/fileclient.py
@@ -45,15 +45,12 @@
 MAX_FILENAME_LENGTH = 255
 
 
-def get_file_client(opts, pillar=False, force_local=False):
+def get_file_client(opts, pillar=False):
     """
     Read in the ``file_client`` option and return the correct type of file
     server
     """
-    if force_local:
-        client = "local"
-    else:
-        client = opts.get("file_client", "remote")
+    client = opts.get("file_client", "remote")
 
     if pillar and client == "local":
         client = "pillar"
diff --git a/salt/minion.py b/salt/minion.py
index 0c5c77a91e50..15d46b2dacf9 100644
--- a/salt/minion.py
+++ b/salt/minion.py
@@ -114,29 +114,6 @@
 # 6. Handle publications
 
 
-def _sync_grains(opts):
-    # need sync of custom grains as may be used in pillar compilation
-    # if coming up initially and remote client, the first sync _grains
-    # doesn't have opts["master_uri"] set yet during the sync, so need
-    # to force local, otherwise will throw an exception when attempting
-    # to retrieve opts["master_uri"] when retrieving key for remote communication
-    # in addition opts sometimes does not contain extmod_whitelist and extmod_blacklist
-    # hence set those to defaults, empty dict, if not part of opts, as ref'd in
-    # salt.utils.extmod sync function
-    if opts.get("extmod_whitelist", None) is None:
-        opts["extmod_whitelist"] = {}
-
-    if opts.get("extmod_blacklist", None) is None:
-        opts["extmod_blacklist"] = {}
-
-    if opts.get("file_client", "remote") == "remote" and not opts.get(
-        "master_uri", None
-    ):
-        salt.utils.extmods.sync(opts, "grains", force_local=True)
-    else:
-        salt.utils.extmods.sync(opts, "grains")
-
-
 def resolve_dns(opts, fallback=True):
     """
     Resolves the master_ip and master_uri options
@@ -944,7 +921,6 @@ def __init__(self, opts, context=None):
         # Late setup of the opts grains, so we can log from the grains module
         import salt.loader
 
-        _sync_grains(opts)
         opts["grains"] = salt.loader.grains(opts)
         super().__init__(opts)
 
diff --git a/salt/utils/extmods.py b/salt/utils/extmods.py
index 6a4d5c14440c..a7dc265476f5 100644
--- a/salt/utils/extmods.py
+++ b/salt/utils/extmods.py
@@ -39,7 +39,6 @@ def sync(
     saltenv=None,
     extmod_whitelist=None,
     extmod_blacklist=None,
-    force_local=False,
 ):
     """
     Sync custom modules into the extension_modules directory
@@ -83,9 +82,7 @@ def sync(
                         "Cannot create cache module directory %s. Check permissions.",
                         mod_dir,
                     )
-            with salt.fileclient.get_file_client(
-                opts, pillar=False, force_local=force_local
-            ) as fileclient:
+            with salt.fileclient.get_file_client(opts) as fileclient:
                 for sub_env in saltenv:
                     log.info("Syncing %s for environment '%s'", form, sub_env)
                     cache = []

@darkpixel
Contributor

Yeah. I feel kinda stupid. I got all excited that a batch of issues I've been having with the last 3 versions of Salt were finally going to be solved. Then I looked at the changelog. ;)

@dmurphy18
Contributor

The breakage in Salt 3006.5 has been rolled back, and normal grains behavior should be available in the next bugfix release, which will be soon; we had to get the CVE release out first.

eg-ayoub added a commit to scality/metalk8s that referenced this issue Feb 16, 2024
saltstack/salt#65692
temporarily until the fix for this bug is released in 3006.7
eg-ayoub added a commit to scality/metalk8s that referenced this issue Feb 28, 2024
saltstack/salt#65692
temporarily until the fix for this bug is released in 3006.7
eg-ayoub added a commit to scality/metalk8s that referenced this issue Feb 29, 2024
saltstack/salt#65692
temporarily until the fix for this bug is released in 3006.7
eg-ayoub added a commit to scality/metalk8s that referenced this issue Mar 4, 2024
saltstack/salt#65692
temporarily until the fix for this bug is released in 3006.7
eg-ayoub added a commit to scality/metalk8s that referenced this issue May 15, 2024
saltstack/salt#65692
temporarily until the fix for this bug is released in 3006.7