Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] file.managed and file.symlink both run in test mode by default #64195

Closed
2 tasks done
zeddD1abl0 opened this issue Apr 30, 2023 · 36 comments
Closed
2 tasks done

[BUG] file.managed and file.symlink both run in test mode by default #64195

zeddD1abl0 opened this issue Apr 30, 2023 · 36 comments
Assignees
Labels
Bug broken, incorrect, or confusing behavior Regression The issue is a bug that breaks functionality known to work in previous releases. Sulfur v3006.0 release code name and version

Comments

@zeddD1abl0
Copy link

Description
During a basic file.managed or file.symlink, running salt state.apply results in "symlink is set for creation" or "The file is set to be changed. Note: No changes have been applied."

Even explicitly setting "test" in the minion configuration to False did not fix the issue. The only fix is to manually intervene and hope that the files line up as expected.

Setup

/opt/netbox:
  file.symlink:
    - target: /opt/netbox-{{ netboxversion }}
    - force: True
    - test: False
    - require:
      - archive: /opt

/opt/netbox/local_requirements.txt:
  file.managed:
    - source: salt://filesets/netbox/local_requirements.txt
    - require:
      - file: /opt/netbox

Please be as specific as possible and give set-up details.

  • VM (ProxMox)
  • onedir packaging

Steps to Reproduce the behavior
With the given symlink and managed examples above, simply running salt state.apply fails. The same is true of a managed file

Expected behavior
Prior to 3006, when running salt state.apply the file would be changed or updated as per the SLS definition.

Screenshots
image

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3006.0
 
Python Version:
        Python: 3.10.11 (main, Apr 14 2023, 05:57:16) [GCC 11.2.0]
 
Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: 4.0.10
     gitpython: 3.1.31
        Jinja2: 3.1.2
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: 0.38.0
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 5.4.1
         PyZMQ: 23.2.0
        relenv: 0.11.2
         smmap: 5.0.0
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: almalinux 8.7 Stone Smilodon
        locale: utf-8
       machine: x86_64
       release: 4.18.0-425.19.2.el8_7.x86_64
        system: Linux
       version: AlmaLinux 8.7 Stone Smilodon
@zeddD1abl0 zeddD1abl0 added Bug broken, incorrect, or confusing behavior needs-triage labels Apr 30, 2023
@welcome
Copy link

welcome bot commented Apr 30, 2023

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey.
Please be sure to review our Code of Conduct. Also, check out some of our community resources including:

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar.
If you have additional questions, email us at [email protected]. We’re glad you’ve joined our community and look forward to doing awesome things with you!

@OrangeDog
Copy link
Contributor

What is the output of salt 'minion' config.get test?

@OrangeDog OrangeDog added the info-needed waiting for more info label May 2, 2023
@zeddD1abl0
Copy link
Author

zeddD1abl0 commented May 2, 2023

image

As per the screenshot, a value of False is returned.

Edit: Another thing I should mention is, some other file.managed and file.symlinks are working, but some aren't. There doesn't appear to be much to link the ones that don't together, except perhaps that they are located within areas chown by root:root. I don't quite have time to fully follow that up tonight (AEST Time), but if I get a chance in the near future I'll double-check and confirm if it's specifically related to just root:root areas.

@OrangeDog
Copy link
Contributor

Do you have any custom code that might be putting a run into test mode accidentally?

@zeddD1abl0
Copy link
Author

zeddD1abl0 commented May 3, 2023

@OrangeDog I don't think so. There's nothing too spectacular in my configuration really. I'm using a fairly simple Salt install with a simple git repo, etc. if I create the symlink or file by hand, it takes a few refreshes, and then it finally sticks. If I then remove the symlink/file it goes back to "Unchanged". All these snippets are in the same SLS file:

/opt/netbox:
  file.symlink:
    - target: /opt/netbox-{{ netboxversion }}
    - force: True
    - require:
      - archive: /opt

/opt/netbox/gunicorn.py:
  file.managed:
    - source: salt://filesets/netbox/gunicorn.py
    - require:
      - file: /opt/netbox

/opt/netbox/netbox/media:
  file.directory:
    - user: netbox
    - group: netbox
    - recurse:
      - user
      - group

/opt/netbox/local_requirements.txt:
  file.managed:
    - source: salt://filesets/netbox/local_requirements.txt
    - require:
      - file: /opt/netbox

/opt/netbox/netbox/netbox/configuration.py:
  file.managed:
    - source: salt://filesets/netbox/configuration.py
    - template: jinja
    - require:
      - file: /opt/netbox

That's copy paste verbatim from the sls file, with some intervening states, but in the same sequence in the file. I'm fairly sure the sequence doesn't affect it, but just in case.

/opt/netbox and /opt/netbox/netbox/configuration.py both go to unchanged. But, if /opt/netbox is manually put in place, the others are successful.

Edit: I should add

{% set netboxversion = "3.5.0" %}

Just for clarity sake. That's the only variable in the SLS. From what I've tested, template doesn't affect it either, two template files worked perfectly in the same state as this one.

@OrangeDog
Copy link
Contributor

You're saying other states in the same run are applying changes, but these two and only these two are in test mode?

@zeddD1abl0
Copy link
Author

Yup. Everything else in the SLS for these actions applies properly. Creating files, restarting services, etc. The whole thing. It's only those two specific states that appear to be in "Test" mode. And I have NO idea how that works. TBH, I don't bother with Test mode. I just apply state until steady.

@OrangeDog
Copy link
Contributor

"it takes a few refreshes, and then it finally sticks", and "until steady" do not make sense to me.

After applying the state once, the system is then in that state. If that is not true then there is something else wrong.

@OrangeDog
Copy link
Contributor

OrangeDog commented May 3, 2023

This may be related to #55269, if __opts__["test"] is not True, but the state is incorrectly returning a None result now with changes. @lkubb is that possible?

@zeddD1abl0
Copy link
Author

So, I removed the symlink earlier to confirm it's still doing weird stuff, then I re-added the symlink manually. Here is the result:

netbox01:

image

salt master:

image

This is what I mean by "it finally sticks". When I first ran it tonight, it told me that the state matches. Of course it matched. The symlink was there from last night. Manually, I go and I remove the symlink, and the state dies. I go and manually create the symlink, and it still thinks there's a problem. Previously, in 3005, if I manually intervened because of something, Salt would simply go "Hey, it's here. Awesome". Now it appears to be caching the state or the files or something and then getting horribly confused when I manually take an action in the SLS file.

In the course of writing this comment, I reran the state.apply and got this result 13 minutes later:

image

The symlink exists. I took a screenshot. And yet, Salt appears to be confused because it does not want to apply a state that is already in place and functional.


By "until steady", what I mean is, sometimes I'm a little overzealous with some of my states, and they take a second or a third pass before the are 100% ready. It's because I'm still learning, and I don't want to lean too heavily on the "before","after","require" pieces of Salt. I'm used to this, so I put up with it for some things. I have a whole SLS to deploy ElasticSearch and Kibana from scratch. It works perfectly... after the second or third state.apply because it's got too many moving pieces for them to all line up in the first state.apply. So, I mash state.apply, wait 15 minutes, it tells me it had 14 errors. I reapply state.apply and it comes back clean.

@lkubb
Copy link
Contributor

lkubb commented May 3, 2023

This may be related to #55269, if __opts__["test"] is not True, but the state is incorrectly returning a None result now with changes. @lkubb is that possible?

I had the same intuition, but I don't see how this would be related since that PR only affected failing runs. I think I have seen similar behavior as described in here since including [a patched variant of] the current [3006] file modules in my custom modules, but not from only applying this patch before. The behavior is only present during a highstate and does not show itself when applying the same state tree with state.sls.

@zeddD1abl0
Copy link
Author

So, as a question that might sound a bit stupid, what should I look out for to confirm its not just something stupid with my setup? Is there anything obvious to look for? Anything I should double check? I'm out of time today, but I can check anything and everything tomorrow. It's not even outside the realm to rebuild the Salt server if needed.

@zeddD1abl0
Copy link
Author

It has to be a problem with my configuration. I now have the following

image

Which is insane. I can't imagine that Salt would've released 3006 without finding this, but. I can't figure out how or where. Need to figure that out now I guess...

@anilsil anilsil added this to the Sulfur v3006.2 milestone May 10, 2023
@ymasson
Copy link
Contributor

ymasson commented May 10, 2023

Same problem after 3004.2 to 3006.1 upgrade of salt-minion.

Name: /etc/haproxy/ssl/xxxxx.pem - Function: file.managed - Result: Differs - Started: 13:03:39.850351 - Duration: 31.351 ms
Name: haproxy - Function: service.running - Result: Changed - Started: 13:03:40.643770 - Duration: 59.438 ms

comment of file.managed :

        comment:
            The file /etc/haproxy/ssl/xxxxx.pem is set to be changed
            Note: No changes made, actual changes may
            be different due to other states.

The file is not updated. The service reload like it was updated.

the state:

haproxy_xxxxx_letsencrypt_certificates:
  file.managed:
    - name: /etc/haproxy/ssl/xxxxx.pem
    - source: https://xxxxx/letsencrypt_xxxxx.pem
    - source_hash: https://xxxxx/letsencrypt_xxxxx.pem.hash
    - user: haproxy
    - group: haproxy
    - mode: '0400'
    - watch_in:
      - haproxy_service
      
haproxy_service:
  service.running:
    - name: haproxy
    - enable: True
    - reload: True

salt version: 3006.1

@ymasson
Copy link
Contributor

ymasson commented May 10, 2023

in the same formula, i have few other file.managed state.
like

haproxy_apt_preference:
  file.managed:
    - name: /etc/apt/preferences.d/haproxy
    - user: root
    - group: root
    - mode: '0644'
    - contents: |
        Package: haproxy*
        Pin: origin haproxy.debian.net
        Pin-Priority: 600

or using - source: salt://xxx
they are not affected.

it seems affecting only HTTP sources.

@lkubb
Copy link
Contributor

lkubb commented May 10, 2023

Just a hunch since I wrote and use those: Does anyone affected by this problem not use the x509_v2 modules? They do call the file module in test mode during regular operation, maybe that's confusing the loader somewhere.

@ymasson
Copy link
Contributor

ymasson commented May 10, 2023

I use the x509_v2 module

@OrangeDog OrangeDog added Regression The issue is a bug that breaks functionality known to work in previous releases. Sulfur v3006.0 release code name and version and removed info-needed waiting for more info labels May 10, 2023
@zeddD1abl0
Copy link
Author

@lkubb That is plausible. I use the new x509_v2 module as well. I can try it without the x509_v2 module relatively easily when I get time.

@lkubb
Copy link
Contributor

lkubb commented May 10, 2023

Update: I can reproduce this behavior when the x509_v2 state module is called at least twice in a state tree before a file.managed call. If the last x509.*_present call before file.managed does not report changes, all subsequent file.managed calls run in test mode. This behavior is not present with only a single x509.*_present call, regardless of outcome.

So it seems the workaround for #62590 (https://github.com/saltstack/salt/blob/0cb3dc87e780b514b07ac0da66906a085893ceb2/salt/states/x509_v2.py#L1576C4-L1582) has the potential to trigger an inverse bug in the loader, which is much worse. I will code up a fix now that I have a test for it.

@lkubb
Copy link
Contributor

lkubb commented May 11, 2023

@ymasson @zeddD1abl0 Sorry for the delay. Would you be able to test if this change fixes your issues once CI passes?

Edit: Adapted the previous commit a bit for completeness' sake, should work regardless.

@OrangeDog
Copy link
Contributor

which is much worse

IMO changing it and not telling you is worse than not changing it and telling you.

@lkubb
Copy link
Contributor

lkubb commented May 11, 2023

IMO changing it and not telling you is worse than not changing it and telling you.

Point taken, thanks for encouraging. :) Not changing it can lead to manifested breakage though and this interaction could be described as spooky action a distance for anyone not familiar with the relevant codebase. I have stumbled on the loader in the past by treating it as a context manager, hope it becomes more robust at some point.

Anyways, CI is looking mostly fine and my states still work, so the commit should be good to test.

@ymasson
Copy link
Contributor

ymasson commented May 11, 2023

@lkubb your patch work. no worries for the delay. thanks

@zeddD1abl0
Copy link
Author

zeddD1abl0 commented May 27, 2023

@ymasson @zeddD1abl0 Sorry for the delay. Would you be able to test if this change fixes your issues once CI passes?

Edit: Adapted the previous commit a bit for completeness' sake, should work regardless.

Hi @lkubb ,

I tried to patch it... I think it did it right. I found the file in /opt/saltstack (/opt/saltstack/salt/lib/python3.10/site-packages/salt/states/x509_v2.py) and changed the file to match. Is that what I was supposed to do?

If so, unfortunately, it's still failing in the same way.

image

@zeddD1abl0
Copy link
Author

@ymasson @zeddD1abl0 Sorry for the delay. Would you be able to test if this change fixes your issues once CI passes?
Edit: Adapted the previous commit a bit for completeness' sake, should work regardless.

Hi @lkubb ,

I tried to patch it... I think it did it right. I found the file in /opt/saltstack (/opt/saltstack/salt/lib/python3.10/site-packages/salt/states/x509_v2.py) and changed the file to match. Is that what I was supposed to do?

If so, unfortunately, it's still failing in the same way.

image

My apologies @lkubb , I didn't realise it was a minion-side issue.

After applying the patch to a minion suffering the issue, it is indeed fixed. I concur with @ymasson .

@lkubb
Copy link
Contributor

lkubb commented May 28, 2023

@zeddD1abl0 No worries, glad to hear it fixes the issue for you as well. :) As an aside, it might be a bit easier for you to just copy the patched module into a _modules directory on your fileserver and sync it to all minions.

@lmf-mx
Copy link
Contributor

lmf-mx commented Jun 29, 2023

file.directory still runs in test with #64269

@CodingMinds
Copy link

I've the same issue with salt-ssh and can confirm that applying #64269 solved it.
For others using salt-ssh as well: Don't forget --regen-thin to apply those changes

@lkubb
Copy link
Contributor

lkubb commented Jul 6, 2023

@lmf-mx Let's continue here. You wrote:

Confirmed that the minion cache has loaded the patched module.
Some additional details, which might add to things not caught by the test

  • There's a csr_managed
  • There's a private_key_managed
  • There's a file.directory
  • The csr_managed has a require requisite on the private_key_managed, which in turn has a require requisite on the file.directory
  • There is a unrelated file.managed in the same state file

Am I understanding you correctly? When applying the described state,

  1. the state run order is file.directory > x509.private_key_managed > x509.csr_managed > file.managed,
  2. the state.apply is not run with test=true,
  3. file.directory does run in test mode,
  4. the following x509 calls do not run in test mode,
  5. file.managed does not run in test mode.

Is there any prereq that would cause the x509_v2 module to be called before file.directory? Do you apply just this state file or are there other ones with x509_v2 calls that would run before? Is this issue reproducible when just applying the described state file? Do the x509 calls report changes when your described issue occurs?

@danielbakken
Copy link

Has this issue been resolved by merging #64960?

@lkubb
Copy link
Contributor

lkubb commented Aug 17, 2023

I believe so, at least my reproduction test passed on that PR. Mind that it is not part of any release currently.

@danielbakken
Copy link

The PR and this bug were slated for 3006.2, but they're not mentioned in the release notes (https://docs.saltproject.io/en/latest/topics/releases/3006.2.html).

When can we expect a Salt release with a fix for this bug?

@whytewolf
Copy link
Collaborator

@danielbakken 3006.2 was a cve only release. we are working on getting 3006.3 out with this and other fixes.

@Ch3LL
Copy link
Contributor

Ch3LL commented Sep 6, 2023

Closing as this was fixed in #64960 Will be included in the 3006.3 release.

@Ch3LL Ch3LL closed this as completed Sep 6, 2023
@darkpixel
Copy link
Contributor

Any idea when 3006.3 is going to drop @Ch3LL?
Between this fixed bug, a fixed bug in winpkg.latest and a fixed bug in x509 our infrastructure is an unmanageable mess right now. ;)

@darkpixel
Copy link
Contributor

Oops. My bad. It did drop. ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Regression The issue is a bug that breaks functionality known to work in previous releases. Sulfur v3006.0 release code name and version
Projects
None yet
Development

Successfully merging a pull request may close this issue.