Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] file.replace and file.search false change when looking for // #60055

Open
Vaarlion opened this issue Apr 20, 2021 · 8 comments
Open

[BUG] file.replace and file.search false change when looking for // #60055

Vaarlion opened this issue Apr 20, 2021 · 8 comments
Labels
Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@Vaarlion
Copy link
Contributor

Vaarlion commented Apr 20, 2021

Description
I have a file i wish to fully comment. The comment flag for it is //.
for this i used this state

comment_file
  file.comment:
    - name: /etc/apt/apt.conf.d/50unattended-upgrades
    - regex: ^(?!//)(.*)$
    - char: '//'

my file now look like this

vaarlion@salt2:~$ head /etc/apt/apt.conf.d/50unattended-upgrades
// Unattended-Upgrade::Origins-Pattern controls which packages are
// upgraded.
//
// Lines below have the format format is "keyword=value,...".  A
// package will be upgraded only if the values in its metadata match
// all the supplied keywords in a line.  (In other words, omitted
(...)

It worked well but always reported a future change when running in test=true. So i started digging.
states.file.comment call modules.file.search, which is a wrapper for modules.files.replace
modules.files.replace use re.search when search_only is True, which is what is used by modules.file.search.
So i tested it locally.

>>> cpattern = re.compile(str.encode('^(?!//)(.*)$'), 8)
>>> with open('/etc/apt/apt.conf.d/50unattended-upgrades', 'rb') as r_file:
...     r_data = mmap.mmap(r_file.fileno(), 0, access=mmap.ACCESS_READ)
...     re.search(cpattern, r_data)
... 
<_sre.SRE_Match object; span=(4313, 4313), match=b''>

since it does return something i tried to run the other part of modules.files.replace that use re.subn

>>> cpattern = re.compile(str.encode('^(?!//)(.*)$'), 8)
>>> with open('/etc/apt/apt.conf.d/50unattended-upgrades', 'rb') as r_file:
...     r_data = mmap.mmap(r_file.fileno(), 0, access=mmap.ACCESS_READ)
...     result, nreply = re.subn(cpattern,b"",r_data,0)
... 
>>> nreply
1
>>> result
b'// Unattended-Upgrade::Origins-Pattern controls which packages are\n// upgraded.\n//\n// Lines below have the format format is "keyword=value,...".  A\n// package will be upgraded only if the values in its metadata match\n// all the supplied keywords in a line.  (In other words, omitted\n// keywords are wild cards.) The keywords originate from the Release\n// file, but several aliases are accepted.  The accepted keywords are:\n//   a,archive,suite (eg, "stable")\n//   c,component     (eg, "main", "contrib", "non-free")\n//   l,label         (eg, "Debian", "Debian-Security")\n//   o,origin        (eg, "Debian", "Unofficial Multimedia Packages")\n (...)

The output is identical, but there was a match. So i printed r_data to make a diff.

>>> r_data.read()
b' Unattended-Upgrade::Origins-Pattern controls which packages are\n// upgraded.\n//\n// Lines below have the format format is "keyword=value,...".  A\n// package will be upgraded only if the values in its metadata match\n// all the supplied keywords in a line.  (In other words, omitted\n// keywords are wild cards.) The keywords originate from the Release\n// file, but several aliases are accepted.  The accepted keywords are:\n//   a,archive,suite (eg, "stable")\n//   c,component     (eg, "main", "contrib", "non-free")\n//   l,label         (eg, "Debian", "Debian-Security")\n//   o,origin        (eg, "Debian", "Unofficial Multimedia Packages")\n (...)

it look like when loading the file in memory, the first 2 // are missing, and that is what re match . My guess is that there is something wrong with the way we work with the file, but this is over my reach.

One weird thing that could counter this idea is that when i try to match something that doesn't exist, the output of re.subn still contain the first 2 //, while not present in r_data

>>> cpattern = re.compile(str.encode('^ThisWontMatch$'), 8)
>>> with open('/etc/apt/apt.conf.d/50unattended-upgrades', 'rb') as r_file:
...     r_data = mmap.mmap(r_file.fileno(), 0, access=mmap.ACCESS_READ)
...     result, nreply = re.subn(cpattern,b"",r_data,0)
... 
>>> nreply
0
>>> result
b'// Unattended-Upgrade::Origins-Pattern controls which packages are\n// upgraded.\n//\n// Lines below have the format format is "keyword=value,...".  A\n// package will be upgraded only if the values in its metadata match\n// all the supplied keywords in a line.  (In other words, omitted\n// keywords are wild cards.) The keywords originate from the Release\n// file, but several aliases are accepted.  The accepted keywords are:\n//   a,archive,suite (eg, "stable")\n//   c,component     (eg, "main", "contrib", "non-free")\n//   l,label         (eg, "Debian", "Debian-Security")\n//   o,origin        (eg, "Debian", "Unofficial Multimedia Packages")\n (...)
>>> r_data.read()
b' Unattended-Upgrade::Origins-Pattern controls which packages are\n// upgraded.\n//\n// Lines below have the format format is "keyword=value,...".  A\n (...)

Setup
simplified setup

vaarlion@salt2:~$ cat test
// toto
//tata
// tutu

Steps to Reproduce the behavior

vaarlion@salt2:~$ python3
Python 3.5.3 (default, Apr  5 2021, 09:00:41) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> import mmap
>>> cpattern = re.compile(str.encode('^(?!//)(.*)$'), 8)
>>> with open('test', 'rb') as r_file:
...     r_data = mmap.mmap(r_file.fileno(), 0, access=mmap.ACCESS_READ)
...     result, nreply = re.subn(cpattern,b"",r_data,0)
...     re.search(cpattern, r_data)
... 
<_sre.SRE_Match object; span=(23, 23), match=b''>
>>> nreply
1
>>> result
b'// toto\n//tata\n// tutu\n'
vaarlion@salt2:~$ sudo salt-call file.replace test '^(?!//)(.*)$' "" flags='[MULTILINE]' bufsize=file dry_run=True show_changes=False
local:
    False
vaarlion@salt2:~$ sudo salt-call file.replace test '^(//)(.*)$' "" flags='[MULTILINE]' bufsize=file dry_run=True show_changes=False
local:
    True
vaarlion@salt2:~$ sudo salt-call file.replace test '^(?!//)(.*)$' "" flags='[MULTILINE]' bufsize=file dry_run=True show_changes=False search_only=True
local:
    True
vaarlion@salt2:~$ sudo salt-call file.replace test '^(//)(.*)$' "" flags='[MULTILINE]' bufsize=file dry_run=True show_changes=False search_only=True
local:
    True

Expected behavior
modules.file.replace in search_only and modules.file.search should not return True when nothing need to be changed

Screenshots
If applicable, add screenshots to help explain your problem.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3003
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.5.3
     docker-py: Not Installed
         gitdb: 2.0.0
     gitpython: 2.1.1
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: 1.0.6
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.10.1
        pygit2: Not Installed
        Python: 3.5.3 (default, Apr  5 2021, 09:00:41)
  python-gnupg: Not Installed
        PyYAML: 5.3.1
         PyZMQ: 20.0.0
         smmap: 2.0.1
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.3
 
System Versions:
          dist: debian 9 stretch
        locale: UTF-8
       machine: x86_64
       release: 4.9.0-15-amd64
        system: Linux
       version: Debian GNU/Linux 9 stretch

Additional context
Add any other context about the problem here.

@Vaarlion Vaarlion added Bug broken, incorrect, or confusing behavior needs-triage labels Apr 20, 2021
@danielrobbins danielrobbins self-assigned this Apr 20, 2021
@danielrobbins
Copy link

danielrobbins commented Apr 20, 2021

I have reproduced your issue. I created a salt state with your exact file.comment state, but operating on my own test file. As you indicate, file.comment does work correctly and comment out all the lines when you run it the first time. This is not the problem. The problem is on successive runs when the file really needs no changes. When I run with test=true, We get the message "File /var/tmp/comment is set to be updated":

# salt xps state.apply comment-state test=true
xps:
----------
          ID: comment_file
    Function: file.comment
        Name: /var/tmp/comment
      Result: None
     Comment: File /var/tmp/comment is set to be updated
     Started: 12:45:35.193734
    Duration: 5.703 ms
     Changes:   
              ----------
              /var/tmp/comment:
                  updated

Summary for xps
------------
Succeeded: 1 (unchanged=1, changed=1)
Failed:    0
------------
Total states run:     1
Total run time:   5.703 ms

We should instead get a message "Pattern already commented".

I can also see that when the state has already been applied, running it without test=true gives us a 'bad' response, indicating a change was made, but no change was made, because the file was already good:

# salt xps state.apply comment-state
xps:
----------
          ID: comment_file
    Function: file.comment
        Name: /var/tmp/comment
      Result: True
     Comment: Commented lines successfully
     Started: 12:41:23.950437
    Duration: 6.502 ms
     Changes:   

Summary for xps
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:   6.502 ms

I believe both these problems are related and are likely due to the following lines in salt/salt/states/file.py:comment() method not working correctly, at least for your regex:

    # Make sure the pattern appears in the file before continuing
    if not __salt__["file.search"](name, regex, multiline=True):
        if __salt__["file.search"](name, comment_regex, multiline=True):
            ret["comment"] = "Pattern already commented"
            ret["result"] = True
            return ret
        else:
            return _error(ret, "{}: Pattern not found".format(unanchor_regex))

If this code doesn't intercept the 'no changes' case, it will flow to the code underneath, which assumes there are changes to be made, and then will return values indicating that the changes were made successfully. I will investigate further.

@danielrobbins
Copy link

danielrobbins commented Apr 20, 2021

OK, in attempting to narrow down the problem, I've determined file.comment works 100% fine with a simpler regex such as "^foobar" and your comment string '//' -- running again correctly indicates no changes will be made.

So file.comment works perfectly for simpler scenarios, but not 100% perfect for your specific test case.

I think it is very likely that the state is not properly handling your more complex regex which contains a negative lookahead "(?!...)". I believe that this is the issue.

This issue can be caused by a couple of things. First, there is a pre-processing of the regex that happens inside the comment state itself (see unanchor_regex) -- this could be mangling the negative lookahead and causing it to be weird. The other possibility is that the file.search exec function you referenced is not handling the negative lookahead properly.

More investigation required to determine root cause.

@danielrobbins
Copy link

danielrobbins commented Apr 22, 2021

I think I found the issue, and it has to do with your regex.

^(?!//)(.*)$ will actually have one match, an empty string -- as long as the file exists. This is what is causing the strange behavior of your regex.

I think you want to replace * with +, which will indicate that your negative lookahead has to be on a line with at least one character on it. Try it and see. I have also written this sample program to demonstrate:

#!/usr/bin/python3

import re

def reptest(pat):
    with open('/var/tmp/comment', 'r') as testfile:
        data = testfile.read()
        cpattern=re.compile(pat, flags=re.MULTILINE)
        result, nrepl = re.subn(cpattern, "", data, count=0)
        print("pat     > ", pat)
        print("orig    > ", repr(data))
        print("result  > ", repr(result))
        print("matches > ", nrepl)
        print("match   > ", re.search(cpattern, data))
        print()

reptest('^(?!//)(.*)$')
reptest('^(?!//)(.+)$')
reptest('^foobar')

So this appears to be a quirk of how your specific regex works, which is confusing. Let me know your results with the minor regex change and if now things work as expected.

You will see that even though your regex makes no changes to the data, it actually registers a match! This is why it's so confusing. Possibly we could test against this specific case by looking to see if the replacement data is any different from the new data, rather than looking for number of matches, which is problematic, since some matches may be 'null' matches like this.

@Vaarlion
Copy link
Contributor Author

Vaarlion commented Apr 23, 2021

Thank you for taking the time to look at this @danielrobbins !
I did try my best to rule out regex issue before posting a bug here, and i though i had something when finding that weird data thing.

I'm guessing it matched the last "empty" line of any file , but i would have expected the file.comment state i use to append // at the end of the file for every run then.

It no longer return wrong change by changing the regex, but it also won't add comment to empty line in the middle of the text block.
It isn't important in my case but might be in other. Good to know.

Once again sorry for the false bug report

@danielrobbins
Copy link

@Vaarlion you are very welcome! This is actually a weird behavior of the python regex implementation, which is also discussed here: rust-lang/regex#355

It may actually be a good idea to look into some fix/tweak for this so salt users don't have to deal with this strange peculiarity of the multiline regex implementation in Python. I would like to look into the possibility of making this better and getting your original regex working correctly -- because I think others may write your regex expecting the result you expected, but also be surprised by this behavior. So I am going to re-open the issue and will continue to look into it.

@danielrobbins danielrobbins reopened this Apr 23, 2021
@danielrobbins
Copy link

I believe this change to the source code will resolve the issue. It works perfectly in my limited testing and causes your original regex to be treated as expected by salt:

diff --git a/salt/modules/file.py b/salt/modules/file.py
index 6ea81e70fb..e70a6849f3 100644
--- a/salt/modules/file.py
+++ b/salt/modules/file.py
@@ -2367,6 +2367,11 @@ def replace(
     no changes are made.
 
     This is a pure Python implementation that wraps Python's :py:func:`~re.sub`.
+    An assumption is made that the string you are trying to replace is not an
+    'empty' string, and your pattern should evaluate to a non-zero-length string
+    when matched. Python's regular expression engine can return zero-length
+    'matches' when using certain patterns, and we intentionally ignore these to
+    avoid false positives.
 
     path
         Filesystem path to the file to be edited. If a symlink is specified, it
@@ -2553,11 +2558,11 @@ def replace(
                 # size of file in /proc is 0, but contains data
                 r_data = salt.utils.stringutils.to_bytes("".join(r_file))
             if search_only:
-                # Just search; bail as early as a match is found
-                if re.search(cpattern, r_data):
-                    return True  # `with` block handles file closure
-                else:
-                    return False
+                # Search for first non-empty match, and if found, return True. Otherwise, False.
+                for s_match in re.finditer(cpattern, r_data):
+                    if len(s_match.group(0)):
+                        return True
+                return False
             else:
                 result, nrepl = re.subn(
                     cpattern,

@sagetherage sagetherage added the severity-high 2nd top severity, seen by most users, causes major problems label Apr 26, 2021
@sagetherage sagetherage added this to the Approved milestone Apr 26, 2021
@sagetherage sagetherage added severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around and removed severity-high 2nd top severity, seen by most users, causes major problems labels Apr 26, 2021
@danielrobbins
Copy link

Please note that my fix above needs further testing. I hope it will work for everyone and not introduce any unexpected changes of behavior for existing users of this function who may want a zero-length match to show up a match. I think it's unlikely that people want replace() to do that, but you never know...

@Vaarlion
Copy link
Contributor Author

thank you for looking into it danielrobbins.
Quick question, does a search for an empty line count as a zero-lenght match ? or does it have a length of 2 since it include the new line escape \n?

I can quickly think of a case where you want to fill empty line between 2 block of text for example. or remove them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests

3 participants