[BUG] file.replace and file.search false change when looking for `//` #60055

Vaarlion · 2021-04-20T09:16:54Z

Description
I have a file i wish to fully comment. The comment flag for it is //.
for this i used this state

comment_file
  file.comment:
    - name: /etc/apt/apt.conf.d/50unattended-upgrades
    - regex: ^(?!//)(.*)$
    - char: '//'

my file now look like this

vaarlion@salt2:~$ head /etc/apt/apt.conf.d/50unattended-upgrades
// Unattended-Upgrade::Origins-Pattern controls which packages are
// upgraded.
//
// Lines below have the format format is "keyword=value,...".  A
// package will be upgraded only if the values in its metadata match
// all the supplied keywords in a line.  (In other words, omitted
(...)

It worked well but always reported a future change when running in test=true. So i started digging.
states.file.comment call modules.file.search, which is a wrapper for modules.files.replace
modules.files.replace use re.search when search_only is True, which is what is used by modules.file.search.
So i tested it locally.

>>> cpattern = re.compile(str.encode('^(?!//)(.*)$'), 8)
>>> with open('/etc/apt/apt.conf.d/50unattended-upgrades', 'rb') as r_file:
...     r_data = mmap.mmap(r_file.fileno(), 0, access=mmap.ACCESS_READ)
...     re.search(cpattern, r_data)
... 
<_sre.SRE_Match object; span=(4313, 4313), match=b''>

since it does return something i tried to run the other part of modules.files.replace that use re.subn

>>> cpattern = re.compile(str.encode('^(?!//)(.*)$'), 8)
>>> with open('/etc/apt/apt.conf.d/50unattended-upgrades', 'rb') as r_file:
...     r_data = mmap.mmap(r_file.fileno(), 0, access=mmap.ACCESS_READ)
...     result, nreply = re.subn(cpattern,b"",r_data,0)
... 
>>> nreply
1
>>> result
b'// Unattended-Upgrade::Origins-Pattern controls which packages are\n// upgraded.\n//\n// Lines below have the format format is "keyword=value,...".  A\n// package will be upgraded only if the values in its metadata match\n// all the supplied keywords in a line.  (In other words, omitted\n// keywords are wild cards.) The keywords originate from the Release\n// file, but several aliases are accepted.  The accepted keywords are:\n//   a,archive,suite (eg, "stable")\n//   c,component     (eg, "main", "contrib", "non-free")\n//   l,label         (eg, "Debian", "Debian-Security")\n//   o,origin        (eg, "Debian", "Unofficial Multimedia Packages")\n (...)

The output is identical, but there was a match. So i printed r_data to make a diff.

>>> r_data.read()
b' Unattended-Upgrade::Origins-Pattern controls which packages are\n// upgraded.\n//\n// Lines below have the format format is "keyword=value,...".  A\n// package will be upgraded only if the values in its metadata match\n// all the supplied keywords in a line.  (In other words, omitted\n// keywords are wild cards.) The keywords originate from the Release\n// file, but several aliases are accepted.  The accepted keywords are:\n//   a,archive,suite (eg, "stable")\n//   c,component     (eg, "main", "contrib", "non-free")\n//   l,label         (eg, "Debian", "Debian-Security")\n//   o,origin        (eg, "Debian", "Unofficial Multimedia Packages")\n (...)

it look like when loading the file in memory, the first 2 // are missing, and that is what re match . My guess is that there is something wrong with the way we work with the file, but this is over my reach.

One weird thing that could counter this idea is that when i try to match something that doesn't exist, the output of re.subn still contain the first 2 //, while not present in r_data

>>> cpattern = re.compile(str.encode('^ThisWontMatch$'), 8)
>>> with open('/etc/apt/apt.conf.d/50unattended-upgrades', 'rb') as r_file:
...     r_data = mmap.mmap(r_file.fileno(), 0, access=mmap.ACCESS_READ)
...     result, nreply = re.subn(cpattern,b"",r_data,0)
... 
>>> nreply
0
>>> result
b'// Unattended-Upgrade::Origins-Pattern controls which packages are\n// upgraded.\n//\n// Lines below have the format format is "keyword=value,...".  A\n// package will be upgraded only if the values in its metadata match\n// all the supplied keywords in a line.  (In other words, omitted\n// keywords are wild cards.) The keywords originate from the Release\n// file, but several aliases are accepted.  The accepted keywords are:\n//   a,archive,suite (eg, "stable")\n//   c,component     (eg, "main", "contrib", "non-free")\n//   l,label         (eg, "Debian", "Debian-Security")\n//   o,origin        (eg, "Debian", "Unofficial Multimedia Packages")\n (...)
>>> r_data.read()
b' Unattended-Upgrade::Origins-Pattern controls which packages are\n// upgraded.\n//\n// Lines below have the format format is "keyword=value,...".  A\n (...)

Setup
simplified setup

vaarlion@salt2:~$ cat test
// toto
//tata
// tutu

Steps to Reproduce the behavior

vaarlion@salt2:~$ python3
Python 3.5.3 (default, Apr  5 2021, 09:00:41) 
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> import mmap
>>> cpattern = re.compile(str.encode('^(?!//)(.*)$'), 8)
>>> with open('test', 'rb') as r_file:
...     r_data = mmap.mmap(r_file.fileno(), 0, access=mmap.ACCESS_READ)
...     result, nreply = re.subn(cpattern,b"",r_data,0)
...     re.search(cpattern, r_data)
... 
<_sre.SRE_Match object; span=(23, 23), match=b''>
>>> nreply
1
>>> result
b'// toto\n//tata\n// tutu\n'

vaarlion@salt2:~$ sudo salt-call file.replace test '^(?!//)(.*)$' "" flags='[MULTILINE]' bufsize=file dry_run=True show_changes=False
local:
    False
vaarlion@salt2:~$ sudo salt-call file.replace test '^(//)(.*)$' "" flags='[MULTILINE]' bufsize=file dry_run=True show_changes=False
local:
    True
vaarlion@salt2:~$ sudo salt-call file.replace test '^(?!//)(.*)$' "" flags='[MULTILINE]' bufsize=file dry_run=True show_changes=False search_only=True
local:
    True
vaarlion@salt2:~$ sudo salt-call file.replace test '^(//)(.*)$' "" flags='[MULTILINE]' bufsize=file dry_run=True show_changes=False search_only=True
local:
    True

Expected behavior
modules.file.replace in search_only and modules.file.search should not return True when nothing need to be changed

Screenshots
If applicable, add screenshots to help explain your problem.

Versions Report

salt --versions-report

(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)

Salt Version:
          Salt: 3003
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.5.3
     docker-py: Not Installed
         gitdb: 2.0.0
     gitpython: 2.1.1
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: 1.0.6
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.10.1
        pygit2: Not Installed
        Python: 3.5.3 (default, Apr  5 2021, 09:00:41)
  python-gnupg: Not Installed
        PyYAML: 5.3.1
         PyZMQ: 20.0.0
         smmap: 2.0.1
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.3
 
System Versions:
          dist: debian 9 stretch
        locale: UTF-8
       machine: x86_64
       release: 4.9.0-15-amd64
        system: Linux
       version: Debian GNU/Linux 9 stretch

Additional context
Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

danielrobbins · 2021-04-20T18:52:18Z

I have reproduced your issue. I created a salt state with your exact file.comment state, but operating on my own test file. As you indicate, file.comment does work correctly and comment out all the lines when you run it the first time. This is not the problem. The problem is on successive runs when the file really needs no changes. When I run with test=true, We get the message "File /var/tmp/comment is set to be updated":

# salt xps state.apply comment-state test=true
xps:
----------
          ID: comment_file
    Function: file.comment
        Name: /var/tmp/comment
      Result: None
     Comment: File /var/tmp/comment is set to be updated
     Started: 12:45:35.193734
    Duration: 5.703 ms
     Changes:   
              ----------
              /var/tmp/comment:
                  updated

Summary for xps
------------
Succeeded: 1 (unchanged=1, changed=1)
Failed:    0
------------
Total states run:     1
Total run time:   5.703 ms

We should instead get a message "Pattern already commented".

I can also see that when the state has already been applied, running it without test=true gives us a 'bad' response, indicating a change was made, but no change was made, because the file was already good:

# salt xps state.apply comment-state
xps:
----------
          ID: comment_file
    Function: file.comment
        Name: /var/tmp/comment
      Result: True
     Comment: Commented lines successfully
     Started: 12:41:23.950437
    Duration: 6.502 ms
     Changes:   

Summary for xps
------------
Succeeded: 1
Failed:    0
------------
Total states run:     1
Total run time:   6.502 ms

I believe both these problems are related and are likely due to the following lines in salt/salt/states/file.py:comment() method not working correctly, at least for your regex:

    # Make sure the pattern appears in the file before continuing
    if not __salt__["file.search"](name, regex, multiline=True):
        if __salt__["file.search"](name, comment_regex, multiline=True):
            ret["comment"] = "Pattern already commented"
            ret["result"] = True
            return ret
        else:
            return _error(ret, "{}: Pattern not found".format(unanchor_regex))

If this code doesn't intercept the 'no changes' case, it will flow to the code underneath, which assumes there are changes to be made, and then will return values indicating that the changes were made successfully. I will investigate further.

danielrobbins · 2021-04-20T19:20:20Z

OK, in attempting to narrow down the problem, I've determined file.comment works 100% fine with a simpler regex such as "^foobar" and your comment string '//' -- running again correctly indicates no changes will be made.

So file.comment works perfectly for simpler scenarios, but not 100% perfect for your specific test case.

I think it is very likely that the state is not properly handling your more complex regex which contains a negative lookahead "(?!...)". I believe that this is the issue.

This issue can be caused by a couple of things. First, there is a pre-processing of the regex that happens inside the comment state itself (see unanchor_regex) -- this could be mangling the negative lookahead and causing it to be weird. The other possibility is that the file.search exec function you referenced is not handling the negative lookahead properly.

More investigation required to determine root cause.

danielrobbins · 2021-04-22T21:36:25Z

I think I found the issue, and it has to do with your regex.

^(?!//)(.*)$ will actually have one match, an empty string -- as long as the file exists. This is what is causing the strange behavior of your regex.

I think you want to replace * with +, which will indicate that your negative lookahead has to be on a line with at least one character on it. Try it and see. I have also written this sample program to demonstrate:

#!/usr/bin/python3

import re

def reptest(pat):
    with open('/var/tmp/comment', 'r') as testfile:
        data = testfile.read()
        cpattern=re.compile(pat, flags=re.MULTILINE)
        result, nrepl = re.subn(cpattern, "", data, count=0)
        print("pat     > ", pat)
        print("orig    > ", repr(data))
        print("result  > ", repr(result))
        print("matches > ", nrepl)
        print("match   > ", re.search(cpattern, data))
        print()

reptest('^(?!//)(.*)$')
reptest('^(?!//)(.+)$')
reptest('^foobar')

So this appears to be a quirk of how your specific regex works, which is confusing. Let me know your results with the minor regex change and if now things work as expected.

You will see that even though your regex makes no changes to the data, it actually registers a match! This is why it's so confusing. Possibly we could test against this specific case by looking to see if the replacement data is any different from the new data, rather than looking for number of matches, which is problematic, since some matches may be 'null' matches like this.

Vaarlion · 2021-04-23T05:49:50Z

Thank you for taking the time to look at this @danielrobbins !
I did try my best to rule out regex issue before posting a bug here, and i though i had something when finding that weird data thing.

I'm guessing it matched the last "empty" line of any file , but i would have expected the file.comment state i use to append // at the end of the file for every run then.

It no longer return wrong change by changing the regex, but it also won't add comment to empty line in the middle of the text block.
It isn't important in my case but might be in other. Good to know.

Once again sorry for the false bug report

danielrobbins · 2021-04-23T06:08:13Z

@Vaarlion you are very welcome! This is actually a weird behavior of the python regex implementation, which is also discussed here: rust-lang/regex#355

It may actually be a good idea to look into some fix/tweak for this so salt users don't have to deal with this strange peculiarity of the multiline regex implementation in Python. I would like to look into the possibility of making this better and getting your original regex working correctly -- because I think others may write your regex expecting the result you expected, but also be surprised by this behavior. So I am going to re-open the issue and will continue to look into it.

danielrobbins · 2021-04-26T17:47:47Z

I believe this change to the source code will resolve the issue. It works perfectly in my limited testing and causes your original regex to be treated as expected by salt:

diff --git a/salt/modules/file.py b/salt/modules/file.py
index 6ea81e70fb..e70a6849f3 100644
--- a/salt/modules/file.py
+++ b/salt/modules/file.py
@@ -2367,6 +2367,11 @@ def replace(
     no changes are made.
 
     This is a pure Python implementation that wraps Python's :py:func:`~re.sub`.
+    An assumption is made that the string you are trying to replace is not an
+    'empty' string, and your pattern should evaluate to a non-zero-length string
+    when matched. Python's regular expression engine can return zero-length
+    'matches' when using certain patterns, and we intentionally ignore these to
+    avoid false positives.
 
     path
         Filesystem path to the file to be edited. If a symlink is specified, it
@@ -2553,11 +2558,11 @@ def replace(
                 # size of file in /proc is 0, but contains data
                 r_data = salt.utils.stringutils.to_bytes("".join(r_file))
             if search_only:
-                # Just search; bail as early as a match is found
-                if re.search(cpattern, r_data):
-                    return True  # `with` block handles file closure
-                else:
-                    return False
+                # Search for first non-empty match, and if found, return True. Otherwise, False.
+                for s_match in re.finditer(cpattern, r_data):
+                    if len(s_match.group(0)):
+                        return True
+                return False
             else:
                 result, nrepl = re.subn(
                     cpattern,

danielrobbins · 2021-04-26T17:52:49Z

Please note that my fix above needs further testing. I hope it will work for everyone and not introduce any unexpected changes of behavior for existing users of this function who may want a zero-length match to show up a match. I think it's unlikely that people want replace() to do that, but you never know...

Vaarlion · 2021-04-27T07:07:55Z

thank you for looking into it danielrobbins.
Quick question, does a search for an empty line count as a zero-lenght match ? or does it have a length of 2 since it include the new line escape \n?

I can quickly think of a case where you want to fill empty line between 2 block of text for example. or remove them

Vaarlion added Bug broken, incorrect, or confusing behavior needs-triage labels Apr 20, 2021

danielrobbins self-assigned this Apr 20, 2021

danielrobbins removed the needs-triage label Apr 20, 2021

Vaarlion closed this as completed Apr 23, 2021

danielrobbins reopened this Apr 23, 2021

sagetherage unassigned danielrobbins Apr 26, 2021

sagetherage added the severity-high 2nd top severity, seen by most users, causes major problems label Apr 26, 2021

sagetherage added this to the Approved milestone Apr 26, 2021

sagetherage added severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around and removed severity-high 2nd top severity, seen by most users, causes major problems labels Apr 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] file.replace and file.search false change when looking for `//` #60055

[BUG] file.replace and file.search false change when looking for `//` #60055

Vaarlion commented Apr 20, 2021 •

edited

Loading

danielrobbins commented Apr 20, 2021 •

edited

Loading

danielrobbins commented Apr 20, 2021 •

edited

Loading

danielrobbins commented Apr 22, 2021 •

edited

Loading

Vaarlion commented Apr 23, 2021 •

edited

Loading

danielrobbins commented Apr 23, 2021

danielrobbins commented Apr 26, 2021

danielrobbins commented Apr 26, 2021

Vaarlion commented Apr 27, 2021

[BUG] file.replace and file.search false change when looking for // #60055

[BUG] file.replace and file.search false change when looking for // #60055

Comments

Vaarlion commented Apr 20, 2021 • edited Loading

danielrobbins commented Apr 20, 2021 • edited Loading

danielrobbins commented Apr 20, 2021 • edited Loading

danielrobbins commented Apr 22, 2021 • edited Loading

Vaarlion commented Apr 23, 2021 • edited Loading

danielrobbins commented Apr 23, 2021

danielrobbins commented Apr 26, 2021

danielrobbins commented Apr 26, 2021

Vaarlion commented Apr 27, 2021

[BUG] file.replace and file.search false change when looking for `//` #60055

[BUG] file.replace and file.search false change when looking for `//` #60055

Vaarlion commented Apr 20, 2021 •

edited

Loading

danielrobbins commented Apr 20, 2021 •

edited

Loading

danielrobbins commented Apr 20, 2021 •

edited

Loading

danielrobbins commented Apr 22, 2021 •

edited

Loading

Vaarlion commented Apr 23, 2021 •

edited

Loading