Escape characters lost after concatenating string values #231

Closed
aGitForEveryone opened this issue Sep 7, 2022 · 8 comments · Fixed by #235
Labels
bug Something isn't working

Comments

@aGitForEveryone

When working with AWS samconfig.toml, you have to specify parameter overrides as literal strings:

[default.deploy.parameters]
parameter_overrides = "<parameter_name>=\"<parameter_value>\""

When loading this .toml with tomlkit, the backslashes are lost in translation. So a roundtrip: tomlkit.dump(tomlkit.load(config)) results in:

[default.deploy.parameters]
parameter_overrides = "<parameter_name>="<parameter_value>""

This breaks the parameter_overrides string, because the unescaped inner " characters now terminate the string early.

How is it possible to load strings as literal strings using tomlkit?

@frostming
Contributor

import tomlkit
content = """\
[default.deploy.parameters]
parameter_overrides = "<parameter_name>=\\"<parameter_value>\\""
"""
print(tomlkit.dumps(tomlkit.loads(content)))

Results in:

[default.deploy.parameters]
parameter_overrides = "<parameter_name>=\"<parameter_value>\""

Can you check it again?

@frostming frostming added the invalid This doesn't seem right label Sep 8, 2022
@aGitForEveryone
Author

aGitForEveryone commented Sep 9, 2022

After trying out your suggestion, I found some strange behavior:

  1. Indeed, if you start with a TOML string in Python source and use double backslashes as you suggest, the round trip produces a TOML string with single backslashes. So that seems to work as intended.
  2. If you now replace the double backslashes with single backslashes:
    import tomlkit
    content = """\
    [default.deploy.parameters]
    parameter_overrides = "<parameter_name>=\"<parameter_value>\""
    """
    print(tomlkit.dumps(tomlkit.loads(content)))
    the following error is shown: tomlkit.exceptions.UnexpectedCharError: Unexpected character: '<' at line 2 col 41. So it seems that the \" is then replaced by " during parsing, and the string is broken up incorrectly.
  3. On the other hand, if you load the TOML from a file (using tomlkit.load and tomlkit.dump), the behavior is reversed. With single backslashes in the file (as in my first post) the file loads without error; with double backslashes in the file, as you suggest, parsing throws the same error as above.
  4. Looking again at the original issue with the above insights, I tried putting three backslashes in the TOML file that I load, i.e.:
    # config.toml
    [default.deploy.parameters]
    parameter_overrides = "<parameter_name>=\\\"<parameter_value>\\\""
    Let's investigate what happens when I load and dump that TOML config, and what happens when I alter the string in the meantime.
    a. Simple roundtrip: nothing is changed and we just print the state of config at various points.
     import tomlkit
     from pprint import pprint as pp
    
    
     with open('config.toml', 'rt', encoding='utf-8') as f:
         config = tomlkit.load(f)
         print('Config that was loaded')
         pp(config)
         print()
    
     print('Dumps version of config')
     print(tomlkit.dumps(config))
     print()
    
     with open('test.toml', 'wt', encoding='utf-8') as f:
         tomlkit.dump(config, f)
     with open('test.toml', 'r') as f:
         print('\nSaved config')
         print(f.read())
    Output:
    Config that was loaded
    {'default': {'deploy': {'parameters': {'parameter_overrides': '<parameter_name>=\\"<parameter_value>\\"'}}}}
    
    Dumps version of config
    [default.deploy.parameters]
    parameter_overrides = "<parameter_name>=\\\"<parameter_value>\\\""
    
    
    Saved config
    [default.deploy.parameters]
    parameter_overrides = "<parameter_name>=\\\"<parameter_value>\\\""
    
    Conclusion: a simple round trip keeps the backslashes as they are and produces a final TOML file with three backslashes, which is an invalid config for AWS.
    b. Round trip while updating the string: in this case we update the string by adding new parameters to it:
     import tomlkit
     from pprint import pprint as pp
    
    
     with open('config.toml', 'rt', encoding='utf-8') as f:
         config = tomlkit.load(f)
         print('Config that was loaded')
         pp(config)
         print()
    
     print('Dumps version of config before adjustment')
     print(tomlkit.dumps(config))
     print()
    
     config['default']['deploy']['parameters']['parameter_overrides'] \
         += f' <parameter_name_2>=\\"<parameter_value_2>\\"'
    
     print('Dumps version of config after adjustment')
     print(tomlkit.dumps(config))
     print()
    
     with open('test.toml', 'wt', encoding='utf-8') as f:
         tomlkit.dump(config, f)
     with open('test.toml', 'r') as f:
         print('\nSaved config')
         print(f.read())
    Output:
    Config that was loaded
    {'default': {'deploy': {'parameters': {'parameter_overrides': '<parameter_name>=\\"<parameter_value>\\"'}}}}
    
    Dumps version of config before adjustment
    [default.deploy.parameters]
    parameter_overrides = "<parameter_name>=\\\"<parameter_value>\\\""
    
    Dumps version of config after adjustment
    [default.deploy.parameters]
    parameter_overrides = "<parameter_name>=\"<parameter_value>\" <parameter_name_2>=\"<parameter_value_2>\""
    
    Saved config
    [default.deploy.parameters]
    parameter_overrides = "<parameter_name>=\"<parameter_value>\" <parameter_name_2>=\"<parameter_value_2>\""
    
    Conclusion: somehow, when I updated the string, the backslashes were handled differently, and as a result I end up with single backslashes in my final TOML file, which is a correct config for AWS.
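
The three-backslash observation in step 4a is consistent with ordinary escape decoding. The following is a minimal sketch only: it assumes nothing beyond the fact that TOML basic strings decode \\ and \" the same way JSON strings do, and it uses the standard library's json module as a stand-in rather than tomlkit itself.

```python
import json

# What the file contains between the outer quotes (three backslashes per quote).
# A raw string keeps every backslash exactly as typed.
in_file = r'<parameter_name>=\\\"<parameter_value>\\\"'

# json.loads is only a stand-in here: JSON and TOML basic strings decode
# \\ and \" the same way. This is NOT tomlkit's parser.
value = json.loads('"' + in_file + '"')

print(value)        # the actual characters of the parsed value
print(repr(value))  # the display form pprint uses: real backslashes are doubled

# Re-encoding escapes the backslash and the quote again,
# which restores three backslashes per quote.
encoded = json.dumps(value)[1:-1]
assert encoded == in_file
```

Under that assumption, the parsed value really contains one backslash followed by a quote, pprint shows it as \\" because repr doubles backslashes for display, and dumping it again must yield \\\" for the round trip to be lossless.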

Conclusion - final

In my opinion it is quite confusing that strings behave in so many different ways depending on how you load them and whether or not they are altered between loading and dumping the TOML. Perhaps I am not knowledgeable enough about Python strings and am missing some Python behavior here.

Is it possible to unify the behavior of how backslashes are parsed in strings?

@frostming
Contributor

frostming commented Sep 9, 2022

In a Python string literal, i.e. text between a pair of quotes in Python source code, backslashes are escape characters, so '\"' is equal to a single '"':

>>> '\"'
'"'

so the second attempt is an invalid TOML document.
In a file, by contrast, all characters are read as-is: backslashes are real backslash characters, so a \ in a file is read as "\\" when written as a Python string literal.

In a TOML document, quotes that appear between a pair of quotes must be escaped with a \ character, and that backslash must be written to the file as-is. So the single backslash belongs in the file, and it shows up as "\\" when the file's text is written as a Python string literal.
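
The difference between Python source and file contents described above can be checked with the standard library alone. This is a small sketch, not tied to tomlkit; the file name demo.toml and the key name are made up for illustration.

```python
import os
import tempfile

# In Python source, \" inside a normal string literal is just a quote character.
in_source = "name=\"value\""
assert in_source == 'name="value"'
assert "\\" not in in_source  # no backslash survived

# Doubling the backslash (or using a raw string) keeps a real backslash.
doubled = "name=\\\"value\\\""
raw = r'name=\"value\"'
assert doubled == raw

# A file stores characters as-is: what is written is exactly what is read back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.toml")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write('key = "' + raw + '"\n')
    with open(path, encoding="utf-8") as f:
        line = f.read()

print(line, end="")  # the file text, with one backslash before each inner quote
print(repr(line))    # the same text shown as a Python literal: backslashes doubled
```

The first print shows the file content as a TOML parser would see it; the second shows why the same text looks like it has double backslashes once it is displayed as a Python string literal.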

If you are still confused after my explanation that is okay, I might not be good at it. Just read more Python manuals and try and you will understand it after you are more experienced. I am just going to close this issue now.

@frostming frostming closed this as not planned Sep 9, 2022
@aGitForEveryone
Author

aGitForEveryone commented Sep 9, 2022

OK, so it is great that characters are read as-is when reading TOML from a file; however, there is still an issue. In step 4 above I tried loading and dumping a file with three backslashes. I repeated that test with one backslash in the file, and I see that a simple round trip does indeed produce a file with a single backslash. However, if I alter the string in the meantime, the backslashes are handled differently. Try this:

# config.toml
[default.deploy.parameters]
parameter_overrides = "<parameter_name>=\"<parameter_value>\""

With this test code:

import tomlkit
from pprint import pprint as pp

with open('config.toml', 'rt', encoding='utf-8') as f:
    config = tomlkit.load(f)
    print('Config that was loaded')
    pp(config)
    print()

print('Dumps version of config before adjustment')
print(tomlkit.dumps(config))
print()

config['default']['deploy']['parameters']['parameter_overrides'] \
    += f' <parameter_name_2>=\\"<parameter_value_2>\\"'

print('Dumps version of config after adjustment')
print(tomlkit.dumps(config))
print()

with open('test.toml', 'wt', encoding='utf-8') as f:
    tomlkit.dump(config, f)

with open('test.toml', 'r') as f:
    print('\nSaved config')
    print(f.read())

you will see this output:

Config that was loaded
{'default': {'deploy': {'parameters': {'parameter_overrides': '<parameter_name>="<parameter_value>"'}}}}

Dumps version of config before adjustment
# config.toml
[default.deploy.parameters]
parameter_overrides = "<parameter_name>=\"<parameter_value>\""

Dumps version of config after adjustment
# config.toml
[default.deploy.parameters]
parameter_overrides = "<parameter_name>="<parameter_value>" <parameter_name_2>=\"<parameter_value_2>\""


Saved config
# config.toml
[default.deploy.parameters]
parameter_overrides = "<parameter_name>="<parameter_value>" <parameter_name_2>=\"<parameter_value_2>\""

and the resulting test.toml is this:

# test.toml
[default.deploy.parameters]
parameter_overrides = "<parameter_name>="<parameter_value>" <parameter_name_2>=\"<parameter_value_2>\""

The double quotes around parameter_value_2 are correctly escaped, but the double quotes around the original parameter_value are lost, creating a broken string, as is seen in test.toml. Maybe the correct conclusion here is then that something goes wrong in the logic of creating a new toml string (which I guess is what happens when the second parameter is added to the string).

@frostming
Contributor

config['default']['deploy']['parameters']['parameter_overrides'] \
    += f' <parameter_name_2>=\\"<parameter_value_2>\\"'

This is wrong: the double backslashes are only needed in a TOML document, BEFORE parsing. Once parsed, the string becomes <parameter_name_2>="<parameter_value_2>"
So you don't need the double backslashes when updating that value:

import tomlkit
content = """\
[default.deploy.parameters]
parameter_overrides = "<parameter_name>=\\"<parameter_value>\\""
"""
doc = tomlkit.parse(content)
doc['default']['deploy']['parameters']['parameter_overrides']
# Output: '<parameter_name>="<parameter_value>"'
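
To make the expected outcome concrete, here is a minimal sketch of what a correct dump should produce when plain quotes are appended to the parsed value. The escape_basic_string helper is hypothetical, not part of tomlkit, and handles only the two escapes discussed in this thread.

```python
def escape_basic_string(value: str) -> str:
    """Hypothetical helper: escape backslashes and quotes for a TOML basic string.

    Covers only \\ and \", not control characters or unicode escapes.
    """
    return value.replace("\\", "\\\\").replace('"', '\\"')


# The value after parsing contains plain quotes and no backslashes.
parsed = '<parameter_name>="<parameter_value>"'

# Append the new parameter with plain quotes as well (no backslashes).
updated = parsed + ' <parameter_name_2>="<parameter_value_2>"'

# Serializing the whole value should escape every inner quote,
# including the ones that were already there before the update.
line = 'parameter_overrides = "' + escape_basic_string(updated) + '"'
print(line)
```

All four inner quotes end up escaped in the printed line, which is the result a dump after concatenation should give.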

@frostming
Contributor

Oh, I get it: when the string is updated, the backslashes are gone in the dumps result.

@frostming frostming reopened this Sep 9, 2022
@frostming frostming changed the title tomlkit.load: have strings load as literal strings Escape charaters lost after concatenating string values Sep 9, 2022
@frostming frostming added bug Something isn't working and removed invalid This doesn't seem right labels Sep 9, 2022
@aGitForEveryone
Author

Yes indeed, and then in the resulting .toml file that is written by tomlkit.dump the slashes are lost.

@frostming frostming changed the title Escape charaters lost after concatenating string values Escape characters lost after concatenating string values Sep 9, 2022
@aGitForEveryone
Author

aGitForEveryone commented Sep 9, 2022

The way I initially solved it was by doing this statement:

config['default']['deploy']['parameters']['parameter_overrides'] \
    = config['default']['deploy']['parameters']['parameter_overrides'].replace('"', '\"') \
      + f' <parameter_name_2>=\"<parameter_value_2>\"'

and then I indeed get a correct toml file again:

# test.toml
[default.deploy.parameters]
parameter_overrides = "<parameter_name>=\"<parameter_value>\" <parameter_name_2>=\"<parameter_value_2>\""

This again seems odd to me, because now I use single backslashes and it also works. The replace statement is clunky, and the fact that it works with single backslashes as well was confusing to me.
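
The single backslashes in this workaround can be examined in plain Python. The sketch below uses no tomlkit at all and only shows what the right-hand side of the assignment evaluates to. A plausible reading, not confirmed in this thread, is that the workaround succeeds because plain = assignment stores a fresh Python str that is escaped in full when dumped, not because of the replace call.

```python
original = '<parameter_name>="<parameter_value>"'

# '\"' in Python source is the same one-character string as '"'.
assert '\"' == '"'
assert len('\"') == 1

# So this replace swaps each quote for an identical quote: nothing changes.
replaced = original.replace('"', '\"')
assert replaced == original

# The appended part likewise contains plain quotes and no backslashes.
suffix = ' <parameter_name_2>=\"<parameter_value_2>\"'
assert "\\" not in suffix

result = replaced + suffix
print(result)
```

The resulting value contains no backslashes, so every escape in the written file has to come from the serializer.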
