Non-quoted UTF-8 isn't parsed correctly #147

Closed
bbc2 opened this issue Oct 26, 2018 · 0 comments · Fixed by #148

bbc2 commented Oct 26, 2018

The .env:

    UTF8=été
    UTF8_QUOTED="été"

Python code:

    import os
    import dotenv

    dotenv.load_dotenv()

    print(list(os.environ.get('UTF8')))
    print(list(os.environ.get('UTF8_QUOTED')))

Output:

    > python foo.py
    ['\\', 'x', 'e', '9', 't', '\\', 'x', 'e', '9']
    ['é', 't', 'é']

Expected output:

    > python foo.py
    ['é', 't', 'é']
    ['é', 't', 'é']
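
For reference, the unexpected output above is exactly what Python produces when a string goes through the `unicode-escape` codec on the encoding side. The snippet below is only a sketch of that mechanism, not a claim about what python-dotenv does internally; `mangle` is a hypothetical name used for illustration.

```python
def mangle(value):
    # Assumption: this mimics the symptom only. Encoding with "unicode-escape"
    # turns each non-ASCII character into a literal backslash escape
    # (here the four characters \, x, e, 9), and the ASCII decode keeps them.
    return value.encode("unicode-escape").decode("ascii")


print(list(mangle("été")))
# ['\\', 'x', 'e', '9', 't', '\\', 'x', 'e', '9']
```

The quoted value is not affected in the report, which suggests the two kinds of value took different code paths.
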
bbc2 added a commit to bbc2/python-dotenv that referenced this issue Oct 28, 2018
This adds support for:

* multiline values (i.e. containing newlines or escaped \n), fixes theskumar#89
* backslashes in values, fixes theskumar#112
* trailing comments, fixes theskumar#141
* UTF-8 in unquoted values, fixes theskumar#147

Parsing is no longer line-based.  That's why `parse_line` was replaced
by `parse_binding`.  Thanks to the previous commit, users of
`parse_stream` don't have to deal with this change.

This supersedes a previous pull-request, theskumar#142, which would add support for
multiline values in `Dotenv.parse` but not in the CLI (`dotenv get` and `dotenv
set`).

The key-value binding regular expression was inspired by
https://github.com/bkeepers/dotenv/blob/d749366b6009126b115fb7b63e0509566365859a/lib/dotenv/parser.rb#L14-L30

Parsing of escapes was fixed thanks to
https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python/24519338#24519338
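
The escape-handling idea from the linked Stack Overflow answer can be sketched roughly as follows: instead of decoding the whole value with `unicode-escape` (which corrupts non-ASCII text), only the substrings that look like escape sequences are decoded. This is an illustrative sketch, not the code from the commit; `decode_escapes`, `naive_decode` and `_ESCAPE_RE` are hypothetical names, and the set of recognised escapes is an assumption.

```python
import re

# Matches only backslash escape sequences; everything else is left untouched.
_ESCAPE_RE = re.compile(
    r"""\\(?:U[0-9a-fA-F]{8}|u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|[0-7]{1,3}|[\\'"abfnrtv])"""
)


def decode_escapes(value):
    """Decode escape sequences in value without touching non-ASCII characters."""
    def decode_match(match):
        # Every matched sequence is pure ASCII, so this round trip is safe.
        return match.group(0).encode("ascii").decode("unicode-escape")

    return _ESCAPE_RE.sub(decode_match, value)


def naive_decode(value):
    """Whole-string decoding, shown only for contrast: it mangles UTF-8 text."""
    return value.encode("utf-8").decode("unicode-escape")


print(repr(decode_escapes("été\\n")))  # 'été\n'
print(naive_decode("été") == "été")    # False
```

The design point is that the regular expression limits decoding to ASCII escape sequences, so characters such as `é` never pass through the codec.
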
bbc2 added a commit to bbc2/python-dotenv that referenced this issue Oct 31, 2018
bbc2 added a commit to bbc2/python-dotenv that referenced this issue Nov 14, 2018
theskumar pushed a commit that referenced this issue Dec 5, 2018
… UTF-8 (#148)

* Fix deprecation warning for POSIX variable regex

This was also caught by Flake8 as:

    ./dotenv/main.py:19:2: W605 invalid escape sequence '\$'
    ./dotenv/main.py:19:4: W605 invalid escape sequence '\{'
    ./dotenv/main.py:19:8: W605 invalid escape sequence '\}'
    ./dotenv/main.py:19:12: W605 invalid escape sequence '\}'

* Turn get_stream into a context manager

This avoids the use of the `is_file` class variable by abstracting away
the difference between `StringIO` and a file stream.

* Deduplicate parsing code and abstract away lines

Parsing .env files is a critical part of this package.  To make it
easier to change it and test it, it is important that it is done in only
one place.

Also, code that uses the parser now doesn't depend on the fact that each
key-value binding spans exactly one line.  This will make it easier to
handle multiline bindings in the future.

* Parse newline, UTF-8, trailing comment, backslash

This adds support for:

* multiline values (i.e. containing newlines or escaped \n), fixes #89
* backslashes in values, fixes #112
* trailing comments, fixes #141
* UTF-8 in unquoted values, fixes #147

Parsing is no longer line-based.  That's why `parse_line` was replaced
by `parse_binding`.  Thanks to the previous commit, users of
`parse_stream` don't have to deal with this change.

This supersedes a previous pull-request, #142, which would add support for
multiline values in `Dotenv.parse` but not in the CLI (`dotenv get` and `dotenv
set`).

The key-value binding regular expression was inspired by
https://github.com/bkeepers/dotenv/blob/d749366b6009126b115fb7b63e0509566365859a/lib/dotenv/parser.rb#L14-L30

Parsing of escapes was fixed thanks to
https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python/24519338#24519338
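
The "Turn get_stream into a context manager" item above can be pictured with a minimal sketch like the one below. It assumes the source is either an `io.StringIO` or a file path; the function signature and behaviour are illustrative guesses, not the actual code from `dotenv/main.py`.

```python
import io
from contextlib import contextmanager


@contextmanager
def get_stream(source):
    """Yield a readable text stream for a StringIO or a file path (hypothetical)."""
    if isinstance(source, io.StringIO):
        # The caller owns in-memory buffers, so they are not closed here.
        yield source
    else:
        # Files are opened as UTF-8 and closed when the with-block exits.
        with io.open(source, encoding="utf-8") as stream:
            yield stream


with get_stream(io.StringIO("UTF8=été\n")) as stream:
    print(stream.read())
```

With this shape, calling code uses a single `with get_stream(...)` block and no longer needs a flag such as `is_file` to decide whether to close the stream.
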
johnbergvall pushed a commit to johnbergvall/python-dotenv that referenced this issue Aug 13, 2021
… UTF-8 (theskumar#148)
