Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yaml.dump(..., stream) is optional #2289

Merged
merged 7 commits into from
May 6, 2019
Merged

Conversation

pmhahn
Copy link
Contributor

@pmhahn pmhahn commented Jun 28, 2018

http://pyyaml.org/wiki/PyYAMLDocumentation#dumping-yaml

yaml.dump accepts the second optional argument, which must be an open
text or binary file. In this case, yaml.dump will write the produced
YAML document into the file. Otherwise, yaml.dump returns the produced
document.

v2: Also update the safe_dump[_all] functions and extend the description to state that stream=None makes the function return the result instead of writing it into the stream to address #2288 (comment)

@JelleZijlstra
Copy link
Member

Does that mean we could get a more precise type with an overload that says that the functions return None if the stream argument is an IO, and a str if it's None or not given?

@pmhahn
Copy link
Contributor Author

pmhahn commented Jun 28, 2018

overload is new for me and took me some to to get the order right, but now it works for me:

#!/usr/bin/python2.7
from yaml import load, dump
def main():  # type: () -> None
    doc = load("a: 1")
    with open('/dev/null', 'w') as stream:
        dump(doc, stream)
    out = dump(doc, None)
    print(out)
if __name__ == '__main__':
  main()

@pmhahn
Copy link
Contributor Author

pmhahn commented Jun 28, 2018

No, it still does not work. I'm still getting

/usr/lib/mypy/typeshed/third_party/2and3/yaml/init.pyi:27: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
/usr/lib/mypy/typeshed/third_party/2and3/yaml/init.pyi:32: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
/usr/lib/mypy/typeshed/third_party/2and3/yaml/init.pyi:37: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
/usr/lib/mypy/typeshed/third_party/2and3/yaml/init.pyi:42: error: Overloaded function signatures 1 and 2 overlap with incompatible return types

After reading python/mypy#1941 I had hoped to get it right.

@overload
def dump(data: Any, stream: None=..., Dumper=..., **kwds) -> AnyStr: ...
@overload
def dump(data: Any, stream: IO[str]=..., Dumper=..., **kwds) -> None: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need =... in the second overload here and in other functions, otherwise there is clearly an overlap, because dump(something) matches both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's it: After removing the =... it seems to work. Will --amend and push --force

def dump_all(documents: Sequence[Any], stream: IO[str]=..., Dumper=..., default_style=..., default_flow_style=..., canonical=..., indent=..., width=..., allow_unicode=..., line_break=..., encoding=..., explicit_start=..., explicit_end=..., version=..., tags=...) -> None: ...

@overload
def dump(data: Any, stream: None=..., Dumper=..., **kwds) -> AnyStr: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the return be just str here and in other functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, read http://pyyaml.org/wiki/PyYAMLDocumentation#python-3-support:

from yaml import dump
print(type(dump('', None)))  # str
print(type(dump('', None, encoding=None)))  # unicode
print(type(dump('', None, encoding='utf-8')))  # str

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, in that case Union[str, bytes] may be better. Although I know union return types are strongly discouraged. But the current way of using AnyStr as a "hackish" way to implement an unsafe union is purely based on how mypy behaves, it is not specified in PEP 484, so can break at any moment. @JelleZijlstra is this a common practice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't even know this worked like an unsafe union; I've been consistently telling people not to use AnyStr in a single position in a signature.

Regardless, can't this also be fixed with more overloads? It returns unicode if encoding is None or omitted and str when it's a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it with this:

+@overload
+def serialize_all(nodes, stream: None=..., Dumper=..., canonical=..., indent=..., width=..., allow_unicode=..., line_break=..., encoding: None=..., explicit_start=..., explicit_end=..., version=..., tags=...) -> Text: ...
+@overload
+def serialize_all(nodes, stream: None=..., Dumper=..., canonical=..., indent=..., width=..., allow_unicode=..., line_break=..., encoding: str, explicit_start=..., explicit_end=..., version=..., tags=...) -> bytes: ...
+@overload
+def serialize_all(nodes, stream: IO[str], Dumper=..., canonical=..., indent=..., width=..., allow_unicode=..., line_break=..., encoding=..., explicit_start=..., explicit_end=..., version=..., tags=...) -> None: ...

But mypy complain about this:

third_party/2and3/yaml/__init__.pyi:27: error: non-default argument follows default argument

For reference:

a = dump('')  # type: str
b = dump('', None)  # type: str
c = dump('', None, encoding=None)  # type: Text # 2:unicode 3:str
d = dump('', None, encoding='utf-8')  # type: bytes # 2:str 3:bytes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just add # type: ignore if reveal_type() shows all correct types all call sites.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilevkivskyi Your comment confuses me; the #type annotation in my last comment is only to document the type returned by dump(); that code snippet makes it easy to validate the yaml/__init__.pyi file by running mypy --custom-typeshed-dir=. snipped.py with and without --py2 and getting an error message.

So now what:

  • get the @overload working to return the correct type depending on encoding?
  • replace the AnyStr by what?
  • split the 2and3 into two versions to get rid of Text being unicode with Python2 and str with Python3?
  • something other?

Copy link
Member

@ilevkivskyi ilevkivskyi Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add # type: ignore to silence the error you see in stubs (non-default argument follows default argument). And then check if this leads to correct results in user code by using reveal_type(), this is a mypy trick reveal_type(<expression>) creates an error that shows the type of <expression> from mypy's point of view.

@JelleZijlstra
Copy link
Member

@pmhahn no need to force-push and amend; it actually makes review harder sometimes. It's fine to just push new commits to this PR; we'll clean up the commit message before merging.

@overload
def safe_dump_all(documents: Sequence[Any], stream: None, **kwds) -> AnyStr: ...
@overload
def safe_dump_all(documents: Sequence[Any], stream: IO[str]=..., **kwds) -> None: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The =... should go to the other overload; it returns a string, not None, if you omit the stream argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by a78cca2

third_party/2and3/yaml/__init__.pyi Outdated Show resolved Hide resolved
@srittau
Copy link
Collaborator

srittau commented Oct 21, 2018

@pmhahn Are you still interested in this pull request?

@pmhahn
Copy link
Contributor Author

pmhahn commented Oct 25, 2018

I had to rebase the series because of an conflict with 006a792
I verified both python2.7 and python3.5 to show now errors.
I hope all issues are now resolved.

@overload
def serialize_all(nodes, stream: None=..., Dumper=..., canonical=..., indent=..., width=..., allow_unicode=..., line_break=..., encoding: None=..., explicit_start=..., explicit_end=..., version=..., tags=...) -> str: ...
@overload
def serialize_all(nodes, stream: None=..., Dumper=..., canonical=..., indent=..., width=..., allow_unicode=..., line_break=..., encoding: str=..., explicit_start=..., explicit_end=..., version=..., tags=...) -> bytes: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy complains about this because both overloads have = ... for the encoding argument. One of the two overloads should have no default for encoding.

Copy link
Contributor Author

@pmhahn pmhahn Oct 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until python/mypy#5621 is fixed I see no practical way to resolve this: encoding is in the middle of 11 optional parameters and using that workaround would require 2^6 overload variants of those 6 functions.
I'm out of ideas for now.

@JelleZijlstra
Copy link
Member

Travis shows a few issues still; could you fix those? (See also the comment I just added.)

@srittau
Copy link
Collaborator

srittau commented Dec 2, 2018

CI fails due to a missing import and linter issues. Also, there is a conflict now.

as such move the `=...` to the None-case.
Depending on if a "stream" or the "encoding" is given, the functions
either return None, str/unicode or bytes.
Use @overload fix distinguish those cases.

Also fix the functions using **kwds as they delegate their work to the
more generic functions: copy their signatures.
As far as I know it's currently impossible to use "overloads with return
types depending on optional arguments" (Issue python#5621). As "encoding" is
in the middle of 11 optional arguments, it would require ~ 2^6 overloads
for each of those 6 functions.

For now just return Any and wait for Issue python#5621 to get fixed.
@pmhahn
Copy link
Contributor Author

pmhahn commented Dec 4, 2018

I added the missing imports.
I rebased my branch on master and force-pushed it. (or do you prefer I do a merge?)

@pmhahn
Copy link
Contributor Author

pmhahn commented Dec 5, 2018

CI fails due to a missing import and linter issues. Also, there is a conflict now.

The flake8 issues should now be fixed with e5c943c - what a pain to get the new flake8 version in Debian-Stretch as it only has Python3.5 and virtualenv is currently broken in Debian-Sid

@JelleZijlstra JelleZijlstra self-requested a review May 5, 2019 19:17
@JelleZijlstra JelleZijlstra dismissed their stale review May 5, 2019 19:17

outdated

@JelleZijlstra JelleZijlstra merged commit 548dbbb into python:master May 6, 2019
@pmhahn pmhahn deleted the yaml branch May 7, 2019 05:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants