Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYPING: add some type hints to core.generic #27646

Conversation

simonjayhawkins
Copy link
Member

pre-cursor to #27527

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Jul 29, 2019
environment.yml Outdated
@@ -24,6 +24,7 @@ dependencies:
- isort # check that imports are in the right order
- mypy
- pycodestyle # used by flake8
- typing_extensions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a note in 1.0 docs (may have to add a new section on dependencies like we did in 0.25) and list this

@@ -72,6 +86,19 @@
from pandas.io.formats.printing import pprint_thing
from pandas.tseries.frequencies import to_offset

if TYPE_CHECKING:
from pandas import Series, DataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put these in _typing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the errors is local to this module since I think a global errors should also include coerce?

similarly a global How should probably be left, right, inner, outer?

I could rename them if you think these have additional utility, to say IgnoreRaise and FirstLast

@@ -244,7 +271,7 @@ def _validate_dtype(self, dtype):
# Construction

@property
def _constructor(self):
def _constructor(self: FrameOrSeries) -> FrameOrSeries:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the return type by Type[FrameOrSeries]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

needed a few ignores though!

@simonjayhawkins simonjayhawkins mentioned this pull request Aug 5, 2019
4 tasks
@WillAyd
Copy link
Member

WillAyd commented Aug 5, 2019

I think typing_extensions has some good stuff (the Literal application here is great, have seen good use cases for Final in the past) but noting that we did try this before in #25975 and I think there was concern about adding another dependency

cc @TomAugspurger to see if that's still a concern

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Aug 5, 2019 via email

@WillAyd
Copy link
Member

WillAyd commented Aug 5, 2019

It does look likes Literal is still considered an alpha feature:

https://mypy.readthedocs.io/en/latest/literal_types.html

So no strong preference on my end but maybe safer to forgo that for now and not take on the extra dep. Can make more strict as that stabilizes in the future and maybe makes its way into stdlib

@simonjayhawkins
Copy link
Member Author

For what's essentially a dev-only dependency,

it's only added to the dev environment. not to any of the other builds.

@TomAugspurger
Copy link
Contributor

A user would need it though, right? Else the name wouldn't be defined?

@simonjayhawkins
Copy link
Member Author

have seen good use cases for Final in the past

since we only do the mypy check on py3.7, we can add Final etc. anyway... and get around the runtime behaviour on py3.5 and py3.6 in the same way as getting around typing_extensions here.

the methodology used will be required under some scenarios anyway.. https://mypy.readthedocs.io/en/latest/common_issues.html#using-classes-that-are-generic-in-stubs-but-not-at-runtime

@simonjayhawkins
Copy link
Member Author

A user would need it though, right? Else the name wouldn't be defined?

only if performing type checking.

@simonjayhawkins
Copy link
Member Author

It does look likes Literal is still considered an alpha feature:

but I think should have utility in overloads to reduce the number of casts or ignores needed.

@WillAyd
Copy link
Member

WillAyd commented Aug 6, 2019

Thinking it over some more and knowing this is a dev-only dependency I don't think adding this is worth it, especially since any time we would use we would have to wrap in a TYPE_CHECKING check and provide an alias when not type checking. I think this is less readable and perhaps confusing to those less versed in annotations

I think the rest looks good without typing_extensions

@simonjayhawkins
Copy link
Member Author

Thinking it over some more and knowing this is a dev-only dependency I don't think adding this is worth it, especially since any time we would use we would have to wrap in a TYPE_CHECKING check and provide an alias when not type checking. I think this is less readable and perhaps confusing to those less versed in annotations

I'll revert for now but we will need a better reason not to include Literal IMHO.

see python/typing#478 (comment) and https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY/mobilebasic

@simonjayhawkins
Copy link
Member Author

From what I recall, the benefits of adding that extension were relatively minor.

from PEP 586 -- Literal Types, Motivation and Rationale https://www.python.org/dev/peps/pep-0586

This pattern is also fairly common in many popular 3rd party libraries. For example, here are just two examples from pandas and numpy respectively:

pandas.concat(...) will return either Series or DataFrame depending on whether the axis argument is set to 0 or 1.

numpy.unique will return either a single array or a tuple containing anywhere from two to four arrays depending on three boolean flag values.

@shoyer put the case for pandas. so I think we should use it.

from pandas import Series, DataFrame

bool_t = bool
FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the change to move this to _typing a precursor PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #28173


bool_t = bool
FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
_T = TypeVar("_T")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably OK to move to _typing as well as I think generally useful

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #28173

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is _T? Why do we need an alias for bool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is _T?

unconstrained type variable used in _protect_consolidate to preserve return type from callable.

_is_numeric_mixed_type, _is_mixed_type etc. call _protect_consolidate and returns a bool

_consolidate returns a DataFrame or Series if inplace=False

if a union was used for the return type of _protect_consolidate would require casts to allow return types of _consolidate, _is_numeric_mixed_type etc. to be annotated.

Why do we need an alias for bool?

Need alias because NDFrame has def bool...

    def bool(self):
        """
        Return the bool of a single element PandasObject.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks

@@ -170,6 +183,11 @@ class NDFrame(PandasObject, SelectionMixin):
_metadata = [] # type: List[str]
_is_copy = None
_data = None # type: BlockManager
_AXIS_ALIASES = None # type: Dict[str, int]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd if I put these in a TYPE_CHECKING block, would not need to make changes to pandas/tests/indexing/test_indexing.py and maintain runtime behaviour.

what's your preference here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure I follow - what does the TYPE_CHECKING block buy us here? These are all stdlib types anyway, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we're still supporting Python 3.5 we are using a comment-based syntax for variable annotations.

_AXIS_ALIASES = None # type: Dict[str, int]

and to be valid Python syntax need to initialize to None, whereas shortly when we drop Python 3.5 we can use..

_AXIS_ALIASES: Dict[str, int]

and we can declare the variable type without any initialization necessary which will not change runtime behaviour.

see also https://stackoverflow.com/questions/56584140/python-3-5-type-annotated-variable-without-initial-value

so my preference is to add if TYPE_CHECKING: blocks for compatibility with Python 3.6 and simplify the adoption of Pyhton 3.6 variable annotations when we drop Python 3.5.

@@ -28,6 +28,7 @@
Scalar = Union[str, int, float]
Axis = Union[str, int]
Ordered = Optional[bool]
Level = Union[str, int]

# to maintain type information across generic functions and parametrization
_T = TypeVar("_T")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the discussion for this might have been elsewhere, but would it make sense for this to be something like "SameAsSelf"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is standard notation used throughout typeshed for an unconstrained typevar.

would not make sense to rename as the type variable could be used to parameterize a generic function(as in this module) that has nothing to do with the type of self.

of course, _T can also be used for generic self but doesn't need a separate alias, although we do use a separate alias (SeriesOrFrame) which is bounded, but is actually unnecessary when applied to self.

@simonjayhawkins
Copy link
Member Author

I think typing_extensions has some good stuff (the Literal application here is great, have seen good use cases for Final in the past) but noting that we did try this before in #25975 and I think there was concern about adding another dependency

cc @TomAugspurger to see if that's still a concern

typing_extensions will be availble for use with mypy 0.720 without needing to explicity add as a dependancy

https://github.com/python/mypy/blob/36aecdae92d050f626ea9029ad08f0c8382c47e7/setup.py#L181-L184

@WillAyd
Copy link
Member

WillAyd commented Sep 12, 2019 via email

@simonjayhawkins
Copy link
Member Author

Would still cause import errors if someone doesn’t have Mypy installed then though right?

would assume that would still need to use if TYPE_CHECKING, yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants