Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: missing __getitem__ type fixes #1963

Merged
merged 6 commits into from
Feb 8, 2025

Conversation

EdAbati
Copy link
Collaborator

@EdAbati EdAbati commented Feb 7, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

I am so sorry @MarcoGorelli , I was a bit to quick to press "Ready for Review" 😭

This is a follow up of #1958

) -> Series[Any]: ...

@overload
def __getitem__(
self: Self,
key: (
item: (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

silly mistake (I blame copilot) πŸ™ˆ

I wonder why it was not picked up though

@dangotbanned

This comment was marked as resolved.

@MarcoGorelli
Copy link
Member

I haven't got permissions

you do now πŸ˜‰ feel free to make improvements, thanks for all your help + evangelism of Narwhals

@dangotbanned
Copy link
Member

I haven't got permissions

you do now πŸ˜‰ feel free to make improvements, thanks for all your help + evangelism of Narwhals

Oh hello 😊
Thanks @MarcoGorelli!

@dangotbanned
Copy link
Member

dangotbanned commented Feb 8, 2025

Maybe this is too much for the PR.
But from someone who doesn't use __getitem__ - replacing these types with the polars aliases would be much easier to read IMO.

# Annotations for `__getitem__` methods
SingleIndexSelector: TypeAlias = int
MultiIndexSelector: TypeAlias = Union[
    slice,
    range,
    Sequence[int],
    "Series",
    "np.ndarray[Any, Any]",
]
SingleNameSelector: TypeAlias = str
MultiNameSelector: TypeAlias = Union[
    slice,
    Sequence[str],
    "Series",
    "np.ndarray[Any, Any]",
]
BooleanMask: TypeAlias = Union[
    Sequence[bool],
    "Series",
    "np.ndarray[Any, Any]",
]
SingleColSelector: TypeAlias = Union[SingleIndexSelector, SingleNameSelector]
MultiColSelector: TypeAlias = Union[MultiIndexSelector, MultiNameSelector, BooleanMask]

Still ends up pretty complex in the @overload(s), but much easier to differentiate the parts:

    # `str` overlaps with `Sequence[str]`
    # We can ignore this but we must keep this overload ordering
    @overload
    def __getitem__(
        self, key: tuple[SingleIndexSelector, SingleColSelector]
    ) -> Any: ...

    @overload
    def __getitem__(  # type: ignore[overload-overlap]
        self, key: str | tuple[MultiIndexSelector, SingleColSelector]
    ) -> Series: ...

    @overload
    def __getitem__(
        self,
        key: (
            SingleIndexSelector
            | MultiIndexSelector
            | MultiColSelector
            | tuple[SingleIndexSelector, MultiColSelector]
            | tuple[MultiIndexSelector, MultiColSelector]
        ),
    ) -> DataFrame: ...

    def __getitem__(
        self,
        key: (
            SingleIndexSelector
            | SingleColSelector
            | MultiColSelector
            | MultiIndexSelector
            | tuple[SingleIndexSelector, SingleColSelector]
            | tuple[SingleIndexSelector, MultiColSelector]
            | tuple[MultiIndexSelector, SingleColSelector]
            | tuple[MultiIndexSelector, MultiColSelector]
        ),
    ) -> DataFrame | Series | Any:

@MarcoGorelli do you see these aliases as a desirable thing to steal (re-implement) from polars?

key: (
slice
item: (
int
Copy link
Collaborator Author

@EdAbati EdAbati Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @dangotbanned thanks for fixing! super quick I didn't have the chance to see the comments :D

Anyway I think that int is a bit of a special case.
The narwhals rule at the moment is that int in __getitem__ should not be fully supported (https://narwhals-dev.github.io/narwhals/pandas_like_concepts/column_names/) or encouraged .

I think we shouldn't add it in typing here, but ignore the int slicing in the tests if a typing error occurs.

Having said that, it is weird that after my changes mypy wants me to remove the # ignore in the tests

Copy link
Member

@dangotbanned dangotbanned Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdAbati interesting πŸ€”

So should this line in the doc be rewritten?

- Integers are always interpreted as positions

I do appreciate that what you're saying is documented, but writing always reads to me as something that belongs in the annotation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm oh yeah actually good catch, you are right! then it makes a lot of sense

Copy link
Member

@dangotbanned dangotbanned Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdAbati marking as unresolved just to show another option if a reviewer wanted it:

diff --git a/narwhals/dataframe.py b/narwhals/dataframe.py
index 6b673a80..207ef7b2 100644
--- a/narwhals/dataframe.py
+++ b/narwhals/dataframe.py
@@ -859,6 +859,8 @@ class DataFrame(BaseFrame[DataFrameT]):
         """
         return self._compliant_frame.estimated_size(unit=unit)  # type: ignore[no-any-return]
 
+    @overload
+    def __getitem__(self: DataFrame[pd.DataFrame], item: int) -> Any: ...
     @overload
     def __getitem__(  # type: ignore[overload-overlap]
         self: Self,
diff --git a/narwhals/stable/v1/__init__.py b/narwhals/stable/v1/__init__.py
index 5aefefe2..af8aed47 100644
--- a/narwhals/stable/v1/__init__.py
+++ b/narwhals/stable/v1/__init__.py
@@ -84,6 +84,7 @@ if TYPE_CHECKING:
     from types import ModuleType
 
     import numpy as np
+    import pandas as pd
     from typing_extensions import Self
 
     from narwhals.dtypes import DType
@@ -131,6 +132,8 @@ class DataFrame(NwDataFrame[IntoDataFrameT]):
     def _lazyframe(self: Self) -> type[LazyFrame[Any]]:
         return LazyFrame
 
+    @overload
+    def __getitem__(self: DataFrame[pd.DataFrame], item: int) -> Any: ...
     @overload
     def __getitem__(  # type: ignore[overload-overlap]
         self: Self,

I think this describes the pandas edge case?

image

@EdAbati
Copy link
Collaborator Author

EdAbati commented Feb 8, 2025

Still ends up pretty complex in the @overload(s), but much easier to differentiate the parts:

I agree, my plan was actually to add these in this or follow-up PR. I find them easier to read and follow

@EdAbati EdAbati marked this pull request as ready for review February 8, 2025 13:25
assert_equal_data(result, {"a": [2], "b": [12]})


def test_slice_fails(constructor_eager: ConstructorEager) -> None:
class Foo: ...

with pytest.raises(TypeError, match="Expected str or slice, got:"):
nw.from_native(constructor_eager(data), eager_only=True)[Foo()] # type: ignore[call-overload]
nw.from_native(constructor_eager(data), eager_only=True)[Foo()] # type: ignore[call-overload, unused-ignore]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ€” ignoring unused-ignore seems a bit strange?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli see #1963 (comment)

It is to ignore pre-commit not understanding the call is invalid

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks both!

@MarcoGorelli MarcoGorelli merged commit 02bcc1d into narwhals-dev:main Feb 8, 2025
24 checks passed
@EdAbati EdAbati deleted the more-fixes-getitem branch February 8, 2025 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants