-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Make ExtensionDtype.construct_array_type a method #36136
API: Make ExtensionDtype.construct_array_type a method #36136
Conversation
This allows a single dtype to support multiple array classes. For arrow-backed strings, we'll likely want a separate array class for ease of implementation, clarity. But we'll have a parametrized dtype. ```python class StringDtype: def __init__(self, storage="python"): self.storage = storage def construct_array_type(self): # regular method if self.storage == "python": return StringArray else: return ArrowStringArray ``` Closes pandas-dev#36126
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully onboard! Just some concerns about backwards compatibility
.. _whatsnew_120.api_breaking.experimental: | ||
|
||
Changes to experimental APIs | ||
---------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it needs ~~~~~~~~~~~~~~~~~~~~~~~~
type header
Changes to experimental APIs | ||
---------------------------- | ||
|
||
- :meth:`pandas.api.extensions.ExtensionDtype.construct_array_type` has changed from a classmethod to a regular method to support one dtype being used for multiple arrays. To migrate, change your definition to a regular method and ensure that your method is called on instances rather than the class (:issue:`36126`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, as long as you are not calling it yourself on the class object, and only implementing it, it shouldn't directly break existing dtypes ? (a class method can also be called from an instance)
(otherwise I think we should do it in a non-breaking way, eg by taking a different name)
I just had a thought... As long as we're careful to only call the method on instances and not types, it may be OK for StringDtype.construct_array_type to be a regular method? The only place we don't do that now is in the test_decimal tests, everywhere in the library we're already working on instances, not types. It's not the most sound, but maybe it's OK to avoid any risk of API-breaking changes in the interface? |
Yes, that sounds correct to me. I would still document it that it is allowed? (it might also be useful for external dtypes) Or maybe add a based extension test to encourage downstream users to change it? |
Yeah, I think we'll want to encourage people to update. Perhaps via a (non-visisble) deprecationwarning at first, then an extension test asserting that it's not a classmethod, then we make the change in 2.x. |
I am ok with just changing this, its an experimental api after all, not worth waiting for 2.x |
Decimal tests failing |
@TomAugspurger just pushed a small commit that fixes the broken test |
Thanks. Need to think a bit more about the best way forward here. If pandas
doesn't actually use classmethods on instances, we may be able to get away
with just defining it as a regular method where we need it and not go
through a deprecation process.
…On Thu, Sep 17, 2020 at 5:14 PM jbrockmendel ***@***.***> wrote:
@TomAugspurger <https://github.com/TomAugspurger> just pushed a small
commit that fixes the broken test
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#36136 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAOIS5WLOV6A6PHVVWOPLSGKC4NANCNFSM4Q2YPRUA>
.
|
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Going to hold on this till it's definitely needed. |
Personally, I think it is still useful to make this change. As long as we only call the method on instances, it shouldn't actually be a breaking change. So although it might not be explicitly needed for the StringArray (since it can just make it a plain method only there), it might still be useful in general to change this to signal that this is OK. |
This allows a single dtype to support multiple array classes.
For arrow-backed strings, we'll likely want a separate array class
for ease of implementation, clarity. But we'll have a parametrized
dtype.
Closes #36126