API: should dtype=str return array of dtype StringDtype for pandas 2.0? #49398
I think this needs a more thorough investigation. How would the behavior of follow-up operations change? Would you also change the behavior of I/O operations? I don't think that we can do this without a deprecation cycle.
I support
Thanks for the reply. Yes, I hadn't considered I/O; that makes it more challenging than I had thought when I wrote up the issue. I could support a deprecation cycle, though if it lasts the entire pandas 2.x cycle, IMO it may be better to deprecate later in the cycle, e.g. in pandas 2.3 or similar. Unless there is a wish to do something now, I'll let this lie, and I (or someone else) can pick it up later, after pandas 2.0 has been released.
We want to release 3.0 significantly faster than 2.0, so I think it would be OK to introduce this in 2.0. But we want to finish enforcing deprecations first.
Closing as superseded by #52429, where the discussion is more current.
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
IMO it would be an API improvement for pandas if creating dataframes/series/arrays using `dtype=str` (and `dtype="str"`) returned a dataframe/series/array of dtype `StringDtype` instead of dtype `object`. The reason is that, IMO, in 99.9% of cases where users instantiate using `dtype=str`, they would have preferred to use `dtype="string"` and thereby have the guarantee that the array actually contains only strings (and NAs).
This would be similar to how instantiating with `dtype=int` currently gives dtype `np.int64`, and `dtype=float` gives `np.float64`.
The above proposal would be backwards incompatible, and it is now too late to introduce deprecations in pandas 1.x. However, could it become a breaking change as part of the jump to pandas 2.0, similar to the backwards-incompatible changes already listed in #44823?
Feature Description
Basically, it would just change the dtype resolution function to return a `StringDtype` instead of following the current behavior, so it should be reasonably simple to implement.
Alternative Solutions
The alternative would be to keep the current behavior in pandas 2.0.
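As a rough illustration of the change proposed under Feature Description, the rule can be expressed as a small wrapper around pandas' public `pandas.api.types.pandas_dtype` helper. The function name `resolve_dtype_proposed` is hypothetical, and this is a sketch of the mapping only, not pandas' actual internal resolution code (which also has to deal with the I/O and follow-up-operation concerns raised in the comments).

```python
import numpy as np
import pandas as pd
from pandas.api.types import pandas_dtype


def resolve_dtype_proposed(dtype):
    """Hypothetical sketch: map ``str`` / ``"str"`` to StringDtype.

    Every other dtype specification is delegated to pandas' public
    ``pandas_dtype`` helper, so e.g. ``float`` still resolves to float64.
    """
    if dtype is str or (isinstance(dtype, str) and dtype == "str"):
        return pd.StringDtype()
    return pandas_dtype(dtype)


# Usage: a Series built with the resolved dtype gets StringDtype, not object.
s = pd.Series(["x", "y"], dtype=resolve_dtype_proposed(str))
print(s.dtype)
```

Keeping the special case in a single resolution step means every constructor that resolves its `dtype` argument through it would pick up the new behavior consistently.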
Additional Context
No response