API: should dtype=str return array of dtype StringDtype for pandas 2.0? #49398

Closed
1 of 3 tasks
topper-123 opened this issue Oct 30, 2022 · 5 comments
Labels
API Design, Closing Candidate, Strings

Comments

@topper-123
Contributor

topper-123 commented Oct 30, 2022

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

IMO it would be an API improvement for pandas if creating dataframes/series/arrays using dtype=str (and dtype="str") returned a dataframe/series/array of dtype StringDtype instead of dtype object. The reason is that, IMO, in 99.9% of cases where users instantiate using dtype=str, they would have preferred to use dtype="string" and thereby have the guarantee that the array actually contains only strings (and NAs).

This would be similar to how instantiating with dtype=int currently gives dtype np.int64, and dtype=float gives np.float64.
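A minimal sketch of the difference described above, assuming a pandas 1.x/2.x install like the ones discussed in this thread (the exact dtype reported for dtype=str may differ on other versions, so no output is claimed for it):

```python
import pandas as pd

values = ["a", "b"]

# dtype=str: on the pandas versions discussed in this issue this is
# reported as plain object dtype, i.e. no guarantee about the contents.
as_str = pd.Series(values, dtype=str)

# dtype="string": opts in to the dedicated string extension dtype.
as_string = pd.Series(values, dtype="string")

# For comparison, the Python builtin float maps to a concrete NumPy dtype.
as_float = pd.Series([1, 2], dtype=float)

print(as_str.dtype)
print(as_string.dtype)
print(as_float.dtype)  # float64
```

The proposal is about making the first case behave like the second.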

The above proposal would be backwards incompatible, and it is too late to introduce deprecations in pandas 1.x now. However, could it become a breaking change as part of the jump to version 2.0 of pandas, similar to the backwards-incompatible changes already listed in #44823?

Feature Description

Basically, it would just change the dtype resolution function to return a StringDtype instead of the current behavior, so it should be reasonably simple to implement.
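A rough sketch of that idea, not the actual pandas internals: `resolve_dtype` is a hypothetical wrapper name invented for illustration, built on the public `pandas.api.types.pandas_dtype` helper. It special-cases `str` and `"str"` and defers everything else to the existing resolution.

```python
import pandas as pd
from pandas.api.types import pandas_dtype


def resolve_dtype(spec):
    """Hypothetical resolver: map str / "str" to StringDtype.

    Everything else is passed through to pandas' public pandas_dtype.
    This is an illustration of the proposal, not pandas' own code.
    """
    if spec is str or (isinstance(spec, str) and spec == "str"):
        return pd.StringDtype()
    return pandas_dtype(spec)


print(resolve_dtype(str))
print(resolve_dtype("str"))
print(resolve_dtype(float))  # float64
```

A wrapper like this only covers explicit dtype arguments; it says nothing about I/O paths or follow-up operations.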

Alternative Solutions

The alternative would be to keep the current behavior in pandas 2.0.

Additional Context

No response

@topper-123 topper-123 added the Enhancement and Needs Triage labels Oct 30, 2022
@topper-123 topper-123 changed the title from "API: dtype=str should return array of dtype StringDtype for pandas 2.0" to "API: should dtype=str return array of dtype StringDtype for pandas 2.0?" Oct 30, 2022
@topper-123 topper-123 added the API Design and Strings labels and removed the Enhancement label Oct 30, 2022
@phofl
Member

phofl commented Oct 31, 2022

I think this needs a more thorough investigation.

How would the behavior of follow-up operations change?

Would you also change the behavior of I/O operations? I don't think that we can do this without a deprecation cycle.

@mroeschke
Member

I support dtype=str eventually mapping to StringDtype, but personally I think it would be better through a deprecation than a 2.0 breaking change.
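For illustration only, a deprecation path could look roughly like the sketch below. The function name, the warning text, and the use of plain strings to stand in for dtypes are all assumptions made for this example; nothing here is taken from pandas itself.

```python
import warnings


def resolve_with_deprecation(spec):
    """Hypothetical resolver that keeps today's result but warns first."""
    if spec is str or spec == "str":
        warnings.warn(
            "dtype=str will map to StringDtype in a future version; "
            "pass dtype=object to keep the current behavior or "
            "dtype='string' to opt in now.",
            FutureWarning,
            stacklevel=2,
        )
        return "object"  # unchanged result during the deprecation period
    return spec


# Record the warning instead of printing it, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = resolve_with_deprecation(str)

print(result)                       # object
print(caught[0].category.__name__)  # FutureWarning
```

Users would see the warning for one release cycle, and the return value would only change once the deprecation is enforced.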

@topper-123
Contributor Author

Thanks for the reply. Yes, I hadn't considered I/O; that makes it more challenging than I had thought when I wrote up the issue...

I could support a deprecation cycle, though if it lasts the entire pandas 2.x cycle, it may be better to deprecate later in the cycle, e.g. in pandas 2.3 or similar, IMO.

Unless there is a wish to do something now, I'll let this lie, and I (or someone else) can pick it up later, after pandas 2.0 has been released.

@phofl
Member

phofl commented Nov 7, 2022

We want to release 3.0 significantly faster than 2.0, so it would be OK to introduce this in 2.0, I think. But we want to finish enforcing deprecations first.

@jbrockmendel jbrockmendel added the Closing Candidate label Apr 20, 2023
@topper-123 topper-123 removed the Needs Triage label May 10, 2023
@topper-123
Contributor Author

topper-123 commented May 10, 2023

Closing as superseded by #52429, where the discussion is more current.

@topper-123 topper-123 closed this as not planned May 10, 2023