[Python] Honor the strings_to_categorical keyword in to_pandas for string view type #45175

Closed
jorisvandenbossche opened this issue Jan 5, 2025 · 2 comments

jorisvandenbossche commented Jan 5, 2025

Currently this keyword works for string or large string:

>>> import pyarrow as pa
>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.string())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    category
dtype: object
>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.large_string())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    category
dtype: object

but not for string view:

>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.string_view())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    object
dtype: object

For consistency, I think the keyword should also apply to string view columns.

From https://github.com/apache/arrow/pull/44195/files#r1901831460

raulcd pushed a commit that referenced this issue Jan 7, 2025
…das for string view type (#45176)

### Rationale for this change

Currently this keyword works for string or large string:

```python
>>> import pyarrow as pa
>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.string())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    category
dtype: object
>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.large_string())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    category
dtype: object
```

but not for string view:

```python
>>> table = pa.table({"col": pa.array(["a", "b", "a"], pa.string_view())})
>>> table.to_pandas(strings_to_categorical=True).dtypes
col    object
dtype: object
```

For consistency, I think the keyword should also apply to string view columns.

From https://github.com/apache/arrow/pull/44195/files#r1901831460

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes. With `strings_to_categorical=True`, a column of string_view type is now converted to a pandas Categorical.

* GitHub Issue: #45175

Authored-by: Joris Van den Bossche
Signed-off-by: Raúl Cumplido

raulcd commented Jan 7, 2025

Issue resolved by pull request #45176

raulcd added this to the 19.0.0 milestone Jan 7, 2025
raulcd closed this as completed Jan 7, 2025

raulcd commented Jan 7, 2025

I've read the linked issue and it seems we wanted this for 19.0.0. @amoeba I've marked it as 19.0.0

amoeba pushed a commit that referenced this issue Jan 7, 2025
…das for string view type (#45176)

amoeba pushed a commit that referenced this issue Jan 11, 2025
…das for string view type (#45176)
