TST(string dtype): Resolve xfails in pytables #60795

rhshadrach · 2025-01-26T13:23:32Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Looks like using a where condition that returns an empty result still gives object dtype. I'm xfailing those tests here and plan to tackle this in a follow-up.
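Here is a minimal pure-Python sketch of why an empty selection is a corner case. It assumes the result's dtype is inferred from the values that come back. The names infer_selection_kind, select, and stored_kind are hypothetical, not pandas or PyTables APIs. When no values are left after filtering, there is nothing to infer from, so a value-based check falls back to object unless a dtype recorded at write time is used instead.

```python
def infer_selection_kind(values):
    """Hypothetical value-based inference (not the pandas implementation)."""
    if len(values) == 0:
        # No values to inspect, so there is no evidence the column holds strings.
        return "object"
    if all(isinstance(v, str) for v in values):
        return "string"
    return "object"


def select(rows, where, stored_kind=None):
    """Hypothetical select: filter rows, then pick a kind for the result.

    If a kind recorded at write time is passed in, it is used as-is;
    otherwise the kind is inferred from whatever values survived the filter.
    """
    selected = [r for r in rows if where(r)]
    kind = stored_kind if stored_kind is not None else infer_selection_kind(selected)
    return selected, kind


rows = ["apple", "banana", "cherry"]
print(select(rows, lambda r: r.startswith("b")))  # (['banana'], 'string')
print(select(rows, lambda r: r.startswith("z")))  # ([], 'object')
print(select(rows, lambda r: r.startswith("z"), stored_kind="string"))  # ([], 'string')
```

Falling back to a recorded dtype is only one possible remedy. The follow-up mentioned above may take a different approach.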

pandas/io/pytables.py


rhshadrach · 2025-01-29T02:36:24Z

pandas/tests/io/pytables/test_append.py

+            if using_infer_string:
+                # TODO: Test is incorrect when not using_infer_string.
+                #       Should take the last 4 rows unconditionally.
+                expected = expected[16:]


I'd like to make sure this is correct.

WillAyd · 2025-02-03T14:17:07Z

So the string type is just truncating the last 4 rows? Is it an invalid Unicode sequence?

The DataFrame is all NaN in rows 0-15 inclusive (L250 in this PR), and it is being appended to the store with dropna=True.
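The following is a small sketch of why only the last 4 rows survive. It assumes that dropna=True on append means "do not write rows whose values are all missing", which matches the semantics of DataFrame.dropna(how="all"). The helpers is_missing and drop_all_missing_rows are hypothetical, not HDFStore code. The 20 rows mirror only the test's shape: 16 all-missing rows followed by the 4 rows that expected[16:] keeps.

```python
import math


def is_missing(value):
    # Treat None and float NaN as missing, roughly like pandas' isna for scalars.
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_all_missing_rows(rows):
    # Keep a row unless every value in it is missing (how="all" semantics).
    return [row for row in rows if not all(is_missing(v) for v in row)]


nan = float("nan")
# Rows 0-15 are entirely missing; rows 16-19 hold data.
rows = [(nan, None)] * 16 + [(float(i), f"s{i}") for i in range(16, 20)]

kept = drop_all_missing_rows(rows)
print(len(kept))          # 4
print(kept == rows[16:])  # True
```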

WillAyd · 2025-02-03T14:18:48Z

pandas/tests/io/pytables/test_append.py

@@ -822,10 +826,11 @@ def test_append_raise(setup_path):
        df["foo"] = Timestamp("20130101")
        store.append("df", df)
        df["foo"] = "bar"
+        shape = "(30,)" if using_infer_string else "(1, 30)"


Isn't this still a bug? I'm not sure why we would expect a different shape with the string data types.

I assumed it was due to 1D array-backed vs. 2D ndarray-backed data, but I haven't checked too deeply.
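The two expected strings in the diff are exactly how Python renders a 1-D shape tuple and a single-row 2-D shape tuple. That fits the suggestion that the difference comes from how the values are backed. Below is a minimal sketch. The mapping from using_infer_string to a 1-D or 2-D shape is the unverified hypothesis from the comment, and expected_shape_text is a hypothetical helper, not the pandas formatter.

```python
def expected_shape_text(using_infer_string, n_rows=30):
    # Hypothesis (not verified against pandas internals): with the string dtype the
    # column's values are a single 1-D array; otherwise one column occupies a row
    # of a 2-D block.
    shape = (n_rows,) if using_infer_string else (1, n_rows)
    # str() of a tuple gives the same text the test expects.
    return str(shape)


for flag in (True, False):
    print(flag, expected_shape_text(flag))
```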

WillAyd · 2025-02-03T14:19:51Z

pandas/tests/io/pytables/test_errors.py

+                    "Cannot serialize the column [datetime1]\nbecause its data "
+                    "contents are not [string] but [date] object dtype"
+                ),
+                re.escape("[date] is not implemented as a table column"),


I think the original error message here is much clearer; is there no way to catch and raise that for the string types?

Not sure what you mean; this error is not raised for string types. It's raised for date types.

When infer_string=False, this function is passed a single object block containing a mix of strings and dates. In this case the block's data is inferred as mixed and then checked column by column; this is where the top message (which I think is confusing) is raised. When infer_string=True, each string array is fed into this function individually and does not raise. The object block, now containing only dates, is then fed in; it is inferred as dates, and the corresponding error message is raised.
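The explanation above can be modeled with a pure-Python sketch. The inference rules, the block representation (a dict of column name to values), and the functions infer_kind, check_block, and first_error are all hypothetical stand-ins, not the pandas serializer. Only the two message texts are taken from the test diff. The sketch returns the message instead of raising so the two paths are easy to compare.

```python
import datetime


def infer_kind(values):
    # Hypothetical stand-in for block-level inference: "string", "date", or "mixed".
    kinds = {type(v) for v in values}
    if kinds == {str}:
        return "string"
    if kinds == {datetime.date}:
        return "date"
    return "mixed"


def check_block(columns):
    """Return the error message a hypothetical serializer would raise, or None."""
    flat = [v for values in columns.values() for v in values]
    kind = infer_kind(flat)
    if kind == "mixed":
        # Mixed block: fall back to checking each column for strings.
        for name, values in columns.items():
            col_kind = infer_kind(values)
            if col_kind != "string":
                return (
                    f"Cannot serialize the column [{name}]\nbecause its data "
                    f"contents are not [string] but [{col_kind}] object dtype"
                )
        return None
    if kind != "string":
        return f"[{kind}] is not implemented as a table column"
    return None


def first_error(blocks):
    # Feed blocks in one at a time and report the first failure, if any.
    for block in blocks:
        message = check_block(block)
        if message is not None:
            return message
    return None


strings = ["a", "b"]
dates = [datetime.date(2020, 1, 1), datetime.date(2020, 1, 2)]

# infer_string=False: one object block holds both the strings and the dates.
print(first_error([{"A": strings, "datetime1": dates}]))
# infer_string=True: the string column is its own block; the object block has only dates.
print(first_error([{"A": strings}, {"datetime1": dates}]))
```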

TST(string dtype): Resolve xfails in pytables

120b0bd

rhshadrach added the Testing, IO HDF5, and Strings labels Jan 26, 2025

rhshadrach added this to the 2.3 milestone Jan 26, 2025

Cleanup

adb0349

rhshadrach commented Jan 26, 2025


pandas/io/pytables.py (outdated, resolved)

rhshadrach marked this pull request as draft January 26, 2025 13:47

rhshadrach mentioned this pull request Jan 26, 2025

BUG: is_*_array returns true on empty object dtype #60796

Merged

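The title of #60796 describes a classic vacuous-truth pitfall: a check of the form "every element is a string" is trivially true when there are no elements. Below is a minimal pure-Python sketch of the pitfall and one possible guard. Both functions are hypothetical illustrations, not the pandas is_*_array functions, and they do not show how #60796 actually fixed the issue.

```python
def naive_is_string_array(values):
    # all() over an empty iterable is True ("vacuous truth").
    return all(isinstance(v, str) for v in values)


def is_string_array(values):
    # Require at least one element before claiming the array holds strings.
    return len(values) > 0 and all(isinstance(v, str) for v in values)


print(naive_is_string_array([]))     # True
print(is_string_array([]))           # False
print(is_string_array(["a", "b"]))   # True
print(is_string_array(["a", 1]))     # False
```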

rhshadrach added 4 commits January 28, 2025 20:02

Merge branch 'main' of https://github.com/pandas-dev/pandas into str_…xfail_pytable_misc

1413d27

Revert code change

70563ce

type-ignore

d12f66d

xfails and fixes

bc5b697

rhshadrach commented Jan 29, 2025


rhshadrach added 2 commits January 30, 2025 16:22

Merge branch 'main' of https://github.com/pandas-dev/pandas into str_…xfail_pytable_misc

55ea98b

More strict xfail

cb56243

rhshadrach requested review from jorisvandenbossche and WillAyd February 2, 2025 21:19

rhshadrach marked this pull request as ready for review February 2, 2025 21:19

WillAyd reviewed Feb 3, 2025

