-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST(string dtype): Resolve xfails in pytables #60795
base: main
Are you sure you want to change the base?
Conversation
if using_infer_string: | ||
# TODO: Test is incorrect when not using_infer_string. | ||
# Should take the last 4 rows uncondiationally. | ||
expected = expected[16:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to make sure this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the string type is just truncating the last 4 rows? Is it an invalid unicode sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DataFrame is all NaN in rows 0-15 inclusive (L250 in this PR), and it is being appended to the store with dropna=True
.
if using_infer_string: | ||
# TODO: Test is incorrect when not using_infer_string. | ||
# Should take the last 4 rows uncondiationally. | ||
expected = expected[16:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the string type is just truncating the last 4 rows? Is it an invalid unicode sequence?
@@ -822,10 +826,11 @@ def test_append_raise(setup_path): | |||
df["foo"] = Timestamp("20130101") | |||
store.append("df", df) | |||
df["foo"] = "bar" | |||
shape = "(30,)" if using_infer_string else "(1, 30)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this still a bug? Not sure why we would expect a different shape with the string data types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed it was due to array-backed vs 1d ndarray backed data. But I haven't checked too deeply.
"Cannot serialize the column [datetime1]\nbecause its data " | ||
"contents are not [string] but [date] object dtype" | ||
), | ||
re.escape("[date] is not implemented as a table column"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original error message here is much clearer - is there no way to catch and raise that for the string types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean, this error is not raised for string types. It's being raised for date types.
When infer_string=False
, this function is passed a single object block with a mix of strings and dates. In this case, the data of the block is inferred as mixed, and then checked column-by-column. This is where the top message (which I think is confusing) is raised. When infer_string=True
, each string array is fed into this function individually and does not raise. Then the object block is fed in containing only dates. This is inferred as dates, and the corresponding error message is raised.
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Looks like using
where
that results in empty will still give object dtype. xfailing those tests here and plan to tackle in a followup.