-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test for concat with categorical + datetime dataframes #45527
Add test for concat with categorical + datetime dataframes #45527
Conversation
ryangilmour
commented
Jan 21, 2022
- closes BUG: Concatenating categorical datetime columns raises a ValueError since v1.2 #39443
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry (N/A)
), | ||
), | ||
) | ||
def test_categorical_datetime_concat(self, input_1, input_2, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write the cases down explicitly. We do not gain anything through the parametrization and this is really hard to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make those changes. Before setting off the CI to fail again though, as mentioned in this comment, my PyTests are passing locally, but not on the CI. Could do with some help on why this might be?
Locally I get below to pass:
def test_categorical_datetime_concat(self):
# GH 39443
# test catergorical dataframes both containing datetimes
df1 = DataFrame(
{"x": Series(datetime(2021, 1, 1), index=[0], dtype="category")}
)
df2 = DataFrame(
{"x": Series(datetime(2021, 1, 2), index=[1], dtype="category")}
)
expected = DataFrame(
{"x": Series([datetime(2021, 1, 1), datetime(2021, 1, 2)])}
)
result = pd.concat([df1, df2])
tm.assert_equal(result, expected)
But in the CI (based on previous failure) - this will fail with:
E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different
E
E Attribute "dtype" are different
E [left]: CategoricalDtype(categories=['2021-01-01', '2021-01-02'], ordered=False)
E [right]: datetime64[ns]
Looks like it links to #14016
can you merge master and address comments |
…_concat_with_categorical_datetimes
Okay - looks like the behaviour of the code in the CI is working to concat: # test catergorical dataframes both containing datetimes
df1 = DataFrame({"x": Series(datetime(2021, 1, 1), index=[0], dtype="category")})
df2 = DataFrame({"x": Series(datetime(2021, 1, 2), index=[1], dtype="category")})
result = pd.concat([df1, df2]) With a result of: DataFrame({"x": Series([datetime(2021, 1, 1), datetime(2021, 1, 2)]), dtype="category"}) This is different to what I have locally, but probably just because I have an old version built? Can I just assume this is the expected behaviour and write tests to satisfy that? It seems logical that if both DataFrames are categorical, the results should be the union of the two categories. |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
@ryangilmour if you can merge master and update |
Thanks for the pull request, but it appears to have gone stale. Closing, but if interested in continuing please merge in main, address the comments and the team can have another look. |