Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for concat with categorical + datetime dataframes #45527

Conversation

ryangilmour
Copy link
Contributor

),
),
)
def test_categorical_datetime_concat(self, input_1, input_2, expected):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write the cases down explicitly. We do not gain anything through the parametrization and this is really hard to read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make those changes. Before setting off the CI to fail again though, as mentioned in this comment, my PyTests are passing locally, but not on the CI. Could do with some help on why this might be?

Locally I get below to pass:

    def test_categorical_datetime_concat(self):
        # GH 39443
        # test catergorical dataframes both containing datetimes
        df1 = DataFrame(
            {"x": Series(datetime(2021, 1, 1), index=[0], dtype="category")}
        )
        df2 = DataFrame(
            {"x": Series(datetime(2021, 1, 2), index=[1], dtype="category")}
        )
        expected = DataFrame(
            {"x": Series([datetime(2021, 1, 1), datetime(2021, 1, 2)])}
        )
        result = pd.concat([df1, df2])
        tm.assert_equal(result, expected)

But in the CI (based on previous failure) - this will fail with:

E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different
E       
E       Attribute "dtype" are different
E       [left]:  CategoricalDtype(categories=['2021-01-01', '2021-01-02'], ordered=False)
E       [right]: datetime64[ns]

Looks like it links to #14016

@jreback jreback added the Testing pandas testing functions or related to the test suite label Jan 22, 2022
@jreback jreback added this to the 1.5 milestone Jan 23, 2022
@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 23, 2022
@jreback jreback removed this from the 1.5 milestone Jan 23, 2022
@jreback
Copy link
Contributor

jreback commented Jan 31, 2022

can you merge master and address comments

@ryangilmour
Copy link
Contributor Author

ryangilmour commented Feb 1, 2022

Okay - looks like the behaviour of the code in the CI is working to concat:

# test catergorical dataframes both containing datetimes
df1 = DataFrame({"x": Series(datetime(2021, 1, 1), index=[0], dtype="category")})
df2 = DataFrame({"x": Series(datetime(2021, 1, 2), index=[1], dtype="category")})
result = pd.concat([df1, df2])

With a result of:

DataFrame({"x": Series([datetime(2021, 1, 1), datetime(2021, 1, 2)]), dtype="category"})

This is different to what I have locally, but probably just because I have an old version built?

Can I just assume this is the expected behaviour and write tests to satisfy that? It seems logical that if both DataFrames are categorical, the results should be the union of the two categories.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Mar 4, 2022
@jreback
Copy link
Contributor

jreback commented Mar 4, 2022

@ryangilmour if you can merge master and update

@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. Closing, but if interested in continuing please merge in main, address the comments and the team can have another look.

@mroeschke mroeschke closed this Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Concatenating categorical datetime columns raises a ValueError since v1.2
4 participants