-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: DataFrame.append with timedelta64 #39574
Conversation
pandas/core/dtypes/concat.py
Outdated
@@ -97,7 +97,7 @@ def _cast_to_common_type(arr: ArrayLike, dtype: DtypeObj) -> ArrayLike: | |||
return arr.astype(dtype, copy=False) | |||
|
|||
|
|||
def concat_compat(to_concat, axis: int = 0): | |||
def concat_compat(to_concat, axis: int = 0, pretend_axis1: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you just allow axis=None and call it that case. this is very odd naming here (appreciate the de-duplication that this allows); so i guess a better question is this temporary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you just allow axis=None and call it that case. this is very odd naming here
I want the naming to be very clear that this is a dont-try-this-at-home kludge (need to update the docstring to that effect)
so i guess a better question is this temporary?
inasmuch as it wont be needed once we have 2D EAs, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to the extent this is not removed anytime soon, can you come up with a better argument name or another way of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed + green
Can you summarize the behaviour that is changed / bug that is fixed? |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am personally a bit hesitant to change the all-NaN special case. That's longstanding behaviour (also for practical reasons), that will break code I think.
pandas/core/dtypes/concat.py
Outdated
@@ -72,6 +72,8 @@ def concat_compat(to_concat, axis: int = 0): | |||
---------- | |||
to_concat : array of arrays | |||
axis : axis to provide concatenation | |||
ea_compat_axis : bool, default False | |||
For ExtensionArray compat, behave as if axis == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what this means in practice to behave "as if axis=1"? Because I assume the arrays are still concatenated with axis=0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just matters for the dropping-empty check; will edit to clarify
@jorisvandenbossche helpful comments, thank you. i found a nice way to retain the None/nan behavior while fixing the original bug: supplementing JoinUnit.is_na with a check that it is a compatible NA. |
pandas/core/internals/concat.py
Outdated
if self.dtype == object: | ||
values = self.block.values | ||
return all( | ||
is_valid_nat_for_dtype(x, dtype) for x in values.ravel(order="K") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not only required for object dtype, I think. Also float NaN is considered "all NaN" when it comes to ignoring the dtype in concatting dataframes (and other dtypes as well I think):
In [39]: pd.concat([pd.DataFrame({'a': [np.nan, np.nan]}), pd.DataFrame({'a': [pd.Timestamp("2012-01-01")]})])
Out[39]:
a
0 NaT
1 NaT
0 2012-01-01
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the non-object case is handled below on L245-246. or do you have something else in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does my snippet above work with this PR?
(if so, then I don't fully understand why the changes to test_append_empty_frame_to_series_with_dateutil_tz
are needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does my snippet above work with this PR?
yes it does
(if so, then I don't fully understand why the changes to test_append_empty_frame_to_series_with_dateutil_tz are needed)
I think that's driven by something sketchy-looking in get_reindexed_values, will see if that can be addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better?
@jorisvandenbossche i think your comments have all been addressed, lmk if i missed anything |
@jorisvandenbossche gentle ping before your day ends |
Can you add a whatsnew? |
will do, thanks |
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
tm.assert_frame_equal(result_b, expected) | ||
|
||
# column order is different | ||
expected = expected[["c", "d", "date", "a", "b"]] | ||
result = df.append([s, s], ignore_index=True) | ||
dtype = Series([date]).dtype | ||
expected["date"] = expected["date"].astype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed? (might be a left-over from astyping it to object before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last minor comment at https://github.com/pandas-dev/pandas/pull/39574/files#r575450869
Nice clean-up!
AFAICT several of the existing tests are just wrong, xref #39122 cc @jorisvandenbossche
Fixes (but havent yet added a test for) #39037 (comment)