-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: preserve dtype for right/outer merge of datetime with different resolutions #53233
BUG: preserve dtype for right/outer merge of datetime with different resolutions #53233
Conversation
if ( | ||
lvals.dtype.kind == "M" | ||
and rvals.dtype.kind == "M" | ||
and result_dtype.kind == "O" | ||
): | ||
# TODO(non-nano) Workaround for common_type not dealing | ||
# with different resolutions | ||
result_dtype = key_col.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an ugly band-aid solution: the proper fix is to handle this in find_common_type
. But I am not sure if changing find_common_type
would be in scope for a bug fix release? (xref #46587 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing find_common_type would affect concat, so im hesitant to call it a bugfix. But avoiding object may be nicer behavior, so id be open to it as an api change longer-term.
Do we know that both are tznaive at this point? If one is tzaware then object is correct (though i guess we should be able to determine that the merge will be empty?)
I've given a little bit of thought to something similar in the context of Index.get_indexer with mismatched resos. Instead of converting both to object, could find a fill_value not present in left
and convert right
to left.unit
, filling non-convertible entries with that fill_value. The "finding a fill_value" part seems really hacky though. (update: when i wrote this paragraph I thought this chunk of code was in get_merge_keys)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know that both are tznaive at this point? If one is tzaware then object is correct (though i guess we should be able to determine that the merge will be empty?)
I think so yes, normally we should raise an error earlier in the code if you try to merge on a tz-aware and tz-naive column:
In [1]: df1 = pd.DataFrame({"key": [pd.Timestamp("1970-01-01")], "a": [1]})
In [2]: df2 = pd.DataFrame({"key": [pd.Timestamp("1970-01-01", tz="Europe/Brussels")], "a": [1]})
In [3]: pd.merge(df1, df2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 pd.merge(df1, df2)
File ~/scipy/pandas/pandas/core/reshape/merge.py:146, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
129 @Substitution("\nleft : DataFrame or named Series")
130 @Appender(_merge_doc, indents=0)
131 def merge(
(...)
144 validate: str | None = None,
145 ) -> DataFrame:
--> 146 op = _MergeOperation(
147 left,
148 right,
149 how=how,
150 on=on,
151 left_on=left_on,
152 right_on=right_on,
153 left_index=left_index,
154 right_index=right_index,
155 sort=sort,
156 suffixes=suffixes,
157 indicator=indicator,
158 validate=validate,
159 )
160 return op.get_result(copy=copy)
File ~/scipy/pandas/pandas/core/reshape/merge.py:735, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, indicator, validate)
727 (
728 self.left_join_keys,
729 self.right_join_keys,
730 self.join_names,
731 ) = self._get_merge_keys()
733 # validate the merge keys dtypes. We may need to coerce
734 # to avoid incompatible dtypes
--> 735 self._maybe_coerce_merge_keys()
737 # If argument passed to validate,
738 # check if columns specified as unique
739 # are in fact unique.
740 if validate is not None:
File ~/scipy/pandas/pandas/core/reshape/merge.py:1397, in _MergeOperation._maybe_coerce_merge_keys(self)
1393 raise ValueError(msg)
1394 elif not isinstance(lk.dtype, DatetimeTZDtype) and isinstance(
1395 rk.dtype, DatetimeTZDtype
1396 ):
-> 1397 raise ValueError(msg)
1398 elif (
1399 isinstance(lk.dtype, DatetimeTZDtype)
1400 and isinstance(rk.dtype, DatetimeTZDtype)
1401 ) or (lk.dtype.kind == "M" and rk.dtype.kind == "M"):
1402 # allows datetime with different resolutions
1403 continue
ValueError: You are trying to merge on datetime64[ns] and datetime64[ns, Europe/Brussels] columns for key 'key'. If you wish to proceed you should use pd.concat
Thanks @jorisvandenbossche |
…ge of datetime with different resolutions
…er merge of datetime with different resolutions) (#53275) Backport PR #53233: BUG: preserve dtype for right/outer merge of datetime with different resolutions Co-authored-by: Joris Van den Bossche <[email protected]>
* ENH: better dtype inference when doing DataFrame reductions * precommit issues * fix failures * fix failures * mypy + some docs * doc linting linting * refactor to use _reduce_with_wrap * docstring linting * pyarrow failure + linting * pyarrow failure + linting * linting * doc stuff * linting fixes * fix fix doc string * remove _wrap_na_result * doc string example * pyarrow + categorical * silence bugs * silence errors * silence errors II * fix errors III * various fixups * various fixups * delay fixing windows and 32bit failures * BUG: Adding a columns to a Frame with RangeIndex columns using a non-scalar key (#52877) * DOC: Update whatsnew (#52882) * CI: Change development python version to 3.10 (#51133) * CI: Change development python version to 3.10 * Update checks * Remove strict * Remove strict * Fixes * Add dt * Switch python to 3.9 * Remove * Fix * Try attribute * Adjust * Fix mypy * Try fixing doc build * Fix mypy * Fix stubtest * Remove workflow file * Rename back * Update * Rename * Rename * Change python version * Remove * Fix doc errors * Remove pypy * Update ci/deps/actions-pypy-39.yaml Co-authored-by: Thomas Li <[email protected]> * Revert pypy removal * Remove again * Fix * Change to 3.9 * Address --------- Co-authored-by: Thomas Li <[email protected]> * update * update * add docs * fix windows tests * fix windows tests * remove guards for 32bit linux * add bool tests + fix 32-bit failures * fix pre-commit failures * fix mypy failures * rename _reduce_with -> _reduce_and_wrap * assert missing attributes * reduction dtypes on windows and 32bit systems * add tests for min_count=0 * PERF:median with axis=1 * median with axis=1 fix * streamline Block.reduce * fix comments * FIX preserve dtype with datetime columns of different resolution when merging (#53213) * BUG Merge not behaving correctly when having `MultiIndex` with a single level (#53215) * fix merge when MultiIndex with single level * resolved conversations * fixed code style * BUG: preserve dtype for right/outer merge of datetime with different resolutions (#53233) * remove special BooleanArray.sum method * remove BooleanArray.prod * fixes * Update doc/source/whatsnew/v2.1.0.rst Co-authored-by: Joris Van den Bossche <[email protected]> * Update pandas/core/array_algos/masked_reductions.py Co-authored-by: Joris Van den Bossche <[email protected]> * small cleanup * small cleanup * only reduce 1d * fix after #53418 * update according to comments * revome note * update _minmax * REF: add keepdims parameter to ExtensionArray._reduce + remove ExtensionArray._reduce_and_wrap * REF: add keepdims parameter to ExtensionArray._reduce + remove ExtensionArray._reduce_and_wrap * fix whatsnew * fix _reduce call * simplify test * add tests for any/all --------- Co-authored-by: Patrick Hoefler <[email protected]> Co-authored-by: Thomas Li <[email protected]> Co-authored-by: Guillaume Lemaitre <[email protected]> Co-authored-by: Yao Xiao <[email protected]> Co-authored-by: Joris Van den Bossche <[email protected]>
Follow-up on #53213, fixing one more corner case for the right/outer joins making the key column object dtype.
Expanded the test added in that PR to test the different join types.
The whatsnew added in #53213 should also already cover this.