ENH: EADtype._find_compatible_dtype #53106
Conversation
pandas/core/arrays/arrow/dtype.py
Outdated
```python
def _maybe_promote(self, item: Any) -> tuple[DtypeObj, Any]:
    if isinstance(item, pa.Scalar):
        if not item.is_valid:
```
Shouldn't NA always be able to be inserted into ArrowExtensionArray?
I'm not sure. pyarrow nulls are typed, so we could plausibly want to disallow e.g. `<pyarrow.TimestampScalar: None>` in a pyarrow integer dtype.
```python
elif isna(value):
    new_dtype = None
if is_list_like(value):
```
Note: for ArrowDtype with a `pa.list_` type, we would want to treat `value` like a scalar, e.g.

```python
ser = pd.Series([[1, 1]], dtype=pd.ArrowDtype(pa.list_(pa.int64())))
ser[0] = [1, 2]
```
yah, getting rid of this is_list_like check causes us to incorrectly raise on numpy non-object cases when using a list value (for which we don't have any tests). Can fix that in this PR or separately, as it is a bit more invasive.
pandas/core/arrays/arrow/dtype.py
Outdated
```python
    item = item.as_py()

elif item is None or item is libmissing.NA:
    # TODO: np.nan? use is_valid_na_for_dtype
```
Since pyarrow supports nan vs NA, possibly we want to allow nan if `pa.types.is_floating(self.pyarrow_dtype)`.
What to do here depends on making a decision about when/how to distinguish between np.nan and pd.NA (which I hope to finally nail down at the sprint). Doing this The Right Way would involve something like implementing `EA._is_valid_na_for_dtype`.
pandas/core/arrays/arrow/dtype.py
Outdated
```python
# TODO: ask joris for help making these checks more robust
if item.type == self.pyarrow_dtype:
    return self, item.as_py()
if item.type.to_pandas_dtype() == np.int64 and self.kind == "i":
```
Why is this needed specifically?
This was just to get the tests copied from #52833 passing.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
```python
def _maybe_promote(self, item: Any) -> tuple[DtypeObj, Any]:
    if isinstance(item, pa.Scalar):
        if not item.is_valid:
            # TODO: ask joris for help making these checks more robust
```
@jorisvandenbossche any thoughts here? (not time-sensitive)
Updated with some added tests; renamed EADtype._maybe_promote to _find_compatible_dtype. There are some TODOs, mostly for pyarrow and related to nan-vs-NA, but I don't think any of them are blockers.

_find_compatible_dtype can be used to provide a default implementation of _is_valid_scalar_for, which in turn can be used to implement _from_scalars (both discussed on this week's dev call).
```diff
@@ -2326,3 +2347,29 @@ def __from_arrow__(self, array: pa.Array | pa.ChunkedArray):
         array_class = self.construct_array_type()
         arr = array.cast(self.pyarrow_dtype, safe=True)
         return array_class(arr)
+
+    def _find_compatible_dtype(self, item: Any) -> tuple[DtypeObj, Any]:
+        if isinstance(item, pa.Scalar):
```
What happens if item is null (the pyarrow null)?
IIUC pyarrow nulls are now typed. I'd prefer to be strict about making these match, but don't care that much. Am hoping @jorisvandenbossche will weigh in.
```python
[
    (pa.scalar(4, type="int32"), 4, "int32[pyarrow]"),
    (pa.scalar(4, type="int64"), 4, "int32[pyarrow]"),
    # (pa.scalar(4.5, type="float64"), 4, "int32[pyarrow]"),
```
What happens here?
Also, what happens with an int64 scalar and int32 dtype?
I'd want to follow the same logic we do for numpy dtypes, but was punting here in expectation of doing it in a follow-up (likely involving Joris expressing an opinion).
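As a reference point (not code from this PR), the numpy-side promotion behavior alluded to here can be sketched with `np.promote_types`: a mismatched scalar type forces promotion to a common dtype rather than raising.

```python
import numpy as np

# int32 + int64 promote to the wider integer dtype
assert np.promote_types(np.int32, np.int64) == np.dtype("int64")
# int64 + float64 promote to float64
assert np.promote_types(np.int64, np.float64) == np.dtype("float64")
```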
Tests copied from #52833, so this may close #52235.
Needs a lot of cleanup, in particular docs. The ArrowDtype._find_compatible_dtype implementation is quite kludgy, would appreciate help from someone more knowledgeable.
I think `_find_compatible_dtype` is the "correct" long-term non-kludge solution for setitem-with-expansion.
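For context, here is a minimal sketch of the setitem-with-expansion case this hook targets, using only public pandas API on a numpy-backed Series (behavior shown is my understanding of current pandas, not code from this PR): assigning to a missing label grows the Series, and an incompatible scalar forces a dtype promotion.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2], dtype="int64")

# Label 2 does not exist, so this is setitem-with-expansion:
# the Series grows and the float scalar forces promotion.
ser.loc[2] = 3.5

assert len(ser) == 3
assert ser.loc[2] == 3.5
assert ser.dtype == np.dtype("float64")
```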