Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: extension ops #21191

Closed
wants to merge 12 commits into from
Closed

ENH: extension ops #21191

wants to merge 12 commits into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented May 24, 2018

builds on #21185 for extension array ops
pre-cursor to #21160

@jreback jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label May 24, 2018
@jreback jreback added this to the 0.24.0 milestone May 24, 2018
@codecov
Copy link

codecov bot commented May 24, 2018

Codecov Report

Merging #21191 into master will decrease coverage by 0.07%.
The diff coverage is 60.97%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21191      +/-   ##
==========================================
- Coverage   91.84%   91.77%   -0.08%     
==========================================
  Files         153      153              
  Lines       49543    49618      +75     
==========================================
+ Hits        45504    45538      +34     
- Misses       4039     4080      +41
Flag Coverage Δ
#multiple 90.17% <60.97%> (-0.07%) ⬇️
#single 41.86% <47.96%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/formats/format.py 98.24% <ø> (ø) ⬆️
pandas/core/indexes/interval.py 93.14% <100%> (ø) ⬆️
pandas/core/dtypes/common.py 95.09% <100%> (+0.44%) ⬆️
pandas/core/missing.py 91.66% <100%> (+0.02%) ⬆️
pandas/core/series.py 94.12% <100%> (ø) ⬆️
pandas/core/algorithms.py 94.5% <100%> (ø) ⬆️
pandas/core/arrays/base.py 59.84% <21.56%> (-24.11%) ⬇️
pandas/core/dtypes/base.py 88.09% <50%> (-4.02%) ⬇️
pandas/core/dtypes/cast.py 87.93% <50%> (-0.13%) ⬇️
pandas/core/internals.py 95.52% <81.81%> (-0.07%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5348e06...f62c1cf. Read the comment docs.

"""
Fixture for dunder names for common compare operations
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

cls.__div__ = cls._make_arithmetic_op(operator.div)
cls.__rdiv__ = cls._make_arithmetic_op(ops.rdiv)

cls.__divmod__ = cls._make_arithmetic_op(divmod)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rdivmod?

@@ -1025,6 +1035,7 @@ def na_op(x, y):
return result

def safe_na_op(lvalues, rvalues):
# all others
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you flesh out this comment

is_extension_array_dtype(y) and not
is_scalar(y)):
y = x.__class__._from_sequence(y)
return op(x, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to put this in the closure? It seems like this case could be handled directly on line 1061.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way the r ops are defined is not compatible with some types, pretty much if you have a

rop(et, ndarray) will fail for some types, e.g. say ndarray is a datetime64[ns] then et is an integer like the precedence is wrong (IOW need to defer to the op here)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what you're saying, but that it isn't what I'm getting at. Let me try to rephrase:

In wrapper below, there is currently (in the PR)

        elif (is_extension_array_dtype(left) or
                is_extension_array_dtype(right)):
            pass

I'm suggesting that instead of pass there, the code above (currently 1006-1015) could go there. The main reason why this might not be appropriate is if the try/except in safe_na_op is relevant for this case.

# very well defined
elif (is_extension_array_dtype(x) or
is_extension_array_dtype(y)):
return op(x, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a viable option to move this out of the closure to ~ line 1218, that would be more in line with the goals we've been pursuing in this module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you notice categorical is already here, this is just another case, not sure how we could dispatch on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, categorical has a few characteristics that make it a difficult case in this function. I think you're right that (for now at least) part of this is in the right place. The thing to do here is to:

  • take the is_extension_array_dtype(x) part of this condition and put it down on L1231 (right after the is_categorical_dtype(self) block) and mimic the categorical_dtype case there
  • keep the is_extension_array_dtype(y) case as is here (should the op be reversed?)
  • check if this needs a and not is_scalar(y) like the categorical above for some corner case
  • consider merging the categorical_dtype blocks into this more general case

@jbrockmendel
Copy link
Member

Looks reasonable, with some comments on organization in core.ops. I haven't looked at the tests closely yet, since I think a lot of them will disappear from the diff after #21185.

@jreback jreback force-pushed the ops branch 2 times, most recently from 338ea4d to f6d5386 Compare May 25, 2018 11:33
if isinstance(right, ABCDataFrame):
return NotImplemented

left, right = _align_method_SERIES(left, right)
res_name = get_op_result_name(left, right)

if is_datetime64_dtype(left) or is_datetime64tz_dtype(left):
if is_categorical_dtype(left):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be made cleaner if/when we implement these ops in Categorical itself (they'll just raise the TypeError below)

@jreback
Copy link
Contributor Author

jreback commented May 29, 2018

@jbrockmendel note I haven't actually changed this much (IOW for your comments) yet.

@jbrockmendel
Copy link
Member

@jreback am I right in thinking that after #21885 goes through the diff for this PR will be much smaller (and therefore easier to review)?

@jreback
Copy link
Contributor Author

jreback commented May 31, 2018

yes

@jreback
Copy link
Contributor Author

jreback commented Jul 3, 2018

going to include directly in #21160 as much smaller now

@jreback jreback closed this Jul 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants