CLN: de-duplicate index validation code #22329
You mention you used the "strictest" one: what is exactly the difference between the current versions?
Can you also do a quick check of performance of a call that hits those functions?
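A quick check of this kind could be sketched with the standard-library `timeit` module, as below. The helper name `best_time_per_call` and the use of `operator.getitem` on a plain list are assumptions for illustration only; they stand in for whatever pandas call actually reaches the functions changed in this PR.

```python
import operator
import timeit


def best_time_per_call(func, *args, number=10_000, repeat=5):
    """Return the best observed wall-clock time per call, in seconds.

    Hypothetical helper: runs `func(*args)` `number` times per round,
    repeats the round `repeat` times, and keeps the fastest round.
    """
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Stand-in workload: a direct indexing call with a negative position.
arr = list(range(100))
per_call = best_time_per_call(operator.getitem, arr, -1)
print(f"{per_call * 1e9:.0f} ns per call")
```

Running the same snippet on master and on the PR branch, with the stand-in replaced by the real call, gives two numbers that can be compared directly.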
pandas/_libs/util.pxd
@@ -44,23 +44,50 @@ ctypedef fused numeric:
cnp.float64_t

cdef inline object get_value_at(ndarray arr, object loc):
cdef inline Py_ssize_t validate_indexer(ndarray arr, object loc) except? -1:
I don't think you need the question mark in `except?` (since -1 can never actually be returned from the function without it being an error).
the except is to allow for the IndexError
my comment is about the `?`, not the `except` itself
yeah these are in critical paths of perf. pls do a check.
Codecov Report
@@ Coverage Diff @@
## master #22329 +/- ##
=======================================
Coverage 92.05% 92.05%
=======================================
Files 169 169
Lines 50709 50709
=======================================
Hits 46679 46679
Misses 4030 4030
Running asv now.
The
Corner case handling. e.g. if |
asv results:
The asv is clearly not really relevant. Can you do a quick check of the impacted functions with a direct call? Eg
You can do a quick
There is an extra check whether there is an error raised or not. I don't assume this is costly, but since we know that it will always be an error, I think it is cleaner code-wise to reflect this in the
OK, that sounds like a good change!
I’ll remove the question mark when possible. At the moment the power
company has decided to do some surprise maintenance, so that might be a
while...
As to the asv, yah it’s not clearly relevant, but it’s low-level code that
gets used a lot so I guessed that running all frame benchmarks would hit
it. I’ll try your suggestion when the power comes back.
…On Tue, Aug 14, 2018 at 12:07 PM Joris Van den Bossche < ***@***.***> wrote:
But if i = -14 only the strict one will check that i + len(arr) is still
negative.
OK, that sounds like the good change!
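The `i = -14` case can be sketched in plain Python as follows. Both function names are hypothetical, the array length of 10 is an assumption, and the lenient variant is only a guess at what a less strict check looks like; neither is the Cython code from this PR.

```python
def validate_indexer_lenient(arr, loc):
    """Hypothetical lenient check: wrap a negative index once, then
    test only the upper bound."""
    n = len(arr)
    i = loc
    if i < 0:
        i += n
    if i >= n:
        raise IndexError("index out of bounds")
    return i


def validate_indexer_strict(arr, loc):
    """Hypothetical strict check: after wrapping, also reject an index
    that is still negative."""
    n = len(arr)
    i = loc
    if i < 0:
        i += n
    if i < 0 or i >= n:
        raise IndexError("index out of bounds")
    return i


arr = list(range(10))

# -14 + 10 == -4, which the lenient variant lets through unchanged.
lenient_result = validate_indexer_lenient(arr, -14)

# The strict variant sees that -4 is still negative and raises.
try:
    validate_indexer_strict(arr, -14)
    strict_raised = False
except IndexError:
    strict_raised = True
```

In low-level code that indexes a raw buffer without further bounds checking, a leftover negative position like this would read outside the array, which is why the extra comparison matters.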
Indistinguishable: Master:
PR:
I don't think your example with |
Master:
PR:
can u add these as asvs?
@jreback I am not sure that is needed. It should already be covered by other indexing benchmarks that use |
pandas/_libs/index.pyx
i = util.validate_indexer(arr, loc)
Since `get_value_at` (the util version below, which is called by the `get_value_at` in this file) now has the same validation, is it then not unnecessary to call the validation here as well?
I think you’re right, will update.
Following this and #22344 it tentatively looks like we'll be ready to get rid of numpy_helper (and chunks of util) altogether.
@jreback gentle ping. After this we can get rid of a bunch of old numpy_helper code.
thanks! yeah lots of PRs!
There are currently 3 nearly-identical versions of this code. I'm pretty sure the strictest one is the most-correct, so that is made into the only one.