PERF/CLN: Improve datetime-like index ops perf #10277

sinhrks · 2015-06-04T15:19:00Z

Adds _isnan to DatetimeIndexOpsMixin to cache the NaT mask.

This gives a perf improvement that is noticeable on larger data.

After fix:

import pandas as pd
import numpy as np

np.random.seed(1)
idx = pd.DatetimeIndex(np.append(np.array([pd.tslib.iNaT]),
                       np.random.randint(500, 1000, size=100000000)))
%timeit idx.max()
1 loops, best of 3: 599 ms per loop

%timeit idx.min()
1 loops, best of 3: 608 ms per loop

Before fix:

%timeit idx.max()
1 loops, best of 3: 940 ms per loop

%timeit idx.min()
1 loops, best of 3: 883 ms per loop
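The core idea, computing the NaT mask once and reusing it across operations, can be sketched outside pandas as follows. This is a minimal toy, not the actual DatetimeIndexOpsMixin code. It assumes that values are stored as int64 nanoseconds with NaT encoded as the minimum int64 (the value pandas uses for iNaT). The class `CachedMaskIndex` and its `mask_computations` counter are hypothetical, and `functools.cached_property` needs Python 3.8+.

```python
from functools import cached_property

import numpy as np

# Sentinel for NaT in int64 storage (pandas uses the minimum int64 value for iNaT).
INAT = np.iinfo(np.int64).min


class CachedMaskIndex:
    """Toy immutable datetime-like index backed by int64 values (hypothetical)."""

    def __init__(self, values):
        self.asi8 = np.asarray(values, dtype=np.int64)
        self.mask_computations = 0  # counts how often the NaT mask is built

    @cached_property
    def _isnan(self):
        # Built on first access, then reused by every later call.
        self.mask_computations += 1
        return self.asi8 == INAT

    def min(self):
        valid = self.asi8[~self._isnan]
        return int(valid.min()) if valid.size else None

    def max(self):
        valid = self.asi8[~self._isnan]
        return int(valid.max()) if valid.size else None


idx = CachedMaskIndex([INAT, 700, 500, 900])
print(idx.min(), idx.max(), idx.mask_computations)  # 500 900 1
```

Caching is only safe here because the index is immutable: once built, the mask can never go stale.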

jreback · 2015-06-04T15:25:54Z

ohh nice!

do we have sufficient benches for this?

sinhrks · 2015-06-04T15:27:02Z

Updated the top description. The difference is harder to notice on a small Index. Should I add a vbench for this?

jreback · 2015-06-04T15:34:43Z

yes I would add a vbench for this.

btw, I think it's actually useful to use .hasnans to skip some of the masking operations:

min_stamp = self[~self._isnan].asi8.min()

can become:

# skip the masking (and the copy it makes) when there is no NaT
values = self
if self.hasnans:
    values = self[~self._isnan]
min_stamp = values.asi8.min()

On a series with NO NaNs that would be a big win (assuming this is not the first operation; i.e. it obviously needs caching).
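A toy sketch of the suggested shortcut, under the same assumptions as before (int64 storage, minimum int64 as the NaT sentinel). The class `HasnansShortcutIndex` and its `masking_ops` counter are hypothetical. `hasnans` is cached, and `min` only builds the filtered copy when NaT is actually present. Computing `hasnans` still scans the data once, which is why the win only shows up after it has been cached.

```python
from functools import cached_property

import numpy as np

INAT = np.iinfo(np.int64).min  # int64 sentinel used for NaT in this sketch


class HasnansShortcutIndex:
    """Toy index showing the `if self.hasnans:` shortcut (hypothetical)."""

    def __init__(self, values):
        self.asi8 = np.asarray(values, dtype=np.int64)
        self.masking_ops = 0  # how many times a NaT-filtered copy was made

    @cached_property
    def _isnan(self):
        return self.asi8 == INAT

    @cached_property
    def hasnans(self):
        # Computed once from the cached mask, then reused.
        return bool(self._isnan.any())

    def min(self):
        values = self.asi8
        if self.hasnans:
            # Only pay for boolean indexing (a full copy) when NaT exists.
            self.masking_ops += 1
            values = values[~self._isnan]
        return int(values.min()) if values.size else None


clean = HasnansShortcutIndex([700, 500, 900])
dirty = HasnansShortcutIndex([INAT, 700, 500, 900])
print(clean.min(), clean.masking_ops)  # 500 0
print(dirty.min(), dirty.masking_ops)  # 500 1
```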

Here's another thing: I think you can actually do a lot of this not in the constructor but in certain operations (this may require keeping another attribute).

e.g.

If you construct from a date_range, you KNOW that you don't have any NaT already.
On insert/append you can see whether either the existing index or the appended one has NaNs, so you don't actually have to redo the full check.

So it can still be lazy, but in some cases you can set it a priori without actually doing the check.

sinhrks · 2015-06-05T22:31:10Z

@jreback Exactly! As a first step, I added hasnans checks before applying the mask (_isnan).

> I think you can actually do a lot of this not in the constructor but in certain operations (this may require you to keep another attribute)

I assume this means adding a new property to _attributes that indicates whether the data has NaT. As I understand it, the implementation would be:

1. Add a flag indicating whether the data MAY contain NaT, e.g. _maybe_hasnans.
2. Set it to False whenever we can confirm the data DOESN'T contain NaT.
3. When _maybe_hasnans is False, hasnans returns False without checking _isnan.
4. When _maybe_hasnans is True, hasnans checks _isnan, returns and caches the result, and updates _maybe_hasnans.
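The four steps above can be sketched as a small class. This is a minimal illustration, not pandas code. `_maybe_hasnans` is the name proposed in the comment. `FlaggedIndex`, `from_range` (standing in for construction via date_range) and `append` are hypothetical. NaT is again assumed to be stored as the minimum int64.

```python
import numpy as np

INAT = np.iinfo(np.int64).min  # int64 sentinel used for NaT in this sketch


class FlaggedIndex:
    """Toy index tracking whether it MAY contain NaT (hypothetical)."""

    def __init__(self, values, maybe_hasnans=True):
        self.asi8 = np.asarray(values, dtype=np.int64)
        # Step 1: True means "unknown, must check"; False means "known NaT-free".
        self._maybe_hasnans = maybe_hasnans
        self._hasnans_cache = None
        self.checks = 0  # how many times the data was actually scanned

    @classmethod
    def from_range(cls, start, periods, step):
        # Step 2: a regular range cannot contain NaT, so skip the check.
        values = start + step * np.arange(periods, dtype=np.int64)
        return cls(values, maybe_hasnans=False)

    @property
    def hasnans(self):
        if not self._maybe_hasnans:
            # Step 3: known NaT-free, answer without touching the data.
            return False
        if self._hasnans_cache is None:
            # Step 4: scan once, cache the result, and update the flag.
            self.checks += 1
            self._hasnans_cache = bool((self.asi8 == INAT).any())
            self._maybe_hasnans = self._hasnans_cache
        return self._hasnans_cache

    def append(self, other):
        # The result may contain NaT only if either input may contain NaT.
        maybe = self._maybe_hasnans or other._maybe_hasnans
        return FlaggedIndex(np.concatenate([self.asi8, other.asi8]), maybe)


rng = FlaggedIndex.from_range(0, 5, 10)
print(rng.hasnans, rng.checks)  # False 0
both = rng.append(FlaggedIndex([INAT, 7]))
print(both.hasnans, both.checks)  # True 1
```

Appending two indexes that are both known to be NaT-free yields a result whose flag is already False, so no scan is ever needed; this is the "set it a priori" case from the earlier comment.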

jreback · 2015-06-05T22:33:23Z

@sinhrks yep, exactly. For instance, for Int64Index this will always be False.

jreback · 2015-07-28T22:05:00Z

status?

jreback · 2015-08-18T23:45:21Z

@sinhrks status?

sinhrks · 2015-08-31T13:48:39Z

I couldn't get to adding an attribute that indicates whether `NaN` may be present.

Since this PR already improves perf somewhat, can I do that separately?

jreback · 2015-10-18T14:02:02Z

I think this needs to wait for Index.fillna, yes?

sinhrks · 2015-10-19T14:04:40Z

Correct, will finish #11343 first.

jreback · 2015-11-20T18:45:23Z

@sinhrks I believe we can put this in now? Please move the whatsnew note to 0.18.0.

jreback · 2015-12-06T19:18:20Z

@sinhrks status?

sinhrks · 2015-12-10T10:42:50Z

@jreback Rebased, and ready for review.

I misunderstood about fillna: it can't be used here because the target is an np.ndarray.

jreback · 2015-12-10T12:22:48Z

thanks!


sinhrks force-pushed the dti_isnan branch from 021e1d9 to 81dba66, June 4, 2015 15:22

jreback added the Datetime, Missing-data, and Performance labels Jun 4, 2015

sinhrks force-pushed the dti_isnan branch from 81dba66 to f795f01, June 5, 2015 15:35

jreback added this to the 0.17.0 milestone Jun 5, 2015

sinhrks force-pushed the dti_isnan branch from f795f01 to 8723944, June 19, 2015 14:18

sinhrks force-pushed the dti_isnan branch from 8723944 to ef67632, August 21, 2015 15:28

jreback modified the milestones: Next Major Release, 0.17.0 Aug 26, 2015

sinhrks force-pushed the dti_isnan branch from ef67632 to 88d2d12, August 31, 2015 13:45

sinhrks force-pushed the dti_isnan branch from 88d2d12 to ac3c199, October 15, 2015 12:42

sinhrks force-pushed the dti_isnan branch from ac3c199 to 20366d5, November 13, 2015 07:35

sinhrks force-pushed the dti_isnan branch 3 times, most recently from 66a378c to 7b7652f, November 24, 2015 20:29

PERF: Improve dt-liket index ops

4b1aa75

sinhrks force-pushed the dti_isnan branch from 7b7652f to 4b1aa75, December 8, 2015 14:29

jreback added a commit that referenced this pull request Dec 10, 2015

Merge pull request #10277 from sinhrks/dti_isnan

eec0bc6

PERF/CLN: Improve datetime-like index ops perf

jreback merged commit eec0bc6 into pandas-dev:master Dec 10, 2015

sinhrks deleted the dti_isnan branch December 10, 2015 12:46
