PERF: Add cache keyword to to_datetime (#11665) #17077
Conversation
Let me address 1 and 3 first: IIUC, this caching should only speed up performance. The results you get with or without caching should not be different. The fact that you are getting different results indicates something is going weird with the caching. In light of that reasoning, I would expect that caching should default to
For 2, I think that's worthy enough as its own PR. I'm not sure I fully agree with the patch (i.e. why do we need another special-case). After that gets merged, you can rebase this PR onto that one.
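The invariant described above can be checked directly: with or without caching, the converted result must be identical. A minimal sketch, using the keyword's final name `cache` (the PR initially called it `cache_datetime`):

```python
import pandas as pd

# The cache is purely a performance optimization, so results must be
# identical with and without it.
arg = ["2016-01-01", "2016-01-01", "2016-01-02"]
with_cache = pd.to_datetime(arg, cache=True)
without_cache = pd.to_datetime(arg, cache=False)
```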
need asv!
bench the cartesian product of several dtypes (int with unit, str, str with format, dti) and sizes (1000, 100000)
and all dupes, uniques
I suspect under a certain size cache doesn't matter/help, need to see where that is.
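A benchmark along the lines requested could look like the sketch below, parametrized over size and dupes vs. uniques (class and method names follow asv conventions but are illustrative, not the benchmark that was eventually added):

```python
import pandas as pd

class ToDatetimeCache:
    # cartesian product of input size and all-unique vs. all-duplicate
    params = ([1000, 100000], [True, False])
    param_names = ["size", "unique"]

    def setup(self, size, unique):
        if unique:
            rng = pd.date_range("2000-01-01", periods=size, freq="s")
            self.strings = rng.strftime("%Y-%m-%d %H:%M:%S").tolist()
        else:
            self.strings = ["2000-01-01 00:00:00"] * size

    def time_to_datetime_cached(self, size, unique):
        pd.to_datetime(self.strings, cache=True)
```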
pandas/_libs/lib.pyx
Outdated
@@ -375,6 +375,23 @@ cpdef ndarray[object] list_to_object_array(list obj):

+@cython.wraparound(False)
+@cython.boundscheck(False)
+cpdef ndarray[object] tuple_to_object_array(tuple obj):
don't do this, you are duplicating lots of code. I suspect you are hitting:
In [2]: pd.unique(('foo'),)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-bd2fadf86f43> in <module>()
----> 1 pd.unique(('foo'),)
/Users/jreback/pandas/pandas/core/algorithms.py in unique(values)
348 """
349
--> 350 values = _ensure_arraylike(values)
351
352 # categorical is a fast-path
/Users/jreback/pandas/pandas/core/algorithms.py in _ensure_arraylike(values)
171 inferred = lib.infer_dtype(values)
172 if inferred in ['mixed', 'string', 'unicode']:
--> 173 values = lib.list_to_object_array(values)
174 else:
175 values = np.asarray(values)
TypeError: Argument 'obj' has incorrect type (expected list, got str)
if so, pls do a separate PR for this (it won't involve any cython code, just a simple change).
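The "simple change" suggested above would coerce a tuple to a list before the list-only fast path, rather than duplicating Cython code in a new `tuple_to_object_array`. A minimal sketch (the standalone function name is illustrative; in pandas this lives inside `_ensure_arraylike`):

```python
import numpy as np

def ensure_arraylike(values):
    # tuples hit the same string/mixed inference path as lists, so
    # coerce them rather than adding a tuple-specific Cython routine
    if isinstance(values, tuple):
        values = list(values)
    return np.asarray(values, dtype=object)
```

With this change, `ensure_arraylike(("foo", "bar"))` returns an object array instead of raising the `TypeError` shown in the traceback.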
pandas/core/algorithms.py
Outdated
@@ -170,7 +170,10 @@ def _ensure_arraylike(values):
                                 ABCIndexClass, ABCSeries)):
         inferred = lib.infer_dtype(values)
         if inferred in ['mixed', 'string', 'unicode']:
             values = lib.list_to_object_array(values)
see above
pandas/core/tools/datetimes.py
Outdated
@@ -183,7 +184,8 @@ def _guess_datetime_format_for_array(arr, **kwargs):

 def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
                 utc=None, box=True, format=None, exact=True,
-                unit=None, infer_datetime_format=False, origin='unix'):
+                unit=None, infer_datetime_format=False, origin='unix',
+                cache_datetime=False):
use_cache=True, change name and default to True
Why not call this cache=True?
pandas/core/tools/datetimes.py
Outdated
@@ -257,6 +259,10 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,

     .. versionadded: 0.20.0

+    cache_datetime : boolean, default False
+        If True, use a cache of unique, converted dates to apply the datetime
+        conversion. Produces significant speed-ups when parsing duplicate dates
see above, add versionadded tag
pandas/core/tools/datetimes.py
Outdated
@@ -340,6 +346,19 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,

     tz = 'utc' if utc else None

+    cache = None
I think this can be moved inside _convert_list_like (maybe).
pandas/core/tools/datetimes.py
Outdated
@@ -340,6 +346,19 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,

     tz = 'utc' if utc else None

+    cache = None
+    if (cache_datetime and is_list_like(arg) and
+            not isinstance(arg, DatetimeIndex)):
when you write an asv (see below), this needs a minimum length check on arg (maybe 1000)
pandas/core/tools/datetimes.py
Outdated
+        # No need to convert with a cache if the arg is already a DatetimeIndex
+        unique_dates = pd.unique(arg)
+        if len(unique_dates) != len(arg):
+            cache = {d: pd.to_datetime(d, errors=errors, dayfirst=dayfirst,
this is very inefficient, you are converting element by element. Simply:
Series(to_datetime(unique_dates.....), index=unique_dates)
which also avoids iterating over things
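The vectorized approach suggested here can be sketched as follows: convert the unique values once with a single `to_datetime` call, then map the original values through the resulting Series, instead of calling `to_datetime` per element:

```python
import pandas as pd

arg = pd.Series(["2016-01-01", "2016-01-02"] * 500)

unique_dates = pd.unique(arg)  # ndarray of the distinct input strings
if len(unique_dates) != len(arg):
    # one vectorized conversion; the Series doubles as the lookup table
    cache = pd.Series(pd.to_datetime(unique_dates), index=unique_dates)
    result = arg.map(cache)
else:
    # no duplicates, caching cannot help
    result = pd.to_datetime(arg)
```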
pandas/core/tools/datetimes.py
Outdated
             result = arg.map(cache)
         else:
             values = _convert_listlike(arg._values, False, format)
             result = pd.Series(values, index=arg.index, name=arg.name)
leave the imports alone
pandas/core/tools/datetimes.py
Outdated
@@ -1,5 +1,6 @@
 from datetime import datetime, timedelta, time
 import numpy as np
+import pandas as pd
use algorithms.unique
below
@@ -306,6 +306,45 @@ def test_to_datetime_tz_psycopg2(self):
                                     dtype='datetime64[ns, UTC]')
         tm.assert_index_equal(result, expected)

+    @pytest.mark.parametrize("box", [True, False])
this test is ok, but rather I would like to see a parametrize on pretty much every other test function in this file, testing both use_cache=True/False
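The requested parametrization could look like the sketch below: an existing-style test run under both cache settings, checking the results agree (the test body is illustrative, not one of the tests in the file):

```python
import pandas as pd
import pytest

@pytest.mark.parametrize("cache", [True, False])
def test_to_datetime_with_cache(cache):
    # duplicate-heavy input exercises the cached path when cache=True
    arg = ["20130101"] * 3 + ["20130102"] * 3
    result = pd.to_datetime(arg, cache=cache)
    expected = pd.DatetimeIndex(["2013-01-01"] * 3 + ["2013-01-02"] * 3)
    pd.testing.assert_index_equal(result, expected)
```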
@gfyoung In regards to my 3rd point, I agree that the results should be the same whether
This discussion may be more appropriate in a separate issue if it is one. I may not hit this though once I refactor this implementation.
I think it should.
there is another issue about utc=True so it's out of scope for this PR
can you rebase / update
Force-pushed from 130406e to 72e99da
Hello @mroeschke! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on November 11, 2017 at 18:39 UTC
Here are the results from
Additionally, I edited almost all the tests in
Codecov Report
@@ Coverage Diff @@
## master #17077 +/- ##
==========================================
- Coverage 91.25% 91.22% -0.04%
==========================================
Files 163 163
Lines 49810 49829 +19
==========================================
- Hits 45456 45455 -1
- Misses 4354 4374 +20
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #17077 +/- ##
==========================================
- Coverage 91.42% 91.39% -0.04%
==========================================
Files 163 163
Lines 50064 50091 +27
==========================================
+ Hits 45773 45779 +6
- Misses 4291 4312 +21
Continue to review full report at Codecov.
pandas/core/tools/datetimes.py
Outdated
    .. versionadded: 0.20.2
0.21.0
pandas/core/tools/datetimes.py
Outdated
+        if len(unique_dates) != len(arg):
+            from pandas import Series
+            cache_dates = _convert_listlike(unique_dates, False, format)
+            convert_cache = Series(cache_dates, index=unique_dates)
so it's better to actually make convert_cache a function, which can then take the unique_dates and return the converted data; this avoids lots of code duplication.
pandas/core/indexes/datetimes.py
Outdated
@@ -334,7 +334,7 @@ def __new__(cls, data=None,
             if not (is_datetime64_dtype(data) or is_datetimetz(data) or
                     is_integer_dtype(data)):
                 data = tools.to_datetime(data, dayfirst=dayfirst,
-                                         yearfirst=yearfirst)
+                                         yearfirst=yearfirst, cache=False)
reason for this change?
This is in order to prevent a RuntimeError due to recursion. The cache is built using _convert_listlike, which can return a DatetimeIndex; the DatetimeIndex constructor can then call to_datetime, which goes back to _convert_listlike...
Force-pushed from 0d5ac41 to 68c7e9f
Tests are passing now and here are the latest asv results:
@@ -111,7 +112,11 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
         origin.

         .. versionadded: 0.20.0

+    cache : boolean, default False
default is True
pandas/core/tools/datetimes.py
Outdated
@@ -305,6 +310,29 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
         except (ValueError, TypeError):
             raise e

+    def _maybe_convert_cache(arg, cache, tz):
use a proper doc-string here
pandas/core/tools/datetimes.py
Outdated
        result = _maybe_convert_cache(arg, cache, tz)
        if result is None:
            result = _convert_listlike(arg, box, format, name=arg.name)
        else:
why can't you handle these cases (list-like/index) inside _maybe_convert_cache? (I am talking about the else/box part.)
what is causing the 2x slowdown on some of the existing tests?
It looks like the iso8601 path is so fast that the caching always hurts. I think that makes sense: that conversion shouldn't be much more expensive than hashing, but I'm not sure how to handle it.
The 2x slowdown occurred with iso8601 string dates without any duplicates. I've attached the profile below of the benchmark with the largest slowdown. It looks like it's expensive to put the strings (objects) through
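The trade-off being profiled can be illustrated roughly with `timeit` (timings are machine-dependent; this only shows the shape of the comparison, not the PR's asv numbers):

```python
import timeit
import pandas as pd

# caching wins on duplicate-heavy input; all-unique ISO 8601 strings can
# be slower with the cache, since the fast parse path costs little more
# than hashing the values
dupes = ["2016-01-01 01:02:03"] * 5000
t_cached = timeit.timeit(lambda: pd.to_datetime(dupes, cache=True), number=3)
t_plain = timeit.timeit(lambda: pd.to_datetime(dupes, cache=False), number=3)
```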
Given the potential slowdown, can't we add it as an optional keyword? (so False by default)
thanks @mroeschke nice PR! look forward to
pls open an issue for this.
git diff upstream/master -u -- "*.py" | flake8 --diff
Added a cache keyword to to_datetime to speed up parsing of duplicate dates.

Some notes:
1. I defaulted cache=False, i.e. don't use a cache to parse the dates. Should the default be True?
2. I used pd.unique() to identify unique dates, and the current implementation did not accept a tuple of strings (objects). I added tuple_to_object_array and patched _ensure_arraylike to fix this.
3. There is currently an included test that fails due to a case when using to_datetime(..., utc=True) with a Series. I am inclined to believe In[5] should have been the existing behavior. Thoughts?
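For reference, the keyword as merged is used like this (shown with the final name `cache`; the description above still uses the original default of False):

```python
import pandas as pd

# duplicate-heavy input benefits from the cache, and the converted
# result is identical either way
dates = ["3/11/2000", "3/12/2000", "3/13/2000"] * 1000
cached = pd.to_datetime(dates, cache=True)
uncached = pd.to_datetime(dates, cache=False)
```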