BUG: reset_index of level on a MultiIndex with NaT converts to np.nan #11479

emsems · 2015-10-30T08:06:16Z

Not sure if it's known already, but I couldn't find any open issues.
It's similar to this closed one:
#10388

using pandas 0.17 and numpy 1.10.1

code to reproduce the issues:

import pandas as pd
from pandas import DataFrame
import numpy as np

idx = np.arange(0, 10)  # could have an NaN?
tstamp = pd.date_range('201507010000', freq='h', periods=10).values
df = DataFrame({'id': idx, 'tstamp': tstamp, 'a': list('abcdefghij')})
df.loc[3, 'tstamp'] = pd.NaT  # one missing timestamp

# print(...) with a single argument runs on both Python 2 and Python 3
print('without timezone:')
try:
    a = df.set_index(['id', 'tstamp']).reset_index('tstamp')
    print('a works')
except Exception as e:
    print('a fails: %s: %s' % (e.__class__.__name__, e))
try:
    b = df.set_index(['id', 'tstamp']).reset_index('tstamp').reset_index('id')
    print('b works')
except Exception as e:
    print('b fails: %s: %s' % (e.__class__.__name__, e))
try:
    c = df.set_index(['id', 'tstamp']).reset_index()
    print('c works')
except Exception as e:
    print('c fails: %s: %s' % (e.__class__.__name__, e))
try:
    d = df.set_index(['id', 'tstamp']).reset_index('id')
    print('d works')
except Exception as e:
    print('d fails: %s: %s' % (e.__class__.__name__, e))
print('with timezone:')
df['tstamp'] = pd.DatetimeIndex(df['tstamp']).tz_localize('Europe/Berlin')
try:
    a = df.set_index(['id', 'tstamp']).reset_index('tstamp')
    print('a works')
except Exception as e:
    print('a fails: %s: %s' % (e.__class__.__name__, e))
try:
    b = df.set_index(['id', 'tstamp']).reset_index('tstamp').reset_index('id')
    print('b works')
except Exception as e:
    print('b fails: %s: %s' % (e.__class__.__name__, e))
try:
    c = df.set_index(['id', 'tstamp']).reset_index()
    print('c works')
except Exception as e:
    print('c fails: %s: %s' % (e.__class__.__name__, e))
try:
    d = df.set_index(['id', 'tstamp']).reset_index('id')
    print('d works')
except Exception as e:
    print('d fails: %s: %s' % (e.__class__.__name__, e))

Output:
without timezone:
a works
b works
c works
d fails: ValueError: Could not convert object to NumPy datetime
with timezone:
a fails: TypeError: data type not understood
b fails: TypeError: data type not understood
c fails: TypeError: data type not understood
d fails: ValueError: Could not convert object to NumPy datetime

Can you suggest any workaround?

jreback · 2015-10-30T12:37:25Z

a lot of these are actually fixed by #11343 as we will be catching these errors in putmask.
cc @sinhrks

all of that said, you generally do not want NaNs of any kind in the MultiIndex; it makes your index non-performant and simply hard to work with w.r.t. indexing.

what are you actually trying to do?

emsems · 2015-11-02T07:09:40Z

Hi, thanks for picking this up!

Yeah, I understand it’s not ideal to have NaN in the index. However, I wrote a class (actually a number of classes) to do my data evaluation, which needs a DataFrame of a defined structure in its .data property. Data points always have an index and usually a timestamp as well. The index is unique; the timestamp is not always, but mostly. Therefore, I figured it would generally be good to have them both in a MultiIndex (because if there is timestamp information, I usually use it to select data).
In some evaluations I prefer, for simplicity, to have only one of the two in the index, hence the drop_index, which is used internally in a method of one of my classes.
I hope this makes some sense without all the context...
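
For a layout like the one described here (unique id, timestamp that is sometimes missing), one possible way to keep NaT out of the index is sketched below. This is only an illustration under assumptions, not a fix for the bug itself: the names `by_id`, `window` and `by_time` are made up for the example, and it has only been reasoned through against recent pandas releases, not 0.17.

```python
import numpy as np
import pandas as pd

# Same frame as in the report, with one missing timestamp.
tstamp = pd.date_range("2015-07-01", freq="h", periods=10)
df = pd.DataFrame({"id": np.arange(10), "tstamp": tstamp, "a": list("abcdefghij")})
df.loc[3, "tstamp"] = pd.NaT

# Keep only the unique, never-missing key in the index;
# the timestamp stays an ordinary datetime column.
by_id = df.set_index("id")

# Select by time with a boolean mask. Comparisons against NaT are False,
# so the row without a timestamp is simply not selected.
start = pd.Timestamp("2015-07-01 02:00")
stop = pd.Timestamp("2015-07-01 06:00")
window = by_id[(by_id["tstamp"] >= start) & (by_id["tstamp"] < stop)]

# Build a timestamp-indexed view on demand, using only rows that have one.
by_time = by_id.dropna(subset=["tstamp"]).reset_index().set_index("tstamp")
```

The trade-off is that time-based selection goes through a mask instead of index lookups, and rows with no timestamp are absent from `by_time`.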

mroeschke · 2019-10-21T00:06:52Z

These examples all work on master. Could use tests:

In [189]: import pandas as pd
     ...: from pandas import DataFrame
     ...: import numpy as np
     ...:
     ...: idx = np.arange(0, 10)  # could have an NaN?
     ...: tstamp = pd.date_range('201507010000', freq='h', periods=10).values
     ...: df = DataFrame({'id': idx, 'tstamp': tstamp, 'a': list('abcdefghij')})
     ...: df.loc[3, 'tstamp'] = pd.NaT

In [190]: a = df.set_index(['id', 'tstamp']).reset_index('tstamp')

In [191]: b = df.set_index(['id', 'tstamp']).reset_index('tstamp').reset_index('id')

In [192]: c =  df.set_index(['id', 'tstamp']).reset_index()

In [194]: d =  df.set_index(['id', 'tstamp']).reset_index('id')

In [195]: df['tstamp'] = pd.DatetimeIndex(df['tstamp']).tz_localize('Europe/Berlin')

In [196]: a = df.set_index(['id', 'tstamp']).reset_index('tstamp')

In [197]: b = df.set_index(['id', 'tstamp']).reset_index('tstamp').reset_index('id')

In [198]: c =  df.set_index(['id', 'tstamp']).reset_index()

In [199]: d =  df.set_index(['id', 'tstamp']).reset_index('id')

In [200]: pd.__version__
Out[200]: '0.26.0.dev0+593.g9d45934af'

mroeschke · 2020-01-04T04:52:35Z

Actually, the last cases are not fixed yet: in the timezone-aware example, the NaT in the tstamp level gets converted to NaN

In [24]: d
Out[24]:
                           id  a
tstamp
2015-07-01 00:00:00+02:00   0  a
2015-07-01 01:00:00+02:00   1  b
2015-07-01 02:00:00+02:00   2  c
NaN                         3  d
2015-07-01 04:00:00+02:00   4  e
2015-07-01 05:00:00+02:00   5  f
2015-07-01 06:00:00+02:00   6  g
2015-07-01 07:00:00+02:00   7  h
2015-07-01 08:00:00+02:00   8  i
2015-07-01 09:00:00+02:00   9  j
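
The two outcomes can also be told apart programmatically by checking the index type instead of reading the printed repr. A minimal sketch: `level_kept_datetime` is a hypothetical helper name, and `good` and `bad` are hand-built stand-ins for a correct result and for the coerced one shown above, not output from the bug itself.

```python
import numpy as np
import pandas as pd


def level_kept_datetime(index):
    """Return True if `index` is still datetime-typed, so missing values are NaT."""
    return isinstance(index, pd.DatetimeIndex)


# Stand-in for the expected result: tz-aware DatetimeIndex with a NaT entry.
good = pd.DatetimeIndex(["2015-07-01 00:00", None], tz="Europe/Berlin")

# Stand-in for the coerced result: object-dtype Index holding a float NaN.
bad = pd.Index([pd.Timestamp("2015-07-01", tz="Europe/Berlin"), np.nan], dtype=object)

print(level_kept_datetime(good), level_kept_datetime(bad))
```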

mroeschke · 2021-04-20T06:00:57Z

The last case appears to work now

In [8]: import pandas as pd
   ...: from pandas import DataFrame
   ...: import numpy as np
   ...:
   ...: idx = np.arange(0, 10)  # could have an NaN?
   ...: tstamp = pd.date_range('201507010000', freq='h', periods=10).values
   ...: df = DataFrame({'id': idx, 'tstamp': tstamp, 'a': list('abcdefghij')})
   ...: df.loc[3, 'tstamp'] = pd.NaT

In [9]: df.set_index(['id', 'tstamp']).reset_index('id')
Out[9]:
                     id  a
tstamp
2015-07-01 00:00:00   0  a
2015-07-01 01:00:00   1  b
2015-07-01 02:00:00   2  c
NaT                   3  d
2015-07-01 04:00:00   4  e
2015-07-01 05:00:00   5  f
2015-07-01 06:00:00   6  g
2015-07-01 07:00:00   7  h
2015-07-01 08:00:00   8  i
2015-07-01 09:00:00   9  j

In [10]: pd.__version__
Out[10]: '1.3.0.dev0+1368.gccb90d60c6'
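
A regression test for the timezone-naive case shown above could look roughly like the following. This is a sketch only, not the test that was eventually merged; the function name `check_reset_index_keeps_nat` is made up for illustration, and it covers only the tz-naive frame.

```python
import numpy as np
import pandas as pd


def check_reset_index_keeps_nat():
    # Rebuild the frame from the report, with one missing timestamp.
    tstamp = pd.date_range("2015-07-01", freq="h", periods=10)
    df = pd.DataFrame({"id": np.arange(10), "tstamp": tstamp, "a": list("abcdefghij")})
    df.loc[3, "tstamp"] = pd.NaT

    # Move only the 'id' level back to a column; 'tstamp' stays as the index.
    result = df.set_index(["id", "tstamp"]).reset_index("id")

    # The remaining index must stay datetime-typed with NaT, not float NaN.
    assert isinstance(result.index, pd.DatetimeIndex)
    assert result.index[3] is pd.NaT
    assert result.index.isna().sum() == 1
    return result


result = check_reset_index_keeps_nat()
```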

jreback added Bug Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex Difficulty Intermediate labels Oct 30, 2015

jreback added this to the Next Major Release milestone Oct 30, 2015

mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Difficulty Intermediate Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex labels Oct 21, 2019

mroeschke added MultiIndex Datetime Datetime data dtype and removed Needs Tests Unit test(s) needed to prevent regressions good first issue labels Jan 4, 2020

mroeschke mentioned this issue Jan 4, 2020

TST: Add more tests for fixed issues #30674

Merged

7 tasks

mroeschke changed the title ~~Various Exceptions with NaT in MultiIndex and reset_index()~~ BUG: reset_index of level on a MultiIndex with NaT converts to np.nan Mar 31, 2020

mroeschke added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Mar 31, 2020

mroeschke removed Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Apr 20, 2021

mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed MultiIndex Datetime Datetime data dtype labels Apr 20, 2021

mroeschke mentioned this issue May 9, 2021

TST: Add regression tests for old issues #41389

Merged

10 tasks

simonjayhawkins modified the milestones: Contributions Welcome, 1.3 May 9, 2021

jreback closed this as completed in #41389 May 10, 2021
