Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: PeriodIndex and Period subtraction error #13071

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented May 3, 2016

  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

Similar to #5202. PeriodIndex subtraction raises AttributeError either side is Period (scalar).

pd.PeriodIndex(['2011-01', '2011-02'], freq='M') - pd.Period('2011-01', freq='M') 
# AttributeError: 'PeriodIndex' object has no attribute 'ordinal'

Expected

If Period(NaT) is included in either side, result is Float64Index to hold nan.

pd.PeriodIndex(['2011-01', '2011-02'], freq='M') - pd.Period('2011-01', freq='M') 
# Int64Index([0, 1], dtype='int64')

pd.PeriodIndex(['2011-01', 'NaT'], freq='M') - pd.Period('2011-01', freq='M') 
Float64Index([0.0, nan], dtype='float64')

@sinhrks sinhrks added Bug Numeric Operations Arithmetic, Comparison, and Logical operations Period Period data type labels May 3, 2016
@sinhrks sinhrks added this to the 0.18.2 milestone May 3, 2016
@sinhrks sinhrks force-pushed the period_period_sub branch from faa7b61 to 0627d39 Compare May 3, 2016 17:51
@jreback
Copy link
Contributor

jreback commented May 3, 2016

hmm, shouldn't subtraction yield a TimedeltaIndex?

@sinhrks
Copy link
Member Author

sinhrks commented May 3, 2016

Yes returning TimedeltaIndex may be natural. It's based on the current Period behavior.

pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M') 
# 2

@sinhrks
Copy link
Member Author

sinhrks commented May 3, 2016

Also found DatetimeIndex - Period raises below errors on current master. Are these should be supported, or raise more understandable errors?

pd.DatetimeIndex(['2011-01-01', '2011-02-01']) - pd.Period('2011-01-01', freq='D') 
# ValueError: Cannot do arithmetic with non-conforming periods

pd.DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D') - pd.Period('2011-01-01', freq='D') 
# AttributeError: 'DatetimeIndex' object has no attribute 'ordinal'

@jreback
Copy link
Contributor

jreback commented May 3, 2016

hmm, actually I think

pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M')

should be a 2M Span, which we don't really have the concept of. Timedelta point-in-time.

cc @MaximilianR

@jreback
Copy link
Contributor

jreback commented May 3, 2016

I think your first example is legit (you can't do arithmetic between spans and points) but the 2nd should work

@max-sixty
Copy link
Contributor

If we think about a Period as a interval / span of time, I think it's reasonable that equal freq Periods can do arithmetic.
This can be extended to intervals generally: [5->7] - [2->4] == 3.

While Period arithmetic currently return ints, Timedeltas would be ideal (although ints are 'ok', it's pretty clear what they mean)

So I think this should probably be a 2M Timedelta:
pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M')

@jreback do you think differently?

@jreback
Copy link
Contributor

jreback commented May 3, 2016

@MaximilianR

Timedelta don't really support '2 months' as these are only days,h,m,s, etc.

we need a Perioddelta

@max-sixty
Copy link
Contributor

Timedelta don't really support '2 months' as these are only days,h,m,s, etc.

I see.

But for me it's the same concept, just with different units. So [5->7] - [2->4] == 3, whether with days, months, or ounces.

@jreback
Copy link
Contributor

jreback commented May 3, 2016

so u end up losing freq info here
and 2 months is itself ambiguous - we could return an offset

@max-sixty
Copy link
Contributor

we could return an offset

Yes, I think this is absolutely right.

and 2 months is itself ambiguous

Why?

@jreback
Copy link
Contributor

jreback commented May 3, 2016

@MaximilianR 2M doesn't have a fixed conversion because its a relative offset, where as a Timedelta of 60 days is absolute.

So you leave something on the table.

PeriodDelta is the right soln, but maybe we can just tag this onto Timedelta, IOW have it support 2 Months

@max-sixty
Copy link
Contributor

I don't know the intricacies of DateOffset / relativedelta, but relativedelta seems to do this well, at the conceptual level?

In [15]: from dateutil import relativedelta
In [18]: relativedelta.relativedelta(months=2)
Out[18]: relativedelta(months=+2)
In [22]: relativedelta.relativedelta(months=2) + datetime.datetime(2005,5,3)
Out[22]: datetime.datetime(2005, 7, 3, 0, 0)

@jreback
Copy link
Contributor

jreback commented May 3, 2016

relativedelta is equiv to an offset (though more fully featured, the offsets I mean)

@sinhrks
Copy link
Member Author

sinhrks commented May 3, 2016

How about split to 2 separated issues from this:

  1. API change for Period - Period (for 0.19) (ENH/API: Decide what to return Period - Period subtraction #13077)
  2. DatetimeIndex/Timestamp - PeriodIndex/Period and vice versa. (BUG/ERR: DatetimeIndex - Period shows ununderstandable error #13078)

@jreback
Copy link
Contributor

jreback commented May 3, 2016

ok on fixing what is broken here. Yes, let's open a new issue to discuss 1)

msg = _DIFFERENT_FREQ_INDEX.format(self.freqstr, other.freqstr)
raise IncompatibleFrequency(msg)

if other.ordinal == tslib.iNaT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this always a Period? see my comment above

Copy link
Member Author

@sinhrks sinhrks May 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always Period. Defined in tseries/base.py.

@sinhrks sinhrks force-pushed the period_period_sub branch from 0627d39 to 0f0951f Compare May 6, 2016 02:49
@jreback jreback closed this in 2cd1480 May 6, 2016
@sinhrks sinhrks deleted the period_period_sub branch May 6, 2016 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants