Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISC: Period.__le__ #17692

Closed
jbrockmendel opened this issue Sep 27, 2017 · 5 comments
Closed

DISC: Period.__le__ #17692

jbrockmendel opened this issue Sep 27, 2017 · 5 comments
Labels
API Design Period Period data type

Comments

@jbrockmendel
Copy link
Member

There are a bunch of issues outstanding that relate to Period ops and comparison. AFAICT making a decision about the comparison issue will snowball into resolving (some of) the ops issues.

#5202 ENH: Period ops NaT & timedelta ops
#10798 Date / Datetime in Period Index
#6779 Adding Period and Offset not implemented
#13077 ENH/API: Decide what to return Period - Period subtraction
#17112 MultiIndex - Comparison with Mixed Frequencies (and other FUBAR)

Right now two Period objects are comparable iff they equal freq attributes. AFAICT this is to avoid guessing in cases where the "correct" answer is ambiguous. But there are some other cases with an obviously correct answer. Namely, if the per1.end_time < per2.start_time, then it should be the case that per1 < per2 unambiguously. This intuition also extends to datetime and Timestamp objects that do not lie between per.start_time and per.end_time.

For cases with overlap there are a couple of reasonable approaches. My preferred approach is lexicographic: first compare per1.start_time with per2.start_time. If they are equal, then compare the end_times. Then we treat datetime and Timestamp objects as analogous to zero-duration periods.

Thoughts?

@shoyer
Copy link
Member

shoyer commented Sep 27, 2017

Another point of reference is Interval, which uses lexicographic ordering on (start, stop, closed):

if _interval_like(other):
self_tuple = (self.left, self.right, self.closed)
other_tuple = (other.left, other.right, other.closed)
return PyObject_RichCompare(self_tuple, other_tuple, op)

So if we care about ordering periods with different frequencies, this would be a reasonable choice. On the other hand, if we don't have use cases for this, from a usability perspective it is probably better to stick with refusing to order potentially overlapping periods.

Then we treat datetime and Timestamp objects as analogous to zero-duration periods.

So period < timestamp would effectively compare period.start_time < timestamp? I suppose we could argue that this is consistent, but here I really don't think it's a good idea. It's better to force users to be explicit about what they want rather than allow them to make mistaken assumptions (period.end_time < timestamp is at least as plausible).

@jreback
Copy link
Contributor

jreback commented Sep 27, 2017

yeah in an ideal work we would actually use an Interval to back a Period. Yes we should clarify the semantics here.

@jreback jreback added API Design Period Period data type labels Sep 27, 2017
@jbrockmendel
Copy link
Member Author

So period < timestamp would effectively compare period.start_time < timestamp? I suppose we could argue that this is consistent, but here I really don't think it's a good idea

Yah, it's sub-optimal. The question becomes if we want to implement the unambiguous cases and continue raising for ambiguous ones vs leave existing behavior. Even for cases with overlap __eq__ and __ne__ should usually be unambiguous.

On the other hand, if we don't have use cases for this, from a usability perspective it is probably better to stick with refusing to order potentially overlapping periods.

From #17112, set_index uses comparisons under the hood:

index = pd.Index(['PCE']*2, name='Variable')
data = [
	pd.Period('2018Q1'),
	pd.Period('2021', freq='A'),
	]
ser = pd.Series(data, index=index, name='Period')
ser.to_frame().set_index('Period', append=True)

In py2 raises NotImplementedError: > 1 ndim Categorical are not supported at this time and in py3 raises SystemError: <built-in function isinstance> returned a result with an error set

@jbrockmendel
Copy link
Member Author

Doesn't look like there's a lot of enthusiasm for this. One more shot with a less ambitious fix: just define __ne__ and __eq__ to evaluate unequal instead of raising.

@jbrockmendel
Copy link
Member Author

Lack of consensus, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Period Period data type
Projects
None yet
Development

No branches or pull requests

3 participants