API: resample with PeriodIndex default span (start/end convention) #7744

Open
jorisvandenbossche opened this issue Jul 13, 2014 · 1 comment
Labels
Enhancement, Period (Period data type), Resample (resample method)

Comments

@jorisvandenbossche
Member

Previously, resampling with a PeriodIndex supported two conventions: 'start' (start -> start) and 'end' (end -> end). They gave something like this (note: the following is not real output from current code, but taken from Wes' book):

In [25]: s = pd.Series(np.arange(2), index=pd.period_range('2000-1', periods=2, freq='A'))

In [26]: s
Out[26]:
2000    0
2001    1
Freq: A-DEC, dtype: int32

In [27]: s.resample('Q-DEC', fill_method='ffill', convention='start')
Out[27]:
2000Q1    0
2000Q2    0
2000Q3    0
2000Q4    0
2001Q1    1
Freq: Q-DEC, dtype: int32

In [28]: s.resample('Q-DEC', fill_method='ffill', convention='end')
Out[28]:
2000Q4    0
2001Q1    1
2001Q2    1
2001Q3    1
2001Q4    1
Freq: Q-DEC, dtype: int32

According to Wes' book, the default was 'end'. However, the current behaviour is as follows (this is real output):

In [27]: s.resample('Q-DEC', fill_method='ffill')
Out[27]:
2000Q1    0
2000Q2    0
2000Q3    0
2000Q4    0
2001Q1    1
2001Q2    1
2001Q3    1
2001Q4    1
Freq: Q-DEC, dtype: int32

So this is in fact a third option, 'span' (start -> end). This option is mentioned in #1635, but from that issue it seems it was never implemented: the commit was never merged, and the test added at the time is still commented out (https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_resample.py#L1134).
In practice, however, the default behaviour is exactly this 'span' behaviour. The 'start' option has changed as well:

In [28]: s.resample('Q-DEC', fill_method='ffill', convention='start')
Out[28]:
2000Q1    0
2000Q2    0
2000Q3    0
2000Q4    0
2001Q1    1
2001Q2    1
2001Q3    1
2001Q4    1
Freq: Q-DEC, dtype: int32

This gives the same result as the default; only 'end' behaves the same as before.

Some issues/questions:

  • What is the default value for convention? It is not stated anywhere in the docs or in the docstring (apart from the signature, which says 'start').
  • I can't find the issue, PR, or release note stating that the default for period resampling (upsampling) has changed.
  • The default is now a 'spanning' behaviour, but it is identical to 'start'. Shouldn't it be a separate option, so that 'start' keeps its own behaviour (start -> start), distinct from the default spanning behaviour (start -> end)?
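
To make the last question concrete, here is a minimal sketch of how the three behaviours would differ in the quarter labels they produce. It is plain Python and only models the index range; it is not how pandas implements resampling. The `ANCHORS` table, the `quarter_labels` helper, and a separate 'span' value are all hypothetical names introduced for illustration. Fill values are deliberately left out.

```python
# Illustrative model only: plain Python, not pandas' implementation.
# A quarter is a (year, quarter) tuple, e.g. (2000, 1) stands for 2000Q1.

# Hypothetical table: the quarter at which the first and the last
# annual (A-DEC) period are anchored under each convention.
ANCHORS = {
    "start": (1, 1),  # start -> start
    "end": (4, 4),    # end -> end
    "span": (1, 4),   # start -> end (hypothetical third option)
}


def quarter_labels(first_year, last_year, convention):
    """Quarter labels covered when upsampling A-DEC years to Q-DEC."""
    first_q, last_q = ANCHORS[convention]
    labels = []
    year, quarter = first_year, first_q
    while (year, quarter) <= (last_year, last_q):
        labels.append(f"{year}Q{quarter}")
        # Step to the next quarter, rolling over into the next year after Q4.
        year, quarter = (year, quarter + 1) if quarter < 4 else (year + 1, 1)
    return labels


# Labels reported above for the current convention='start' call.
observed_start = ["2000Q1", "2000Q2", "2000Q3", "2000Q4",
                  "2001Q1", "2001Q2", "2001Q3", "2001Q4"]

for convention in ANCHORS:
    print(convention, quarter_labels(2000, 2001, convention))

# The observed 'start' output matches the modelled 'span' range,
# not the modelled 'start' range.
print(quarter_labels(2000, 2001, "span") == observed_start)
print(quarter_labels(2000, 2001, "start") == observed_start)
```

Under this model, 'start' and 'end' each give five quarters (2000Q1 to 2001Q1 and 2000Q4 to 2001Q4 respectively), while 'span' gives all eight, which is the range currently returned both by default and with convention='start'.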
@jorisvandenbossche
Member Author

Note: I found the commit: 022e630#diff-0270959f3dd5fb3637134fc695ecd521R33 (at least the one that mentions this in the release note)

@jreback jreback added this to the 0.15.0 milestone Jul 14, 2014
@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback added the Resample resample method label Apr 11, 2016
@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022