Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues Creating Period with DateOffset as freq #4878

Closed
cancan101 opened this issue Sep 19, 2013 · 19 comments
Closed

Issues Creating Period with DateOffset as freq #4878

cancan101 opened this issue Sep 19, 2013 · 19 comments
Labels
Bug Frequency DateOffsets Period Period data type

Comments

@cancan101
Copy link
Contributor

cancan101 commented Sep 19, 2013

Something seems wrong here:

print repr(pd.Period("2013-12", freq=pd.offsets.MonthEnd()))

correctly prints:

Period('2013-12', 'M')

but:

print pd.Period("2013-12", freq=pd.offsets.BusinessMonthEnd())

leads to:

Traceback (most recent call last):
  File "/home/alex/git/pandas/pandas/tseries/tests/test_52.py", line 29, in testName
    a = pd.Period("2013-12", freq=BusinessMonthEnd())
  File "/home/alex/git/pandas/pandas/tseries/period.py", line 121, in __init__
    base, mult = _gfc(freq)
  File "/home/alex/git/pandas/pandas/tseries/frequencies.py", line 92, in get_freq_code
    code = _period_str_to_code(freqstr[0])
  File "/home/alex/git/pandas/pandas/tseries/frequencies.py", line 757, in _period_str_to_code
    alias = _period_alias_dict[freqstr]
KeyError: 'BM'

Strangely, the following both work:

In [10]: pd.date_range("2013-01","2013-10",freq=pd.offsets.MonthEnd())
Out[10]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-31 00:00:00, ..., 2013-09-30 00:00:00]
Length: 9, Freq: M, Timezone: None

In [11]: pd.date_range("2013-01","2013-10",freq=pd.offsets.BusinessMonthEnd())
Out[11]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-31 00:00:00, ..., 2013-09-30 00:00:00]
Length: 9, Freq: BM, Timezone: None
@cancan101
Copy link
Contributor Author

The exception actually looks like:

Traceback (most recent call last)
<ipython-input-19-fe5191046ba2> in <module>()
----> 1 print pd.Period("2013-12", freq=pd.offsets.BusinessMonthEnd())

/usr/local/lib/python2.7/dist-packages/pandas/tseries/period.pyc in __init__(self, value, freq, ordinal, year, month, quarter, day, hour, minute, second)
    118             raise ValueError(msg)
    119 
--> 120         base, mult = _gfc(freq)
    121         if mult != 1:
    122             raise ValueError('Only mult == 1 supported')

/usr/local/lib/python2.7/dist-packages/pandas/tseries/frequencies.pyc in get_freq_code(freqstr)
     91                 stride = freqstr[1]
     92             except:
---> 93                 code = _period_str_to_code(freqstr[1])
     94                 stride = freqstr[0]
     95             return code, stride

/usr/local/lib/python2.7/dist-packages/pandas/tseries/frequencies.pyc in _period_str_to_code(freqstr)
    738     # hack
    739     freqstr = _rule_aliases.get(freqstr, freqstr)
--> 740     freqstr = _rule_aliases.get(freqstr.lower(), freqstr)
    741 
    742     try:

AttributeError: 'int' object has no attribute 'lower'

The more informative stack trace in the OP comes from making the following change: cancan101@6789ef8#L1R95

@cancan101
Copy link
Contributor Author

Some more examples of inconsistencies:

pd.period_range('1/1/2012', periods=4, freq=pd.offsets.BusinessMonthEnd())

yields:

/usr/local/lib/python2.7/dist-packages/pandas/tseries/frequencies.pyc in get_freq_code(freqstr)
     99 
    100     base, stride = _base_and_stride(freqstr)
--> 101     code = _period_str_to_code(base)
    102 
    103     return code, stride

/usr/local/lib/python2.7/dist-packages/pandas/tseries/frequencies.pyc in _period_str_to_code(freqstr)
    744         return _period_code_map[freqstr]
    745     except:
--> 746         alias = _period_alias_dict[freqstr]
    747         return _period_code_map[alias]
    748 

KeyError: 'BM'

whereas:

pd.date_range('1/1/2012', periods=4, freq=pd.offsets.BusinessMonthEnd()).to_period()

works correctly.

I have tracked part of this issue down:
period_range uses _period_str_to_code which in turn uses _period_code_map (or _period_alias_dict). Currently 'BM' is not in these dicts.
to_period on the other hand uses get_period_alias which in turn uses _offset_to_period_map.

@cancan101
Copy link
Contributor Author

So this could be a simple fix, but I want to get some feedback:

I plan to add this test:

    def test_period_range_alias(self):
        self.assertTrue(
            pd.date_range('1/1/2012', periods=4, freq=pd.offsets.MonthEnd()).to_period().identical(
            pd.period_range('1/1/2012', periods=4, freq=pd.offsets.MonthEnd())))

        #GH 4878
        self.assertTrue(
           pd.date_range('1/1/2012', periods=4, freq=pd.offsets.BusinessMonthEnd()).to_period().identical(
           pd.period_range('1/1/2012', periods=4, freq=pd.offsets.BusinessMonthEnd())))

Currently the first line passes, but the second fails.

To summarize the problem: Pandas uses DateOffset to represent fixed point in time but a string to represent time periods. Currently not all DateOffset have corresponding period strings. BusinessMonthEnd is an example of a DateOffset that does not have a corresponding period string. Currently when calling to_period on a DatetimeIndex, pandas will use the following:

def get_period_alias(offset_str):
    """ alias to closest period strings BQ->Q etc"""
    return _offset_to_period_map.get(offset_str, None)

for which there is in fact a mapping 'BM': 'M'. This same aliasing is NOT used when directly calling period_range for which the correct period string must be specified. If a DateOffset is passed into to period_range, Pandas uses the rulecode for the DateOffset which for BusinessMonthEnd is BM which, as stated above, is not a valid period string.

One idea would be for DateOffset to have a periodstr property. Another would be if the user passes in an DateOffset to period_range to pass its rulecode through get_period_alias.

Thoughts?

@jtratner
Copy link
Contributor

jtratner commented Oct 7, 2013

@cancan101 weren't you thinking of reworking the backend for all of this? Would it make more sense to address this after you've made the changes there? I don't think there's any reason why we would not want your example to run without an error.

@cancan101
Copy link
Contributor Author

@jtratner #5148 is my WIP for this. So far the above unit test passes.

@jtratner
Copy link
Contributor

@cancan101 Now it has a new error - hurray:

Traceback (most recent call last):
  File "testcase2.py", line 5, in <module>
    pd.Period("2013-12", freq=BusinessMonthEnd())
  File "../pandas/tseries/period.py", line 122, in __init__
    base, mult = _gfc(freq)
  File "../pandas/tseries/frequencies.py", line 98, in get_freq_code
    code = _period_str_to_code(freqstr[1])
  File "../pandas/tseries/frequencies.py", line 619, in _period_str_to_code
    freqstr = _rule_aliases.get(freqstr.lower(), freqstr)
AttributeError: 'int' object has no attribute 'lower'

I guess I messed something up here.

@cancan101
Copy link
Contributor Author

@jtratner That issue (at least the poor exception raised) is fixed on my retail branch.

@jtratner
Copy link
Contributor

great - is that ready to go or soon to be ready? (not sure if you rebased it)

@cancan101
Copy link
Contributor Author

See
https://github.com/pydata/pandas/pull/5004/files#diff-18135e4852fd97f51aabf9d5be9f528dR99
for
reference. I am rebasing and dealing with conflicts as we speak.

On Mon, Oct 14, 2013 at 11:04 PM, Jeff Tratner [email protected]:

great - is that ready to go or soon to be ready? (not sure if you rebased
it)


Reply to this email directly or view it on GitHubhttps://github.com//issues/4878#issuecomment-26306283
.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

This is with your PR; is this right?

In [10]: pd.period_range('1/1/2012', periods=4, freq=pd.offsets.BusinessMonthEnd())
Out[10]: 
<class 'pandas.tseries.period.PeriodIndex'>
freq: M
[2012-01, ..., 2012-04]
length: 4

In [11]: pd.period_range('1/1/2012', periods=4, freq=pd.offsets.MonthEnd())
Out[11]: 
<class 'pandas.tseries.period.PeriodIndex'>
freq: M
[2012-01, ..., 2012-04]
length: 4

@cancan101
Copy link
Contributor Author

Yes. Why?

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

should the first not be freq 'BM'?

@cancan101
Copy link
Contributor Author

I believe this is the "issue" I address here: https://github.com/pydata/pandas/pull/5148/files#diff-18135e4852fd97f51aabf9d5be9f528dR414

LMK if that does not make sense.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

ok...so why are they not different?

@cancan101
Copy link
Contributor Author

Since this is a PeriodIndex, a monthly period is the same regardless of whether or not the period is defined by the start of month or end of month.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

ok, but say then you want to union it with another index which is not a business freq, why would that not be a problem?

@cancan101
Copy link
Contributor Author

I actually misread your email. I thought you were saying begin vs end (not business vs regular month).

As for BuisnessDay, there never was a Period for Business month. This PR should not change that.

@jreback
Copy link
Contributor

jreback commented Apr 3, 2014

hmm...ok....so then a Period Business freq is essentially converted to a non-business freq then?

maybe put up a warning for that? (or even raise), doubt their are tests for that

@jreback jreback modified the milestones: 0.14.1, 0.14.0 Apr 27, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 10, 2014
@jreback jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014
@jreback jreback modified the milestones: 0.15.0, 0.15.1 Sep 8, 2014
@jreback jreback removed this from the 0.16.0 milestone Mar 6, 2015
@simonjayhawkins
Copy link
Member

leads to:

Traceback (most recent call last):
  File "/home/alex/git/pandas/pandas/tseries/tests/test_52.py", line 29, in testName
    a = pd.Period("2013-12", freq=BusinessMonthEnd())
  File "/home/alex/git/pandas/pandas/tseries/period.py", line 121, in __init__
    base, mult = _gfc(freq)
  File "/home/alex/git/pandas/pandas/tseries/frequencies.py", line 92, in get_freq_code
    code = _period_str_to_code(freqstr[0])
  File "/home/alex/git/pandas/pandas/tseries/frequencies.py", line 757, in _period_str_to_code
    alias = _period_alias_dict[freqstr]
KeyError: 'BM'

this now raises a less cryptic ValueError: Invalid frequency: <BusinessMonthEnd>

It appears that the discussion here maybe now superseded by the discussion in #13871 so closing this one to help keep discussion in one place.

@simonjayhawkins simonjayhawkins removed this from the Someday milestone Jun 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Frequency DateOffsets Period Period data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants