Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Cleanup backend for Offsets and Period #5148

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions doc/source/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -312,6 +312,11 @@ Improvements to existing features
in item handling (:issue:`6745`, :issue:`6988`).
- Improve performance in certain reindexing operations by optimizing ``take_2d`` (:issue:`6749`)
- Arrays of strings can be wrapped to a specified width (``str.wrap``) (:issue:`6999`)
- Constructor for ``Period`` now takes full set of possible ``Offset`` objects for ``freq``
parameter. (:issue:`4878`)
- Extends the number of ``Period``s supported by allowing for Python defined ``Period``s (:issue:`5148`)
- Added ``inferred_freq_offset`` as property on ``DatetimeIndex`` to provide the actual
Offset object rather than the string representation (:issue:`5082`).

.. _release.bug_fixes-0.14.0:

Expand Down Expand Up @@ -459,6 +464,7 @@ Bug Fixes
- Bug in timeseries-with-frequency plot cursor display (:issue:`5453`)
- Bug surfaced in groupby.plot when using a ``Float64Index`` (:issue:`7025`)
- Stopped tests from failing if options data isn't able to be downloaded from Yahoo (:issue:`7034`)
- Bug in not correctly treating 'QS', 'BQS', 'BQ', 'Y' as frquency aliases (:issue:`5028`).

pandas 0.13.1
-------------
Expand Down
5 changes: 4 additions & 1 deletion doc/source/v0.14.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -234,7 +234,7 @@ API changes
covs[df.index[-1]]

- ``Series.iteritems()`` is now lazy (returns an iterator rather than a list). This was the documented behavior prior to 0.14. (:issue:`6760`)

- ``pd.infer_freq`` and ``DatetimeIndex.inferred_freq`` now return a DateOffset subclass rather than a string. (:issue:`5082`)
- Added ``nunique`` and ``value_counts`` functions to ``Index`` for counting unique elements. (:issue:`6734`)
- ``stack`` and ``unstack`` now raise a ``ValueError`` when the ``level`` keyword refers
to a non-unique item in the ``Index`` (previously raised a ``KeyError``).
Expand Down Expand Up @@ -554,6 +554,9 @@ Enhancements
values='Quantity', aggfunc=np.sum)

- str.wrap implemented (:issue:`6999`)
- Constructor for ``Period`` now takes full set of possible ``Offset`` objects for ``freq``
parameter. (:issue:`4878`)
- Extends the number of ``Period``s supported by allowing for Python defined ``Period``s (:issue:`5148`)

.. _whatsnew_0140.performance:

Expand Down
2 changes: 1 addition & 1 deletion pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -2361,7 +2361,7 @@ def to_period(self, freq=None, copy=True):
new_values = new_values.copy()

if freq is None:
freq = self.index.freqstr or self.index.inferred_freq
freq = self.index.freq or self.index.inferred_freq
new_index = self.index.to_period(freq=freq)
return self._constructor(new_values,
index=new_index).__finalize__(self)
Expand Down
92 changes: 82 additions & 10 deletions pandas/tseries/frequencies.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ def get_freq(freq):
return freq


def get_freq_code(freqstr):
def get_freq_code(freqstr, as_periodstr=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to default to do what it did before. [because I think this is actually a public method]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtratner Am I missing something here? The change should by default not change behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you updte docstring for parameters and returns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cancan101 please make sure to update docstring here

"""

Parameters
Expand All @@ -81,7 +81,13 @@ def get_freq_code(freqstr):
-------
"""
if isinstance(freqstr, DateOffset):
freqstr = (get_offset_name(freqstr), freqstr.n)
freqstr_raw = get_offset_name(freqstr)

#if we can, convert to canonical period str
if as_periodstr:
freqstr_raw = get_period_alias(freqstr_raw)

freqstr = (freqstr_raw, freqstr.n)

if isinstance(freqstr, tuple):
if (com.is_integer(freqstr[0]) and
Expand Down Expand Up @@ -113,7 +119,7 @@ def _get_freq_str(base, mult=1):
code = _reverse_period_code_map.get(base)
if mult == 1:
return code
return str(mult) + code
return "%s%s" % (mult, code)


#----------------------------------------------------------------------
Expand Down Expand Up @@ -157,6 +163,7 @@ def _get_freq_str(base, mult=1):
'H': 'H',
'Q': 'Q',
'A': 'A',
'Y': 'A',
'W': 'W',
'M': 'M'
}
Expand Down Expand Up @@ -202,6 +209,9 @@ def get_period_alias(offset_str):
'Q@FEB': 'BQ-FEB',
'Q@MAR': 'BQ-MAR',
'Q': 'Q-DEC',
'QS': 'QS-JAN',
'BQ': 'BQ-DEC',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these new for Period? need to add to docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought there was an issue for these, maybe not. I'll make sure to mention these in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are to canonicalize the offsets s.t if offset1 == offset2 then offset1.name == offset2.name and offset is offset2 for Pandas defined offsets. This does not currently hold:

In [10]: get_offset("QS-JAN") == get_offset("QS")
Out[10]: True

In [29]: get_offset("QS-JAN").name == get_offset("QS").name
Out[29]: False

n [31]: get_offset("QS-JAN") is get_offset("QS")
Out[31]: False

Adding these aliases fixes that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cancan101 I'm +1 on equality, +0 on making names the same and -1 on making sure they are the same offset. It could happen to be that offset1 == offset2 implies offset1 is offset2, but that shouldn't be a property that is relied on or necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your votes here. I wasn't suggesting making the offsets
interned.
On Apr 14, 2014 9:37 PM, "Jeff Tratner" [email protected] wrote:

In pandas/tseries/frequencies.py:

@@ -202,6 +207,9 @@ def get_period_alias(offset_str):
'Q@FEB': 'BQ-FEB',
'Q@MAR': 'BQ-MAR',
'Q': 'Q-DEC',

  • 'QS': 'QS-JAN',
  • 'BQ': 'BQ-DEC',

@cancan101 https://github.com/cancan101 I'm +1 on equality, +0 on
making names the same and -1 on making sure they are the same offset. It
could happen to be that offset1 == offset2 implies offset1 is offset2,
but that shouldn't be a property that is relied on or necessary.


Reply to this email directly or view it on GitHubhttps://github.com//pull/5148/files#r11617032
.

'BQS': 'BQS-JAN',

'A': 'A-DEC', # YearEnd(month=12),
'AS': 'AS-JAN', # YearBegin(month=1),
Expand Down Expand Up @@ -387,19 +397,44 @@ def get_legacy_offset_name(offset):
name = offset.name
return _legacy_reverse_map.get(name, name)

def get_standard_freq(freq):
def get_standard_freq(freq, as_periodstr=False):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put an example inte docstring

Return the standardized frequency string
Return the standardized frequency string.
as_periodstr=True returns the string representing the period rather than
the frequency. An example when these may differ is MonthBegin.
MonthBegin and MonthEnd are two different frequencies but they define the
same period.

>>> get_standard_freq(pandas.tseries.offsets.MonthBegin(), as_periodstr=False)
'L'
>>> get_standard_freq(pandas.tseries.offsets.MonthEnd(), as_periodstr=False)
'M'
>>> get_standard_freq(pandas.tseries.offsets.MonthBegin(), as_periodstr=True)
'M'
>>> get_standard_freq(pandas.tseries.offsets.MonthEnd(), as_periodstr=True)
'M'
"""
if freq is None:
return None

if isinstance(freq, DateOffset):
return get_offset_name(freq)
code, stride = get_freq_code(freq, as_periodstr=as_periodstr)

code, stride = get_freq_code(freq)
return _get_freq_str(code, stride)

def _get_standard_period_freq_impl(freq):
return get_standard_freq(freq, as_periodstr=True)

def get_standard_period_freq(freq):
if isinstance(freq, DateOffset):
return freq.periodstr

return _get_standard_period_freq_impl(freq)

def _assert_mult_1(mult):
if mult != 1:
# TODO: Better error message - this is slightly confusing
raise ValueError('Only mult == 1 supported')

#----------------------------------------------------------------------
# Period codes

Expand Down Expand Up @@ -629,7 +664,7 @@ def infer_freq(index, warn=True):

Returns
-------
freq : string or None
freq : DateOffset object or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so does this no longer accept a string at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "accept"? I changed the return type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right, read too quickly

None if no discernible frequency
TypeError if the index is not datetime-like
"""
Expand All @@ -650,7 +685,28 @@ def infer_freq(index, warn=True):

index = pd.DatetimeIndex(index)
inferer = _FrequencyInferer(index, warn=warn)
return inferer.get_freq()
return to_offset(inferer.get_freq())


def infer_freqstr(index, warn=True):
"""
Infer the most likely frequency given the input index. If the frequency is
uncertain, a warning will be printed

Parameters
----------
index : DatetimeIndex
if passed a Series will use the values of the series (NOT THE INDEX)
warn : boolean, default True

Returns
-------
freq : string or None
None if no discernible frequency
TypeError if the index is not datetime-like
"""
return infer_freq(index, warn).freqstr


_ONE_MICRO = long(1000)
_ONE_MILLI = _ONE_MICRO * 1000
Expand Down Expand Up @@ -887,9 +943,11 @@ def is_subperiod(source, target):
-------
is_subperiod : boolean
"""
source_raw = source
if isinstance(source, offsets.DateOffset):
source = source.rule_code

target_raw = target
if isinstance(target, offsets.DateOffset):
target = target.rule_code

Expand Down Expand Up @@ -918,6 +976,12 @@ def is_subperiod(source, target):
return source in ['T', 'S']
elif target == 'S':
return source in ['S']
elif isinstance(source_raw, offsets._NonCythonPeriod):
return source_raw.is_subperiod(target_raw)
elif isinstance(target_raw, offsets._NonCythonPeriod):
return target_raw.is_superperiod(source_raw)
else:
return False


def is_superperiod(source, target):
Expand All @@ -936,9 +1000,11 @@ def is_superperiod(source, target):
-------
is_superperiod : boolean
"""
source_raw = source
if isinstance(source, offsets.DateOffset):
source = source.rule_code

target_raw = target
if isinstance(target, offsets.DateOffset):
target = target.rule_code

Expand Down Expand Up @@ -971,6 +1037,12 @@ def is_superperiod(source, target):
return target in ['T', 'S']
elif source == 'S':
return target in ['S']
elif isinstance(source_raw, offsets._NonCythonPeriod):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do instance checks as opposed to a try/except that catches a NotImplementedError/AttributeError? Seems like that would be more Pythonic

Also, can we add an

else:
    return False

Just would be a little clearer here.

return source_raw.is_superperiod(target_raw)
elif isinstance(target_raw, offsets._NonCythonPeriod):
return target_raw.is_subperiod(source_raw)
else:
return False


def _get_rule_month(source, default='DEC'):
Expand Down
13 changes: 10 additions & 3 deletions pandas/tseries/index.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
from pandas.compat import u
from pandas.tseries.frequencies import (
infer_freq, to_offset, get_period_alias,
Resolution, get_reso_string, get_offset)
Resolution, get_reso_string, get_offset, infer_freqstr)
from pandas.tseries.offsets import DateOffset, generate_range, Tick, CDay
from pandas.tseries.tools import parse_time_string, normalize_date
from pandas.util.decorators import cache_readonly
Expand Down Expand Up @@ -792,8 +792,8 @@ def to_period(self, freq=None):
msg = "You must pass a freq argument as current index has none."
raise ValueError(msg)

if freq is None:
freq = get_period_alias(self.freqstr)
if freq is None: # No reason no convert to str; keep w/e freq is
freq = self.freq

return PeriodIndex(self.values, freq=freq, tz=self.tz)

Expand Down Expand Up @@ -1427,6 +1427,13 @@ def inferred_freq(self):
except ValueError:
return None

@cache_readonly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure this can be cached like this? Is it possible that it will become inferred later on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inferred_freq property is also marked as cache_readonly so I don't think so.

def inferred_freqstr(self):
try:
return infer_freqstr(self)
except ValueError:
return None

@property
def freqstr(self):
""" return the frequency object as a string if its set, otherwise None """
Expand Down
Loading