-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN/ENH: Stop instantiating all offsets on load. #5189
CLN/ENH: Stop instantiating all offsets on load. #5189
Conversation
_offset_map = {} | ||
# TODO: Incorporate this into the make_offset thing in offsets | ||
# And what, Nano doesn't exist with np >= 1.7?? | ||
# if not _np_version_under1p7: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nano is not supported under < 1.7 so it exists, but he rule-code should error should raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, lol it's not under.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyways, just need to fix up offsets.py for this.
for day in days] | ||
#singletons | ||
names += ['S', 'T', 'U', 'BM', 'BMS', 'BQ', 'QS'] # No 'Q' | ||
_offset_map.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put this in a setUp method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean for _offset_map
, I thought about it, but only matters for certain methods to actually clear the offset map - we can revisit as necessary. The rest we only need to test once - if more testing needed, can refactor then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I meant the call: _offset_map.clear()
. Fair enough
I'll make it |
Looking through, the rule code is pretty clear, it's |
Looking at the regex you can see there are a few more rules: for example numbers are allowed only right after the dash. |
Yeah, but that's really just for validation: it covers the set of rules that already exist (as well as the legacy rules from scikits.timeseries, etc). (since you only end up with #s in the string for WOM) If we wanted to add some crazy additional rule_code, you'd change the regex to make it pass the validation. |
I agree that we can always change the regex since the regex is just used for validation and not for parsing. We just want to make sure that we can express the parameterization for a given offset in a reasonable manner. |
Places where rule_code actually matters:
|
Okay, I have to go, but I think it's simple: you split on '-', first element needs to define class for lookup, you can have whatever else you want can go afterwards (and it's up to the dateoffset to handle it) |
Agreed. |
I would like to cleanup the is_subperiod/is_superperiod eventually. One of the nice features of the 52-53 week fiscal years is that by design they upsample to weeks. |
actually not sure whether we should deprecate rule_code, just want to work around it. |
@cancan101 good suggestion about adding a prefix attribute, ended up meaning that I could remove most of the special rule_codes and simplify things! (I ended up leaving comments with the actual prefixes just for clarity 😄) |
@jtratner An additional location where rule_code matters is |
okay well this doesn't change rule_code. |
That may be the case but I would be in favor of at the very least moving these over to using constants (along lines of #5189 (comment)) if not writing this code to check against a DateOffset object rather than the freq_str / rule_code. |
I want to push that to a separate refactor. This keeps the behavior relatively the same and will enable you to merge your 52-53 week PR. |
That's totally fine. I was just going through looking at the upsample/downsample code and was reminded of those string literals... |
frankly, you probably have a better sense of this than I, just figured I could simplify a bit for you first, make sure we have a working framework, then it's easier to make more advanced tweaks later. |
return offset | ||
else: | ||
raise ValueError('Bad rule name requested: %s.' % name) | ||
except KeyError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what conditions would a KeyError be raised (ie which lines could raise the error)?
@@ -683,6 +711,9 @@ def rule_code(self): | |||
6: 'SUN' | |||
} | |||
|
|||
# reversed | |||
_weekday_dict.update([(v, k) for k, v in _weekday_dict.items()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of adding the reversed map to the same dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Means that _weekday_dict['SUN']
is 6, while _weekday_dict[6]
is 'SUN'
and that way you don't have to maintain two separate dicts of the same information (have to lookup the first way in from_name
and the second in rule_code
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does Week.from_name(suffix=1)
give you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll raise an error because you can't slice it. I'll go change all of them to private methods now, thanks for the heads up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will actually create the Week but when that Week is used it will raise an Exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this makes me realize that my implementation is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it - need to catch the TypeError from getting an offset with too many -
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I don't see the benefit of sharing the same dict for forward and reverse lookups. It seems to leave open the possibility of not fully checking for invalid inputs. Admittedly the dict is private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, that's reasonable and trivial to implement - I'll add it.
Okay, I think this is in a pretty good place at this point. Thanks for the feedback @cancan101! Any other comments? @wesm - does this match what you were thinking? |
@jtratner NP. |
else: | ||
# what the heck is this Exception supposed to be for??? | ||
try: | ||
return offset.freqstr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any idea about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't know...all I know is that BusinessDay was treated separately and explicitly tested to be this way. I think I might be able to remove the try/except though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, can remove the try/except now because of the check - is this considered public? I.e., does it need to validate anything? If it's not I might put the try/except back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean rather, if it is, would prefer to have error be "Bad rule" rather than "No attribute freqstr"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put it back for just that reason, catching any Exception
@jreback and others - okay? |
ok by me |
Use a more flexible lookup based on prefix then delegate creation to class' `from_name()` method. Totally get rid of hasOffsetName (not necessary). Finally, make some inheriting go on to simplify things.
okay, going to merge this soon? any other comments? |
Is there a vbench for pandas load time? |
No, I don't think so (hard to time), but this is mostly to make your PR |
…load CLN/ENH: Stop instantiating all offsets on load.
@jtratner I'm totally fine with commit. I was just curious as to start up gain. |
It's a good point, I've definitely thought about it too. Not sure how you'd |
Maybe something from:
|
Go for it - I'd certainly be interested to see the result. |
@jtratner Any reason for this code to still exist in frequencies:
|
shoot, the _offset_map part shouldn't be there. _rule_aliases needs to be though (I think). I didn't cover everything, there's still a bunch of work for you to do after this. |
I mean in the internal cleanup you previously proposed. |
I dont think you need the alias part either. You handle that:
|
I will leave the alias in for now but I will remove the |
yes On Mon, Oct 14, 2013 at 11:42 PM, Alex Rothberg [email protected]:
|
klass = prefix_mapping[split[0]] | ||
# handles case where there's no suffix (and will TypeError if too many '-') | ||
obj = klass._from_name(*split[1:]) | ||
obj._named = key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for setting the _named
field here? This leads to some printing weirdness: #5148 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to preserve the special case existing behavior for shorter offset names (i.e. there are offsets that are synonyms and we wanted to preserve their repr output in the console).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is depending how the same offset is created , it will print
differently.
On Apr 14, 2014 5:49 PM, "Jeff Tratner" [email protected] wrote:
In pandas/tseries/offsets.py:
+if not _np_version_under1p7:
Only 1.7+ supports nanosecond resolution
- prefix_mapping['N'] = Nano
+def _make_offset(key):
- """Gets offset based on key. KeyError if prefix is bad, ValueError if
- suffix is bad. All handled by
get_offset
in tseries/frequencies. Not- public."""
- if key is None:
return None
- split = key.replace('@', '-').split('-')
- klass = prefix_mapping[split[0]]
handles case where there's no suffix (and will TypeError if too many '-')
- obj = klass._from_name(*split[1:])
- obj._named = key
to preserve the special case existing behavior for shorter offset names
(i.e. there are offsets that are synonyms and we wanted to preserve their
repr output in the console).—
Reply to this email directly or view it on GitHubhttps://github.com//pull/5189/files#r11610049
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. 6 months later, I agree with you that we should just have canonicalized offset names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking back, that was probably a mistake on my part, I'm okay with changing it
Implements @wesm's suggestion and clears the way for new offsets.
Very simple to extend - map a string to your class in mapping, define a
from_name
classmethod, then class gets passed everything after the
-
(ifanything) to its method
from_name()
. Caches to dict afterwards (rightnow doesn't use cacheable, but could take advantage of that).
cc @cancan101
For reference, here are all the existing offset: output aliases (using
types because otherwise prints exactly the same).
[(None, NoneType),
('A-APR', pandas.tseries.offsets.YearEnd),
('A-AUG', pandas.tseries.offsets.YearEnd),
('A-DEC', pandas.tseries.offsets.YearEnd),
('A-FEB', pandas.tseries.offsets.YearEnd),
('A-JAN', pandas.tseries.offsets.YearEnd),
('A-JUL', pandas.tseries.offsets.YearEnd),
('A-JUN', pandas.tseries.offsets.YearEnd),
('A-MAR', pandas.tseries.offsets.YearEnd),
('A-MAY', pandas.tseries.offsets.YearEnd),
('A-NOV', pandas.tseries.offsets.YearEnd),
('A-OCT', pandas.tseries.offsets.YearEnd),
('A-SEP', pandas.tseries.offsets.YearEnd),
('AS-APR', pandas.tseries.offsets.YearBegin),
('AS-AUG', pandas.tseries.offsets.YearBegin),
('AS-DEC', pandas.tseries.offsets.YearBegin),
('AS-FEB', pandas.tseries.offsets.YearBegin),
('AS-JAN', pandas.tseries.offsets.YearBegin),
('AS-JUL', pandas.tseries.offsets.YearBegin),
('AS-JUN', pandas.tseries.offsets.YearBegin),
('AS-MAR', pandas.tseries.offsets.YearBegin),
('AS-MAY', pandas.tseries.offsets.YearBegin),
('AS-NOV', pandas.tseries.offsets.YearBegin),
('AS-OCT', pandas.tseries.offsets.YearBegin),
('AS-SEP', pandas.tseries.offsets.YearBegin),
('B', pandas.tseries.offsets.BusinessDay),
('BA-APR', pandas.tseries.offsets.BYearEnd),
('BA-AUG', pandas.tseries.offsets.BYearEnd),
('BA-DEC', pandas.tseries.offsets.BYearEnd),
('BA-FEB', pandas.tseries.offsets.BYearEnd),
('BA-JAN', pandas.tseries.offsets.BYearEnd),
('BA-JUL', pandas.tseries.offsets.BYearEnd),
('BA-JUN', pandas.tseries.offsets.BYearEnd),
('BA-MAR', pandas.tseries.offsets.BYearEnd),
('BA-MAY', pandas.tseries.offsets.BYearEnd),
('BA-NOV', pandas.tseries.offsets.BYearEnd),
('BA-OCT', pandas.tseries.offsets.BYearEnd),
('BA-SEP', pandas.tseries.offsets.BYearEnd),
('BAS-APR', pandas.tseries.offsets.BYearBegin),
('BAS-AUG', pandas.tseries.offsets.BYearBegin),
('BAS-DEC', pandas.tseries.offsets.BYearBegin),
('BAS-FEB', pandas.tseries.offsets.BYearBegin),
('BAS-JAN', pandas.tseries.offsets.BYearBegin),
('BAS-JUL', pandas.tseries.offsets.BYearBegin),
('BAS-JUN', pandas.tseries.offsets.BYearBegin),
('BAS-MAR', pandas.tseries.offsets.BYearBegin),
('BAS-MAY', pandas.tseries.offsets.BYearBegin),
('BAS-NOV', pandas.tseries.offsets.BYearBegin),
('BAS-OCT', pandas.tseries.offsets.BYearBegin),
('BAS-SEP', pandas.tseries.offsets.BYearBegin),
('BM', pandas.tseries.offsets.BusinessMonthEnd),
('BMS', pandas.tseries.offsets.BusinessMonthBegin),
('BQ', pandas.tseries.offsets.BQuarterEnd),
('BQ-APR', pandas.tseries.offsets.BQuarterEnd),
('BQ-AUG', pandas.tseries.offsets.BQuarterEnd),
('BQ-DEC', pandas.tseries.offsets.BQuarterEnd),
('BQ-FEB', pandas.tseries.offsets.BQuarterEnd),
('BQ-JAN', pandas.tseries.offsets.BQuarterEnd),
('BQ-JUL', pandas.tseries.offsets.BQuarterEnd),
('BQ-JUN', pandas.tseries.offsets.BQuarterEnd),
('BQ-MAR', pandas.tseries.offsets.BQuarterEnd),
('BQ-MAY', pandas.tseries.offsets.BQuarterEnd),
('BQ-NOV', pandas.tseries.offsets.BQuarterEnd),
('BQ-OCT', pandas.tseries.offsets.BQuarterEnd),
('BQ-SEP', pandas.tseries.offsets.BQuarterEnd),
('BQS', pandas.tseries.offsets.BQuarterBegin),
('BQS-APR', pandas.tseries.offsets.BQuarterBegin),
('BQS-AUG', pandas.tseries.offsets.BQuarterBegin),
('BQS-DEC', pandas.tseries.offsets.BQuarterBegin),
('BQS-FEB', pandas.tseries.offsets.BQuarterBegin),
('BQS-JAN', pandas.tseries.offsets.BQuarterBegin),
('BQS-JUL', pandas.tseries.offsets.BQuarterBegin),
('BQS-JUN', pandas.tseries.offsets.BQuarterBegin),
('BQS-MAR', pandas.tseries.offsets.BQuarterBegin),
('BQS-MAY', pandas.tseries.offsets.BQuarterBegin),
('BQS-NOV', pandas.tseries.offsets.BQuarterBegin),
('BQS-OCT', pandas.tseries.offsets.BQuarterBegin),
('BQS-SEP', pandas.tseries.offsets.BQuarterBegin),
('Q-APR', pandas.tseries.offsets.QuarterEnd),
('Q-AUG', pandas.tseries.offsets.QuarterEnd),
('Q-DEC', pandas.tseries.offsets.QuarterEnd),
('Q-FEB', pandas.tseries.offsets.QuarterEnd),
('Q-JAN', pandas.tseries.offsets.QuarterEnd),
('Q-JUL', pandas.tseries.offsets.QuarterEnd),
('Q-JUN', pandas.tseries.offsets.QuarterEnd),
('Q-MAR', pandas.tseries.offsets.QuarterEnd),
('Q-MAY', pandas.tseries.offsets.QuarterEnd),
('Q-NOV', pandas.tseries.offsets.QuarterEnd),
('Q-OCT', pandas.tseries.offsets.QuarterEnd),
('Q-SEP', pandas.tseries.offsets.QuarterEnd),
('QS', pandas.tseries.offsets.QuarterBegin),
('QS-APR', pandas.tseries.offsets.QuarterBegin),
('QS-AUG', pandas.tseries.offsets.QuarterBegin),
('QS-DEC', pandas.tseries.offsets.QuarterBegin),
('QS-FEB', pandas.tseries.offsets.QuarterBegin),
('QS-JAN', pandas.tseries.offsets.QuarterBegin),
('QS-JUL', pandas.tseries.offsets.QuarterBegin),
('QS-JUN', pandas.tseries.offsets.QuarterBegin),
('QS-MAR', pandas.tseries.offsets.QuarterBegin),
('QS-MAY', pandas.tseries.offsets.QuarterBegin),
('QS-NOV', pandas.tseries.offsets.QuarterBegin),
('QS-OCT', pandas.tseries.offsets.QuarterBegin),
('QS-SEP', pandas.tseries.offsets.QuarterBegin),
('S', pandas.tseries.offsets.Second),
('T', pandas.tseries.offsets.Minute),
('U', pandas.tseries.offsets.Micro),
('W-FRI', pandas.tseries.offsets.Week),
('W-MON', pandas.tseries.offsets.Week),
('W-SAT', pandas.tseries.offsets.Week),
('W-SUN', pandas.tseries.offsets.Week),
('W-THU', pandas.tseries.offsets.Week),
('W-TUE', pandas.tseries.offsets.Week),
('W-WED', pandas.tseries.offsets.Week),
('WOM-1FRI', pandas.tseries.offsets.WeekOfMonth),
('WOM-1MON', pandas.tseries.offsets.WeekOfMonth),
('WOM-1THU', pandas.tseries.offsets.WeekOfMonth),
('WOM-1TUE', pandas.tseries.offsets.WeekOfMonth),
('WOM-1WED', pandas.tseries.offsets.WeekOfMonth),
('WOM-2FRI', pandas.tseries.offsets.WeekOfMonth),
('WOM-2MON', pandas.tseries.offsets.WeekOfMonth),
('WOM-2THU', pandas.tseries.offsets.WeekOfMonth),
('WOM-2TUE', pandas.tseries.offsets.WeekOfMonth),
('WOM-2WED', pandas.tseries.offsets.WeekOfMonth),
('WOM-3FRI', pandas.tseries.offsets.WeekOfMonth),
('WOM-3MON', pandas.tseries.offsets.WeekOfMonth),
('WOM-3THU', pandas.tseries.offsets.WeekOfMonth),
('WOM-3TUE', pandas.tseries.offsets.WeekOfMonth),
('WOM-3WED', pandas.tseries.offsets.WeekOfMonth),
('WOM-4FRI', pandas.tseries.offsets.WeekOfMonth),
('WOM-4MON', pandas.tseries.offsets.WeekOfMonth),
('WOM-4THU', pandas.tseries.offsets.WeekOfMonth),
('WOM-4TUE', pandas.tseries.offsets.WeekOfMonth),
('WOM-4WED', pandas.tseries.offsets.WeekOfMonth)]