BUG/API: implement DayDST #44364
The deprecation path would be nice such that we don't need to implement Here was my |
So I guess the deprecation would be to take all the places where this PR uses DayDST and issue a warning telling users to use "24h" if they don't want CalendarDay behavior. Then when we enforce the deprecation, we change Day to behave like DayDST does in this branch (so DayDST never lands in master) |
Do you mean that the code that is in What do you mean exactly with "decide what to do with freq="infer" "?
I think ideally we only warn for a specific behaviour that will actually change in the future (to the extent this is achievable) |
Sort of. It is doing what it was originally intended to do (https://github.com/pandas-dev/pandas/pull/44364/files#diff-d4ebb574a7d2ddbedc3b4eed31907f98a8cf66721cd8080810890997ca2f8fe3L435), and in that sense is not doing anything wrong. BUT that originally intended behavior is the cause of bugs. e.g. with non-None freq, we should always have
Consider e.g.
The resample case I'm not sure of, but some |
@mroeschke i stumbled on #24330; did we already do this deprecation and then not enforce it quite right? |
Unfortunately no, I think #22867 was my attempt to enforce the deprecation but I lost steam handling all the warnings. |
But we want "D" and "DayDST" to mean the same (eventually)? (In which case there is no decision to make.) In general, from your answers in #44364 (comment), I get the feeling that it's still not clearly defined what we actually consider the wrong behaviour / how we want to solve this. |
My understanding is that nothing in
If our goal is that |
Good idea.
Changing
|
As a recap, #41943 (comment) describes:
IMO the crux is to make |
According to the summary in the issue (#41943), resample already treats "D" as calendar day. And based on a quick check, it seems it does not treat "24H" the same, but actually treats it as a proper 24H fixed frequency:
So based on what we discussed before, nothing has to change here.
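A check along these lines could look like the sketch below. It is not the snippet from the original comment; the time zone, the dates, and the use of `pd.offsets.Hour(24)` as a stand-in for "24H" are all assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative zone and dates (assumption): DST began on 2021-03-14 there.
tz = "America/New_York"
index = pd.date_range(
    "2021-03-13 00:00", "2021-03-15 23:00", freq=pd.offsets.Hour(), tz=tz
)
ser = pd.Series(1, index=index)

# "D" bins follow local midnights, so the DST day holds fewer hourly points.
daily = ser.resample("D").size()
print(daily.tolist())

# A fixed 24-hour offset for comparison; inspect the bin edges it produces.
fixed = ser.resample(pd.offsets.Hour(24)).size()
print(fixed)
```

Comparing the two printed results shows whether the bin edges track local midnight or drift by an hour after the transition.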
Since timedeltas are by definition timezone naive, and since "D" is always equivalent to "24H" for timezone naive timestamps, I personally don't have a problem interpreting "D" as such for timedeltas as well. But I think this was still a discussion point in earlier issues, see #22274 (comment) and the last two paragraphs of #22274 (comment). I think the question here is whether we want to 1) deprecate "D" in timedelta context, or 2) special-case it as a fixed 24H frequency.
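A minimal sketch of the timezone-naive equivalence, using `pd.DateOffset(days=1)` as a stand-in for calendar-day semantics (an assumption for illustration, not the proposed Day behavior itself):

```python
import pandas as pd

one_day = pd.Timedelta(days=1)
twenty_four_hours = pd.Timedelta(hours=24)

# A Timedelta has no time zone, so one day is always exactly 24 hours.
print(one_day == twenty_four_hours)

# For a naive timestamp, fixed and calendar-day steps land on the same instant.
naive = pd.Timestamp("2021-03-13 12:00")
print(naive + one_day)
print(naive + pd.DateOffset(days=1))
```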
But if there is no DST crossing, "D" or "24H" give the same output from
Yes, but to be clear: that's not necessarily an issue in |
Ah, I suppose you mean "keep the 24H-like behaviour" in subsequent offset arithmetic? Let's use an example without date_range, to just focus on the arithmetic part:
So here "D" acts as "24H", and so if we make "D" always act as calendar day, the above example will show a change in behaviour. So also if you are using |
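The two candidate meanings can be contrasted on a single timestamp. This is a sketch, not the example from the comment above: the zone and date are assumptions, `pd.Timedelta(hours=24)` stands in for "24H", and `pd.DateOffset(days=1)` stands in for calendar-day semantics.

```python
import pandas as pd

# Noon on the day before the 2021 spring-forward transition (illustrative).
ts = pd.Timestamp("2021-03-13 12:00", tz="America/New_York")

fixed = ts + pd.Timedelta(hours=24)    # exactly 24 elapsed hours
calendar = ts + pd.DateOffset(days=1)  # same wall-clock time on the next date

print(fixed)     # 2021-03-14 13:00:00-04:00
print(calendar)  # 2021-03-14 12:00:00-04:00
```

Whichever of these results `ts + pd.offsets.Day()` matches tells you which semantics the installed pandas version gives to "D".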
Yes, thank you for making that explicit. It is not only arithmetic with DateOffset objects, though. For example, add/sub with a Timedelta scalar will preserve a Tick freq, and that will change if Day ceases to subclass Tick. This means there is a potential behavior change when you construct a DTI without a DST crossing, then add a timedelta scalar:
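A sketch of that scenario, with an illustrative zone and dates that are assumptions rather than the original snippet. What `shifted.freq` prints depends on the pandas version: where Day is a Tick the freq is expected to be preserved, and where it is not the freq may be dropped.

```python
import pandas as pd

# Three summer days in an illustrative zone: no DST transition in this range.
dti = pd.date_range("2021-06-01", periods=3, freq="D", tz="America/New_York")

# Adding a Timedelta scalar shifts every element by a fixed amount of time.
shifted = dti + pd.Timedelta(hours=1)

print(dti.freq)
print(shifted.freq)  # version-dependent: preserved only while Day is a Tick
```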
|
I find it "expected" that this will not be equal, if "D" is a calendar day while a Timedelta is always a fixed-time delta. |
I agree; I am saying that it constitutes an API change. |
cc @mroeschke https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=69864&view=logs&j=054c7612-f473-5cf0-0a65-77b54574553a&t=7218990d-7c28-52cc-5797-2fee2b0645b4&l=121 Caused unexpected warning(s): [('ResourceWarning', ResourceWarning('unclosed file <_io.BufferedRandom name=16>'), '/Users/runner/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/jinja2/environment.py', 474)] |
:( Guess the search continues. Probably still dependency related at this point. |
Another option would be to change the behavior of |
this is probably the way to go |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Mothballing to clear the queue since this is on the "do in 2.0" list |
xref #41943 cc @mroeschke
The tests all passed locally up until the last commit; there are a handful of broken resample tests now.
The approach here is to keep Day as-is and implement DayDST, fix the actively-wrong infer_freq behavior (see test_infer_freq_across_dst_not_daily), then decide what to do with freq="infer" and freq="D" based on the presence of a tz.
I guess a deprecation path would be to warn users passing "D" or Day() that it will have DST-aware semantics in the future, and that to keep Timedelta-like behavior they should pass "24H". Would we want to warn in cases where they are equivalent, e.g. tz="UTC" or with a date_range that doesn't happen to cross a DST transition?
With such a deprecation cycle, we could avoid implementing DayDST and when the time comes just call it Day.
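One way to tell whether the two interpretations are equivalent for a given range is to generate both and compare them. The helper below is a hypothetical sketch for illustration only: `day_semantics_differ` is not part of pandas or of this PR, and `pd.DateOffset(days=...)` is assumed here as a stand-in for the DST-aware behavior.

```python
import pandas as pd


def day_semantics_differ(start, periods, tz):
    """Hypothetical helper: True if fixed-24h and calendar-day steps diverge."""
    first = pd.Timestamp(start, tz=tz)
    steps = range(1, periods)
    # Fixed interpretation: each step is exactly 24 elapsed hours.
    fixed = [first + pd.Timedelta(hours=24 * i) for i in steps]
    # Calendar interpretation: each step keeps the same wall-clock time.
    calendar = [first + pd.DateOffset(days=i) for i in steps]
    return fixed != calendar


print(day_semantics_differ("2021-03-13", 3, "America/New_York"))  # crosses DST
print(day_semantics_differ("2021-06-01", 3, "America/New_York"))  # no crossing
print(day_semantics_differ("2021-03-13", 3, "UTC"))               # no DST at all
```

A warning gated on a check like this would fire only for ranges whose results would actually change, at the cost of computing both interpretations.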