-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
frequencies: Number of pivot points differs for same date range when both start/end date are inclusive #963
Comments
Updates the specific values of expected frequencies for functional tests, since the corrected pivot calculation logic produces new pivots and correspondingly new frequency values per pivot. One nice side effect of the new pivot calculations is that they produce the correct number of pivots in our functional test of relative min and max date values that had previously failed in a flaky manner. The corrected pivot logic reveals that the "flaky" behavior of that test was actually the correct behavior: we should always get 3 pivots when the max date is 6 months ago, the min date is 12 months ago, and the pivot interval is 3 months. That we previously only got 2 pivots from these relative dates most times of the year reflects the invalid pivot calculation logic from a different perspective. This commit reactivates that previously commented-out test. Fixes #963
Updates the specific values of expected frequencies for functional tests, since the corrected pivot calculation logic produces new pivots and correspondingly new frequency values per pivot. One nice side effect of the new pivot calculations is that they produce the correct number of pivots in our functional test of relative min and max date values that had previously failed in a flaky manner. The corrected pivot logic reveals that the "flaky" behavior of that test was actually the correct behavior: we should always get 3 pivots when the max date is 6 months ago, the min date is 12 months ago, and the pivot interval is 3 months. That we previously only got 2 pivots from these relative dates most times of the year reflects the invalid pivot calculation logic from a different perspective. This commit reactivates that previously commented-out test. Fixes #963
@huddlej a similar problem is causing CI to fail today, this time being the last day of the month. A small snippet to repro: from augur.frequency_estimators import get_pivots
from augur.dates import numeric_date
from treetime.utils import datestring_from_numeric
start_date = numeric_date('2022-01-01')
end_date = numeric_date('2022-03-01')
pivots = get_pivots(observations=[], start_date=start_date, end_date=end_date, pivot_interval=1, pivot_interval_units="months")
[datestring_from_numeric(date) for date in pivots]
# ['2022-01-01', '2022-02-01', '2022-03-01']
start_date = numeric_date('2022-01-31')
end_date = numeric_date('2022-03-31')
pivots = get_pivots(observations=[], start_date=start_date, end_date=end_date, pivot_interval=1, pivot_interval_units="months")
[datestring_from_numeric(date) for date in pivots]
# ['2022-02-28', '2022-03-31'] |
Update: we discussed this on Slack and it turns out this is a side effect of the way the duration is applied on incrementally updated augur/augur/frequency_estimators.py Lines 79 to 82 in e12dbe1
The proper solution is to determine pivots based on the original |
Well something related cropped up again on today, March 31st. Converted the failing test into an example here: from augur.frequency_estimators import get_pivots
from augur.dates import numeric_date
from treetime.utils import datestring_from_numeric
start_date = numeric_date('2022-03-30')
end_date = numeric_date('2022-09-30')
pivots = get_pivots(observations=[], start_date=start_date, end_date=end_date, pivot_interval=3, pivot_interval_units="months")
[datestring_from_numeric(date) for date in pivots]
# ['2022-03-30', '2022-06-30', '2022-09-30']
start_date = numeric_date('2022-03-31')
end_date = numeric_date('2022-09-30')
pivots = get_pivots(observations=[], start_date=start_date, end_date=end_date, pivot_interval=3, pivot_interval_units="months")
[datestring_from_numeric(date) for date in pivots]
# ['2022-06-30', '2022-09-30'] The reason is that subtracting monthly units from I think this is reasonable behavior. Given the purpose of the failing test:
and that we can't expect logical consistency, I'd propose to remove this test since relative dates and pivot intervals are already covered by pytests (1, 2). |
Current Behavior
Setup:
Getting monthly pivots between
2022-01-02
and2022-03-02
gives 2 pivots:However, shifting back one day to
2022-01-01
and2022-03-01
gives 3 pivots:Expected behavior
The same date range should produce the same number of pivots (i.e.
2022-01-01
and2022-03-01
should still give 2 pivots).Explanation
This is due to 2 behaviors in current usage of
pd.date_range()
:The pivot points are fixed at the start of every month:
augur/augur/frequency_estimators.py
Lines 56 to 57 in f68c448
The default parameter of
closed=None
topd.date_range()
means the interval is inclusive of bothstart_date
andend_date
.It seems like this behavior is intentional, given the test here:
augur/tests/test_frequencies.py
Lines 85 to 93 in f68c448
Possible solution
In the call to
pd.date_range()
augur/augur/frequency_estimators.py
Lines 63 to 66 in f68c448
specify either
closed='left'
orclosed='right'
so the range is inclusive on only one side. However, I'm not sure which is more meaningful for the application here. It also breaks a few existing tests, see failing tests on victorlin@1487790.@huddlej what do you think of this approach?
Your environment: if running Nextstrain locally
auspice 2.7.0
): Augur 15.0.2The text was updated successfully, but these errors were encountered: