-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing time limits aggregate queries and interval buckets #2146
Fixing time limits aggregate queries and interval buckets #2146
Conversation
My tests failed the race step. I don't think I changed anything that would affect this? Perhaps someone has seen this before and could help?
|
@jnutzmann -- this is a known race in our code, which we're working on. Let me kick your build. |
@pauldix review, por favor? |
Needs a rebase too, let me know if you need assistance @jnutzmann |
…pping Don't panic if presented with a field of unknown type
Rebase is done. Looks like it failed in the race again, might need a build kick. I don't think I can do that? |
@jnutzmann kicking it now |
@jnutzmann thanks for the rebase and continued effort. we've got one more branch we need to get merged in before this gets accepted - i'll ping you once that's in. |
History & Purpose
This pull request relates to two previous PR: #2016 and #2067.
#2016 was a PR I submitted to fix a bucket alignment issue. Unfortunately, I introduced a different bug that broke raw queries with a time range specified.
#2067 was created by @corylanou to fix my bug, but created another issue reported in #2120 and #2027.
In fixing this, I found another few issues (described below). IMO, they are less severe, so I will leave it up to the owners whether they should be fixed now with this PR or if new issues should be pulled and they should be fixed after.
Also, there is a definite need for more test cases around this.
Testing
There seems to be 3 types of queries that need to be tested:
SELECT value FROM my_series WHERE time >= '2015-03-31T00:30:00Z' AND time < '2015-03-31T00:30:10Z'
(with or without both time bounds)SELECT count(value) FROM my_series WHERE time >= '2015-03-31T00:30:00Z' AND time < '2015-03-31T00:30:10Z'
SELECT count(value) FROM my_series WHERE time >= '2015-03-31T00:30:00Z' GROUP BY time(1h)
To verify these are now correct, I created a database with data points spaced every 2 seconds for a few days.
Here are my findings case-by-case:
Case 1
Query:
SELECT value FROM my_series WHERE time >= '2015-03-31T00:30:00Z' AND time <= '2015-03-31T00:30:10Z'
Results (same on master vs this PR):
New Issue 1: The results are missing the last point at
2015-03-31T00:30:10Z
. If we run the query withtime < '2015-03-31T00:30:10Z'
(getting rid of equal), we get the same result. If we add a second (time < '2015-03-31T00:30:11Z'
), then we get the point back. The top bound is probably off by a nanosecond.Case 2
Query1:
SELECT count(value) FROM my_series WHERE time >= '2015-03-31T00:30:00Z' AND time < '2015-03-31T00:30:10Z'
Results (same for master and PR):
Query 2:
SELECT count(value) FROM my_series WHERE time >= '2015-03-31T00:30:00Z'
This query is where the bug fix comes in. In master we get inconsistent values (0, 2760, 17392, 37953, etc). Timestamps varied as well. With this PR, we get:
As expected, the PR fixes this issue.
Case 3
Query:
SELECT count(value) FROM my_series WHERE time >= '2015-03-31T00:30:00Z' AND time < '2015-03-31T4:30:00Z' GROUP BY time(1h)
Results (same in master and PR):
New Issue 2: In the last time interval we get all the results back, when we should only be getting half, as we specified
2015-03-31T4:30:00Z
as the end time. We can vary this from2015-03-31T4:01:00Z
to2015-03-31T5:00:00Z
and get the same results.Please let me know your thoughts on this (and review my changes carefully :) )