[ML] bucket_count is inaccurate when there are gaps in the data #30080
Comments
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue on Apr 30, 2018
This commit refactors the DataStreamDiagnostics class achieving the following advantages:
- simpler code; by encapsulating the moving bucket histogram into its own class
- better performance; by using an array to store the buckets instead of a map
- explicit handling of gap buckets; in preparation of fixing elastic#30080
dimitris-athanasiou added a commit that referenced this issue on May 1, 2018
This commit refactors the DataStreamDiagnostics class achieving the following advantages:
- simpler code; by encapsulating the moving bucket histogram into its own class
- better performance; by using an array to store the buckets instead of a map
- explicit handling of gap buckets; in preparation of fixing #30080
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue on May 3, 2018
This commit fixes an issue with the data diagnostics where empty buckets are not reported even though they should be. Once a job is reopened, the diagnostics are not initialized from the current data counts (in particular, the latest record timestamp). As a result, if the newly sent data has a time gap relative to the previous data, that gap is not accounted for in the empty bucket count. This commit fixes that by initializing the diagnostics with the current data counts. Closes elastic#30080
dimitris-athanasiou added a commit that referenced this issue on May 3, 2018
This commit fixes an issue with the data diagnostics where empty buckets are not reported even though they should be. Once a job is reopened, the diagnostics are not initialized from the current data counts (in particular, the latest record timestamp). As a result, if the newly sent data has a time gap relative to the previous data, that gap is not accounted for in the empty bucket count. This commit fixes that by initializing the diagnostics with the current data counts. Closes #30080
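The effect of seeding the diagnostics with the latest record timestamp can be illustrated with a minimal sketch. This is not the actual `DataStreamDiagnostics` code: the class `GapAwareDiagnosticsSketch`, its constructor arguments, and its method names are hypothetical, and the one-hour bucket span is an assumption made only for the example.

```java
/** Hypothetical sketch; names and structure are NOT taken from the Elasticsearch code base. */
final class GapAwareDiagnosticsSketch {
    private final long bucketSpanMs;
    private boolean hasLatestBucket = false;
    private long latestBucketIndex = 0;
    private long emptyBucketCount = 0;

    /**
     * @param bucketSpanMs       bucket span in milliseconds
     * @param latestRecordTimeMs latest record timestamp from the persisted data counts,
     *                           or null when the job has not seen any data yet
     */
    GapAwareDiagnosticsSketch(long bucketSpanMs, Long latestRecordTimeMs) {
        this.bucketSpanMs = bucketSpanMs;
        if (latestRecordTimeMs != null) {
            // Seeding step: remember where the previous session stopped.
            this.latestBucketIndex = Math.floorDiv(latestRecordTimeMs, bucketSpanMs);
            this.hasLatestBucket = true;
        }
    }

    /** Registers one record and counts any whole buckets skipped since the latest one. */
    void checkRecord(long recordTimeMs) {
        long bucketIndex = Math.floorDiv(recordTimeMs, bucketSpanMs);
        if (hasLatestBucket && bucketIndex > latestBucketIndex + 1) {
            emptyBucketCount += bucketIndex - latestBucketIndex - 1;
        }
        if (!hasLatestBucket || bucketIndex > latestBucketIndex) {
            latestBucketIndex = bucketIndex;
            hasLatestBucket = true;
        }
    }

    long getEmptyBucketCount() {
        return emptyBucketCount;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;       // assumed bucket span
        long week = 7L * 24L * hour;  // gap between the two batches

        // Reopened job without seeding: the week-long gap goes unnoticed.
        GapAwareDiagnosticsSketch unseeded = new GapAwareDiagnosticsSketch(hour, null);
        unseeded.checkRecord(week);

        // Reopened job seeded with the previous latest record timestamp (0 here).
        GapAwareDiagnosticsSketch seeded = new GapAwareDiagnosticsSketch(hour, 0L);
        seeded.checkRecord(week);

        System.out.println("unseeded=" + unseeded.getEmptyBucketCount()
                + " seeded=" + seeded.getEmptyBucketCount());
    }
}
```

In the unseeded case the first record after reopening looks like the start of the stream, so no gap is recorded; with seeding, every whole bucket between the old and new timestamps is counted as empty.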
dimitris-athanasiou added a commit that referenced this issue on May 3, 2018
This commit refactors the DataStreamDiagnostics class achieving the following advantages:
- simpler code; by encapsulating the moving bucket histogram into its own class
- better performance; by using an array to store the buckets instead of a map
- explicit handling of gap buckets; in preparation of fixing #30080
Original comment by @davidkyle:
Open a job, send some data, and close the job. Then reopen the job and send some data timestamped a week later than the previous batch. Autodetect will create empty bucket results for the intervening period, but `DataCounts::bucket_count` will not reflect that.

The test `MlBasicMultiNodeIT::testMiniFarequoteReopen` does exactly this, but the test was asserting that `bucket_count == 2` rather than `bucket_count = 7 days of buckets`.

`bucket_count` should equal the number of buckets written by autodetect, with the caveat that old results are sometimes pruned.
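To make "7 days of buckets" concrete, here is a rough sketch of the arithmetic. The one-hour bucket span is an assumption (the issue does not state the span used by the test), and `ExpectedBucketCountSketch` and `bucketsCovered` are hypothetical names, not part of Elasticsearch. Whether the final, partially filled bucket has already been written, and whether older results have been pruned, would shift the real number.

```java
import java.time.Duration;

/** Hypothetical helper; illustrates the arithmetic only, not how autodetect counts buckets. */
final class ExpectedBucketCountSketch {

    /**
     * Number of buckets from the one containing the first record to the one
     * containing the last record, inclusive, counting empty buckets in between.
     */
    static long bucketsCovered(long firstRecordMs, long lastRecordMs, Duration bucketSpan) {
        long spanMs = bucketSpan.toMillis();
        long firstBucket = Math.floorDiv(firstRecordMs, spanMs);
        long lastBucket = Math.floorDiv(lastRecordMs, spanMs);
        return lastBucket - firstBucket + 1;
    }

    public static void main(String[] args) {
        Duration bucketSpan = Duration.ofHours(1);       // assumed bucket span
        long firstBatchMs = 0L;                          // first batch timestamp
        long secondBatchMs = Duration.ofDays(7).toMillis(); // second batch, a week later

        System.out.println(bucketsCovered(firstBatchMs, secondBatchMs, bucketSpan));
    }
}
```

Under these assumptions the count is on the order of 7 × 24 hourly buckets rather than 2, which is the discrepancy the issue describes.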