Add data streams telemetry device #1296

Merged
merged 9 commits into from
Jul 13, 2021


b-deam
Member

@b-deam b-deam commented Jun 30, 2021

With this commit we add a data streams telemetry device that regularly
samples the count and store size of all data streams within a cluster.

Closes #1161
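
As a rough illustration of what a single sample could look like, the sketch below flattens a data stream stats response into metric records. The field names follow the Elasticsearch data stream stats API (`GET /_data_stream/_stats`) as I understand it; the function name `to_metric_docs` and the record layout are assumptions made for this example and are not taken from the code in this PR.

```python
from typing import Any, Dict, List


def to_metric_docs(stats: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one stats response into a cluster-wide record plus one record per data stream."""
    docs: List[Dict[str, Any]] = [
        {
            "data_stream": None,  # None marks the cluster-wide summary record
            "data_stream_count": stats.get("data_stream_count", 0),
            "backing_indices": stats.get("backing_indices", 0),
            "store_size_bytes": stats.get("total_store_size_bytes", 0),
        }
    ]
    for ds in stats.get("data_streams", []):
        docs.append(
            {
                "data_stream": ds["data_stream"],
                "backing_indices": ds["backing_indices"],
                "store_size_bytes": ds["store_size_bytes"],
            }
        )
    return docs


# Hand-written sample response for illustration only (not real measurements).
sample_response = {
    "data_stream_count": 2,
    "backing_indices": 5,
    "total_store_size_bytes": 3000,
    "data_streams": [
        {"data_stream": "logs-a", "backing_indices": 3, "store_size_bytes": 2000},
        {"data_stream": "logs-b", "backing_indices": 2, "store_size_bytes": 1000},
    ],
}

docs = to_metric_docs(sample_response)
for doc in docs:
    print(doc)
```

A sampling device would produce one such batch per sample interval.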

Tested locally with this test track (which uses my Weatherbeat data), invoked with the following commands for both in-memory and external metrics stores:

# This should fail due to major version < 7
esrally race --distribution-version=6.3.0 --car="4gheap,x-pack-security" --track=rally/tracks/weatherbeat --track-params=rally/tracks/weatherbeat/params.json --client-options="timeout:180,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --telemetry data-stream-stats --telemetry-params="data-stream-stats-sample-interval: 1" --kill-running-processes 

# This should fail due to minor version < 7.9.0
esrally race --distribution-version=7.8.0 --car="4gheap,x-pack-security" --track=rally/tracks/weatherbeat --track-params=rally/tracks/weatherbeat/params.json --client-options="timeout:180,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --telemetry data-stream-stats --telemetry-params="data-stream-stats-sample-interval: 1" --kill-running-processes

# This should fail due to OSS distribution 
esrally race --distribution-version=7.9.0 --track=rally/tracks/weatherbeat --track-params=rally/tracks/weatherbeat/params.json --client-options="timeout:180" --telemetry data-stream-stats --telemetry-params="data-stream-stats-sample-interval: 1" --kill-running-processes

esrally race --distribution-version=7.9.0 --car="4gheap,x-pack-security" --track=rally/tracks/weatherbeat --track-params=rally/tracks/weatherbeat/params.json --client-options="timeout:180,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --telemetry data-stream-stats --telemetry-params="data-stream-stats-sample-interval: 1" --kill-running-processes

esrally race --distribution-version=7.10.0 --car="4gheap,x-pack-security" --track=rally/tracks/weatherbeat --track-params=rally/tracks/weatherbeat/params.json --client-options="timeout:180,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --telemetry data-stream-stats --telemetry-params="data-stream-stats-sample-interval: 1" --kill-running-processes

esrally race --distribution-version=7.11.0 --car="4gheap,x-pack-security" --track=rally/tracks/weatherbeat --track-params=rally/tracks/weatherbeat/params.json --client-options="timeout:180,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --telemetry data-stream-stats --telemetry-params="data-stream-stats-sample-interval: 1" --kill-running-processes

# Version 7.11+ no longer needs x-pack explicitly defined due to no OSS build distribution
esrally race --distribution-version=7.11.0 --track=rally/tracks/weatherbeat --track-params=rally/tracks/weatherbeat/params.json --client-options="timeout:180" --telemetry data-stream-stats --telemetry-params="data-stream-stats-sample-interval: 1" --kill-running-processes

esrally race --distribution-version=7.12.0 --track=rally/tracks/weatherbeat --track-params=rally/tracks/weatherbeat/params.json --client-options="timeout:180" --telemetry data-stream-stats --telemetry-params="data-stream-stats-sample-interval: 1" --kill-running-processes

# Tested with params file
esrally race --distribution-version=7.11.0 --car="4gheap,x-pack-security" --track=/Users/bradleydeam/perf/onboarding/rally/tracks/weatherbeat --track-params=/Users/bradleydeam/perf/onboarding/rally/tracks/weatherbeat/params.json --client-options="timeout:180,use_ssl:true,verify_certs:false,basic_auth_user:'rally',basic_auth_password:'rally-password'" --telemetry data-stream-stats --telemetry-params=/Users/bradleydeam/perf/onboarding/rally/tracks/weatherbeat/telemetry-params.json --kill-running-processes
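
The expected failures in the commands above suggest a version and distribution gate. A minimal sketch of such a gate follows, assuming the build flavor is reported as `"oss"` or `"default"` (as in the `version.build_flavor` field of the Elasticsearch info API); the helper names `parse_version` and `supports_data_stream_stats` are hypothetical and not Rally's API.

```python
from typing import Tuple

# Minimum version taken from the test comments above.
MIN_VERSION = (7, 9, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch', ignoring any suffix such as '-SNAPSHOT'."""
    core = version.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def supports_data_stream_stats(version: str, flavor: str) -> bool:
    """Hypothetical gate: requires version >= 7.9.0 and a non-OSS distribution."""
    return parse_version(version) >= MIN_VERSION and flavor != "oss"


# Mirrors the cases exercised by the commands above.
for version, flavor in [("6.3.0", "default"), ("7.8.0", "default"), ("7.9.0", "oss"), ("7.9.0", "default")]:
    print(version, flavor, supports_data_stream_stats(version, flavor))
```

Comparing tuples of integers avoids the string-comparison pitfall where "7.10.0" would sort before "7.9.0".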

Also tested with telemetry-params.json:

{
    "data-stream-stats-sample-interval": 5
}
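
Both forms of the parameter (the inline `key: value` string and the JSON file) carry the same setting. The sketch below shows one way the sample interval could be read and validated from either form; `parse_inline`, `sample_interval`, and the fallback value are hypothetical and do not reflect Rally's actual option parsing or its documented default.

```python
import json
from typing import Any, Dict


def parse_inline(arg: str) -> Dict[str, str]:
    """Parse a comma-separated 'key: value' string into a dict of strings."""
    params: Dict[str, str] = {}
    for pair in arg.split(","):
        if not pair.strip():
            continue
        key, _, value = pair.partition(":")
        params[key.strip()] = value.strip()
    return params


def sample_interval(params: Dict[str, Any], default: float = 10.0) -> float:
    """Return the sample interval in seconds; 'default' is a placeholder, not Rally's documented value."""
    value = float(params.get("data-stream-stats-sample-interval", default))
    if value <= 0:
        raise ValueError("data-stream-stats-sample-interval must be greater than zero")
    return value


inline = parse_inline("data-stream-stats-sample-interval: 1")
from_file = json.loads('{"data-stream-stats-sample-interval": 5}')  # same content as telemetry-params.json
print(sample_interval(inline), sample_interval(from_file))
```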

@b-deam self-assigned this Jun 30, 2021
@b-deam added the :Telemetry (Telemetry Devices that gather additional metrics) and enhancement (Improves the status quo) labels Jun 30, 2021
@b-deam added this to the 2.3.0 milestone Jun 30, 2021
@b-deam force-pushed the datastreams-stats branch from 555afc0 to 0530607 June 30, 2021 03:36
@pquentin
Member

Sorry for the conflict with the black/isort pull request. I fixed the conflicts in my fork, see pquentin@518f2e7 (I squashed all your commits into a single one).

b-deam added 6 commits July 1, 2021 10:19
With this commit we add a data streams telemetry device that regularly
samples the count and store size of all data streams within a cluster.

Closes elastic#1161
@b-deam force-pushed the datastreams-stats branch from 0530607 to 2cbc6dd July 1, 2021 01:03
Member

@danielmitterdorfer danielmitterdorfer left a comment

Thanks for this. I left some comments, but I think it makes sense for @gingerwizard to have a look at this from a functional perspective.

Member

@danielmitterdorfer danielmitterdorfer left a comment

Thanks for iterating! The changes look good to me but let's wait for feedback from @gingerwizard for a more user-focussed perspective.

@gingerwizard

Functionally this is fine for a first pass but due to limitations is probably not going to be used for replacing existing custom data stream collection yet.

Areas for thought:

  1. The store_size_bytes includes replicas I assume? If so, I'd like to understand the primary store cost if possible. I appreciate that the mapping could potentially change across indices, for the lifetime of the datastream, which complicates this, and this in turn leads to (2). As a first pass, however, maybe report a replica and primary count? The ratio can in turn be used to calculate the primary size.
  2. You note the number of indices but not the size of each. I wonder if a doc per index of the data stream would make more sense - this would make visualizing a little more challenging unless you add a common key that denotes collection per unit time per datastream/index for aggregating.
  3. The only other statistic which would be useful would be the min and max of the date within the data stream (and each index). A low priority, however.
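
The ratio idea in point 1 can be sketched as simple arithmetic. This assumes every replica copy is roughly the same size as its primary and that all copies are assigned, which may not hold in practice; the function name and the example numbers are hypothetical.

```python
def estimate_primary_store_bytes(total_store_bytes: int, primary_shards: int, replica_shards: int) -> float:
    """Estimate the primary-only store size from the total size and the shard counts.

    Assumes all shard copies (primaries and replicas) are about the same size.
    """
    if primary_shards <= 0 or replica_shards < 0:
        raise ValueError("primary_shards must be > 0 and replica_shards must be >= 0")
    copies = primary_shards + replica_shards
    return total_store_bytes * primary_shards / copies


# Illustrative numbers: 2 primaries, each with 2 replicas (4 replica shards in total).
print(estimate_primary_store_bytes(3_000_000, primary_shards=2, replica_shards=4))
```

With no replicas the estimate equals the total store size, as expected.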


@gingerwizard gingerwizard left a comment

See comment. LGTM as a first pass.

@danielmitterdorfer
Member

Functionally this is fine for a first pass but due to limitations is probably not going to be used for replacing existing custom data stream collection yet.

Our goal should be to reduce custom functionality as much as possible. If this PR is not there yet, we should iterate with the goal of being able to replace custom solutions. I'm not in favor of merging something that does not address this.

The store_size_bytes includes replicas I assume? If so, I'd like to understand the primary store cost if possible.

I don't know whether it's possible to derive it, but assuming it is, isn't it sufficient to have one property for the total store size (incl. replicas) and one without? I also don't understand why we would want a per-index view of a data stream. As this is a sampling telemetry device, a per-index view would also significantly increase the number of metrics documents, which we should be very cautious about.
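
To make the concern about document volume concrete, here is a back-of-the-envelope sketch. The run length, stream count, and index count are made-up numbers for illustration only; the one-second interval matches the sample interval used in the test commands in the PR description.

```python
def metrics_docs(duration_s: int, interval_s: int, docs_per_sample: int) -> int:
    """Number of metrics documents written over a run of the given length."""
    if interval_s <= 0:
        raise ValueError("interval_s must be greater than zero")
    return (duration_s // interval_s) * docs_per_sample


# Hypothetical benchmark: 1 hour, sampled every second, 5 data streams with 10 backing indices each.
per_stream = metrics_docs(3600, 1, docs_per_sample=5)
per_index = metrics_docs(3600, 1, docs_per_sample=5 * 10)
print(per_stream, per_index)
```

In this made-up case a per-index view writes ten times as many documents as a per-stream view, and the gap grows as data streams roll over to new backing indices.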

I appreciate that the mapping could potentially change across indices, for the lifetime of the datastream

Can you elaborate on the reasons why the mapping would change in a benchmark? Can we make the simplifying assumption that it does not change?

@danielmitterdorfer
Member

We had an offline conversation about this and it makes sense to merge this as is as a first step. Based on our experience we can refine this iteratively.

Labels
enhancement (Improves the status quo), :Telemetry (Telemetry Devices that gather additional metrics)
Development

Successfully merging this pull request may close these issues.

Create datastreams-stats telemetry device
4 participants