-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add refresh/merge/flush totals in summary #615
Conversation
Due to elastic#608 it's likely we need to benchmark scenarios without using the node-stats telemetry device. At the same time we want to get a general idea of how many refreshes/merges/flushes happened (in total) by accessing the index stats. Add total count for merges/refresh/flush in summary output; this is collected from `_all/primaries` in `_stats`. Also modify summary description to clarify the values are totals from primary shards. Finally fix bug where index stats where time/count == 0 got skipped from the summary. Closes elastic#614
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. I left a couple of comments and suggestions.
esrally/reporter.py
Outdated
self.line("Total Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds), | ||
self.line("Total Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds) | ||
self.line("Total primaries Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds), | ||
self.line("Total primaries Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct as GC times are gathered on node-level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch thanks, will fix.
esrally/reporter.py
Outdated
return lines | ||
|
||
def report_total_time(self, name, baseline_total, baseline_per_shard, contender_total, contender_per_shard): | ||
unit = "min" | ||
return self.join( | ||
self.line("Total {}".format(name), baseline_total, contender_total, "", unit, | ||
self.line("Total primaries {}".format(name), baseline_total, contender_total, "", unit, | ||
treat_increase_as_improvement=False, formatter=convert.ms_to_minutes), | ||
self.line("Min {} per shard".format(name), baseline_per_shard.get("min"), contender_per_shard.get("min"), "", unit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also say here: "... per primary shard"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As there can be >1 primary shards, how about "... across primary shards"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or "... of primary shards"?
esrally/reporter.py
Outdated
treat_increase_as_improvement=False, formatter=convert.ms_to_seconds), | ||
self.line("Total Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s", | ||
self.line("Total primaries Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I don't think that the GC times should mention primaries.
esrally/mechanic/telemetry.py
Outdated
doc = { | ||
"name": name, | ||
"value": value, | ||
"unit": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should specify unit at all for a count?
counts = [] | ||
self.index_count(counts, stats, "merges_total_count", ["merges", "total"]) | ||
self.index_count(counts, stats, "refresh_total_count", ["refresh", "total"]) | ||
self.index_count(counts, stats, "flush_total_count", ["flush", "total"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these new metric keys should be documented in https://esrally.readthedocs.io/en/stable/metrics.html#metric-keys. Also, the corresponding docs for the summary report should be amended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address.
esrally/reporter.py
Outdated
self.line("Min {} per shard".format(name), "", total_time_per_shard.get("min"), unit, convert.ms_to_minutes), | ||
self.line("Median {} per shard".format(name), "", total_time_per_shard.get("median"), unit, convert.ms_to_minutes), | ||
self.line("Max {} per shard".format(name), "", total_time_per_shard.get("max"), unit, convert.ms_to_minutes), | ||
) | ||
|
||
def report_total_counts(self, stats): | ||
lines = [] | ||
lines.extend(self.report_total_count("merge count", stats.merge_count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nicer if the output is grouped by action (merge, refresh, flush) instead of all times together and all counts together.
esrally/reporter.py
Outdated
@@ -771,12 +790,18 @@ def report_total_times(self, baseline_stats, contender_stats): | |||
lines.extend(self.report_total_time("flush time", | |||
baseline_stats.flush_time, baseline_stats.flush_time_per_shard, | |||
contender_stats.flush_time, contender_stats.flush_time_per_shard)) | |||
lines.extend(self.report_total_count("merge count", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nicer if the output is grouped by action (merge, refresh, flush) instead of all times together and all counts together.
And group total counts directly below total times in all summary reports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating. LGTM
Use cumulative instead of total for summary reports and comparisons and clarify that min/median/max are across primary shards. Relates: elastic/elasticsearch#35594 (comment)
@danielmitterdorfer in 3d12203 I switched to the new "cumulative" terminology from "totals" and made the clarifications for "across primary shards" for min/median/max metrics; would be appreciated if you could take a quick look before merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think it's really clearer now (albeit the command line report becoming really large by now). LGTM!
Due to #608 it's likely we need to benchmark scenarios without using
the node-stats telemetry device. At the same time we want to get a
general idea of how many refreshes/merges/flushes happened (in total)
by accessing the index stats.
Add total count for merges/refresh/flush in summary output; this is
collected from
_all/primaries
in_stats
.Also modify summary description to clarify the values are totals from
primary shards.
Finally fix bug where index stats where time/count == 0 got skipped
from the summary.
Closes #614