-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adaptive Load Controller protos #398
Conversation
update from master
merge from upstream
merge from upstream
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
@oschaaf please assign to me once you are done reviewing this PR (after Eric implements the latest round of changes). |
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
Hi Otto, The step_controller_impl.proto configs now take a NighthawkFieldSelector (defined in step_controller.proto) enum input that tells the controller what single field within CommandLineOptions it should adjust dynamically. I included every numeric field in the enum that I didn't know to rule out. It's possible that some of the fields I included in NighthawkFieldSelector don't actually make sense and should be removed. Future numerical fields added to CommandLineOptions would need a corresponding value added to NighthawkFieldSelector. We will ship only basic single-variable StepControllers at first, but they will already work equally well for any field, not just RPS. Advanced multivariable optimization StepControllers could still use NighthawkFieldSelector fields to specify the multiple variables. Thanks! |
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
Updated again to remove some of the hard-coded stuff for setting multiple field and instead use plugins, which actually simplifies the design. |
Signed-off-by: eric846 <[email protected]>
…double Signed-off-by: eric846 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks great to me, just a couple of toughts and a possible proto3/validation nit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanations, LGTM!
@mum4k PTAL, simplified protos as discussed |
Signed-off-by: eric846 <[email protected]>
merge from upstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good, let's wait to see what Harvey thinks about this.
/retest |
🤷♀️ nothing to rebuild. |
…ent.Output turns out not to include the status Signed-off-by: eric846 <[email protected]>
070e404
to
677b783
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks great, a few questions/comments.
…plugin names, log thresholds only once per session Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
@pamorgan please review and assign back to me when done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API LGTM modulo comment.
message MetricSpec { | ||
// Name of the metric to evaluate. For the set of built-in metric names, see | ||
// source/adaptive_load/metrics_plugin_impl.cc. Required. | ||
string metric_name = 1 [(validate.rules).string.min_len = 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do plugins and metric names relate? Does each plugin only export out a fixed number of metric names? Or is metric name an opaque ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each plugin will support its own fixed set of metric names. We will query each plugin for its metric names at startup in order to validate the MetricSpec in the adaptive load session input proto.
For nighthawk.builtin:
- latency-min
- latency-mean
- latency-mean-plus-1stdev
- latency-mean-plus-2stdev
- latency-mean-plus-3stdev
- latency-max
- achieved-rps
- attempted-rps
- send-rate
- success-rate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. I'm going to be OOTO next two weeks, so please go ahead with this set of protos. I think as long as we're the only consumers, which is highly likely for the near future, it's fine to iterate as needed.
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
fyi pending review by @pamorgan. Please assign to me once done. |
BenchmarkResult testing_stage_result = 3; | ||
// Metrics and thresholds that were used to determine load adjustments, as referenced in the | ||
// BenchmarkResults. | ||
repeated MetricSpecWithThreshold metric_thresholds = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be different than the AdaptiveLoadSessionSpec metric_thresholds. If not, then maybe this is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's copied directly from metric_thresholds in the input. I included it in the output to preserve the context when dumping the output proto to an archive. Otherwise the archive would only have the value and the score, but not the threshold. If somebody just archived the full input proto alongside the output proto, this field would be redundant.
// The duration of the single benchmark session of the testing stage to | ||
// confirm the performance at the level of load found in the adjusting stage. | ||
// Required. | ||
google.protobuf.Duration testing_stage_duration = 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a default reasonable value for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Service input spec. | ||
nighthawk.client.Output nighthawk_service_output = 1; | ||
// Status of this Nighthawk Service benchmark session. | ||
google.rpc.Status status = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add more comments on what the status should we get for special errors. For example what is the error if we never converge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified the comment on this field, which applies narrowly to failures to call the Nighthawk Service and internal errors returned by Nighthawk Service.
Also extended the comment in AdaptiveLoadSessionOutput where things like convergence status are recorded.
api/adaptive_load/metric_spec.proto
Outdated
string metric_name = 1 [(validate.rules).string.min_len = 1]; | ||
// Name of the MetricsPlugin providing the metric ("nighthawk.builtin" for built-in). | ||
// Required. | ||
string metrics_plugin_name = 2 [(validate.rules).string.min_len = 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add default for "nighthawk.builtin"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -0,0 +1,21 @@ | |||
load("@envoy_api//bazel:api_build_system.bzl", "api_cc_py_proto_library") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some nits but LGTM
Signed-off-by: eric846 <[email protected]>
Signed-off-by: eric846 <[email protected]>
All protos, including plugin-specific config protos for all plugins we plan to ship initially.
Works on #416