Adaptive Load FakeStepController doom update #492

Merged
45 commits
8ea442d
Merge pull request #5 from envoyproxy/master
eric846 Jun 1, 2020
5ac755a
Merge pull request #6 from envoyproxy/master
eric846 Jun 28, 2020
b8c25a5
Merge pull request #7 from envoyproxy/master
eric846 Jul 7, 2020
1c19c68
initial commit
eric846 Jul 9, 2020
7050686
fix comments
eric846 Jul 9, 2020
0776563
fix format
eric846 Jul 9, 2020
16fd8f6
rename adaptive_rps to adaptive_load
eric846 Jul 10, 2020
c383010
add field_selector in example
eric846 Jul 10, 2020
6e1a483
fix example comment
eric846 Jul 10, 2020
4ef1140
fix format
eric846 Jul 10, 2020
4111bf4
add support for fault injection headers
eric846 Jul 10, 2020
871a959
replace linear and binary search with exponential search
eric846 Jul 10, 2020
1fd77c1
add InputVariableSetter mechanism
eric846 Jul 11, 2020
edc36b2
add input variable setter to build file
eric846 Jul 11, 2020
4d0364e
fix syntax errors
eric846 Jul 11, 2020
aed6d94
rename samples/adaptive_rps
eric846 Jul 11, 2020
d9ae87d
improve comments, change step controller initial value from int64 to …
eric846 Jul 12, 2020
a05a6f5
add proto validation rules, fix comments, make rps the default input_…
eric846 Jul 13, 2020
8cd4d21
fix comment wording
eric846 Jul 13, 2020
d814a96
simplify protos, add defaults, specify required or optional
eric846 Jul 14, 2020
5f5a885
add missing newline
eric846 Jul 14, 2020
7e20a78
Kick CI
eric846 Jul 14, 2020
9048267
simplify protos
eric846 Jul 15, 2020
306c0ec
fix format
eric846 Jul 15, 2020
d33f543
fix some optional field comments and rules
eric846 Jul 15, 2020
442cca9
Merge pull request #10 from envoyproxy/master
eric846 Jul 16, 2020
677b783
add Nighthawk status field in BenchmarkResult as nested nighthawk.cli…
eric846 Jul 19, 2020
cefb366
switch to standard Envoy plugin config proto, add prefix to internal …
eric846 Jul 22, 2020
f3684df
Merge remote-tracking branch 'upstream/master' into adaptive-rps-protos2
eric846 Jul 22, 2020
5463051
create headers
eric846 Jul 22, 2020
46e0e25
fix format
eric846 Jul 22, 2020
f634642
use docstring format
eric846 Jul 22, 2020
3c39faa
fix typos in comments
eric846 Jul 23, 2020
b9c8f2b
split build target, get rid of ostream, change InputValueSetter to us…
eric846 Jul 24, 2020
5fc4db4
remove nested namespace, remove redundant _include in target names
eric846 Jul 26, 2020
64e7852
merge from upstream
eric846 Jul 29, 2020
12807f1
Merge remote-tracking branch 'upstream/master' into adaptive-rps-headers
eric846 Jul 29, 2020
e8e960f
merge from upstream
eric846 Aug 27, 2020
3d97c2f
update FakeStepController to set doom from negative metric scores pas…
eric846 Aug 27, 2020
6306b4e
Merge remote-tracking branch 'upstream/master' into master2
eric846 Aug 27, 2020
4525923
merge from upstream
eric846 Aug 27, 2020
1ece783
Merge remote-tracking branch 'upstream/master' into master2
eric846 Aug 28, 2020
0a9c0a5
Merge branch 'master2' into adaptive-rps-fake-step-controller-doom-up…
eric846 Aug 28, 2020
59a8cc2
fix merge conflict
eric846 Aug 28, 2020
049baed
add missing const, improve comments
eric846 Aug 28, 2020
Original file line number Diff line number Diff line change
@@ -15,9 +15,10 @@ absl::Status StatusFromProtoRpcStatus(const google::rpc::Status& status_proto) {
} // namespace

FakeStepController::FakeStepController(
const nighthawk::adaptive_load::FakeStepControllerConfig& config,
nighthawk::adaptive_load::FakeStepControllerConfig config,
nighthawk::client::CommandLineOptions command_line_options_template)
: is_converged_{false}, is_doomed_{false}, fixed_rps_value_{config.fixed_rps_value()},
: input_setting_failure_countdown_{config.artificial_input_setting_failure_countdown()},
config_{std::move(config)}, is_converged_{false}, is_doomed_{false},
Contributor

not a complaint, but what is std::move accomplishing here, given that this isn't a smart pointer?

Contributor Author

Funny story, this was actually required by clang-tidy. If the function implementation is making a copy of a const reference parameter, clang-tidy tells you to change the parameter to be pass by value so it's obvious to the caller that a copy is happening. The argument gets copied at the call site into an unnamed temporary value. Then in the constructor we can actually move this temporary value into the field without incurring the cost of a second copy.
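
To make the copy-then-move reasoning above concrete, here is a minimal sketch. `TrackedConfig` and `Holder` are hypothetical names that do not appear in the PR; `TrackedConfig` stands in for a copyable config type such as a protobuf message and counts copies and moves so the effect of `std::move` is observable. It assumes C++17 (for `static inline` members).

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a copyable config type such as a protobuf message.
// It counts copy and move constructions so the effect of std::move is visible.
struct TrackedConfig {
  static inline int copies = 0;
  static inline int moves = 0;
  std::string payload;

  explicit TrackedConfig(std::string p) : payload(std::move(p)) {}
  TrackedConfig(const TrackedConfig& other) : payload(other.payload) { ++copies; }
  TrackedConfig(TrackedConfig&& other) noexcept : payload(std::move(other.payload)) { ++moves; }
};

// Takes the config by value, so the copy (if any) happens at the call site,
// then moves the parameter into the member instead of copying it a second time.
class Holder {
public:
  explicit Holder(TrackedConfig config) : config_{std::move(config)} {}
  const TrackedConfig& config() const { return config_; }

private:
  const TrackedConfig config_;
};
```

Constructing a `Holder` from a named `TrackedConfig` should cost one copy (into the parameter) plus one move (into the member); constructing it from a temporary should cost only the move. Without `std::move` in the initializer, the member would be copy-constructed from the parameter, which is the second copy the comment refers to.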

command_line_options_template_{std::move(command_line_options_template)} {}

bool FakeStepController::IsConverged() const { return is_converged_; }
@@ -33,20 +34,31 @@ bool FakeStepController::IsDoomed(std::string& doomed_reason) const {

absl::StatusOr<nighthawk::client::CommandLineOptions>
FakeStepController::GetCurrentCommandLineOptions() const {
if (config_.has_artificial_input_setting_failure() && input_setting_failure_countdown_ <= 0) {
return StatusFromProtoRpcStatus(config_.artificial_input_setting_failure());
}
nighthawk::client::CommandLineOptions options = command_line_options_template_;
options.mutable_requests_per_second()->set_value(fixed_rps_value_);
options.mutable_requests_per_second()->set_value(config_.fixed_rps_value());
return options;
}

void FakeStepController::UpdateAndRecompute(
const nighthawk::adaptive_load::BenchmarkResult& benchmark_result) {
if (input_setting_failure_countdown_ > 0) {
--input_setting_failure_countdown_;
}
// "Convergence" is defined as the latest benchmark reporting any score > 0.0.
// "Doom" is defined as any score < 0.0. Neutral is all scores equal to 0.0.
is_converged_ = false;
is_doomed_ = false;
doomed_reason_ = "";
for (const nighthawk::adaptive_load::MetricEvaluation& metric_evaluation :
benchmark_result.metric_evaluations()) {
if (metric_evaluation.threshold_score() > 0.0) {
if (metric_evaluation.threshold_score() < 0.0) {
is_doomed_ = true;
doomed_reason_ = "artificial doom triggered by negative score";
} else if (metric_evaluation.threshold_score() > 0.0) {
is_converged_ = true;
break;
}
}
}
@@ -110,4 +122,20 @@ envoy::config::core::v3::TypedExtensionConfig MakeFakeStepControllerPluginConfig
return outer_config;
}

envoy::config::core::v3::TypedExtensionConfig
MakeFakeStepControllerPluginConfigWithInputSettingError(
int fixed_rps_value, const absl::Status& artificial_input_setting_failure, int countdown) {
envoy::config::core::v3::TypedExtensionConfig outer_config;
outer_config.set_name("nighthawk.fake_step_controller");
nighthawk::adaptive_load::FakeStepControllerConfig config;
config.set_fixed_rps_value(fixed_rps_value);
config.mutable_artificial_input_setting_failure()->set_code(
static_cast<int>(artificial_input_setting_failure.code()));
config.mutable_artificial_input_setting_failure()->set_message(
std::string(artificial_input_setting_failure.message()));
config.set_artificial_input_setting_failure_countdown(countdown);
outer_config.mutable_typed_config()->PackFrom(config);
return outer_config;
}

} // namespace Nighthawk
Original file line number Diff line number Diff line change
@@ -13,6 +13,8 @@ namespace Nighthawk {

/**
* StepController for testing: Configurable convergence and doom countdowns, fixed RPS value.
*
* This class is not thread-safe.
*/
class FakeStepController : public StepController {
public:
@@ -22,7 +24,7 @@ class FakeStepController : public StepController {
* @param config FakeStepControllerConfig proto for setting the fixed RPS value.
Collaborator

(pre-existing, optional) We should probably mention in the class comment that this class isn't thread-safe. At least I am assuming it isn't meant to be.

Contributor Author

Done

* @param command_line_options_template A template for producing Nighthawk input.
*/
FakeStepController(const nighthawk::adaptive_load::FakeStepControllerConfig& config,
FakeStepController(nighthawk::adaptive_load::FakeStepControllerConfig config,
nighthawk::client::CommandLineOptions command_line_options_template);
/**
* @return bool The current value of |is_converged_|.
@@ -42,20 +44,23 @@ class FakeStepController : public StepController {
absl::StatusOr<nighthawk::client::CommandLineOptions>
GetCurrentCommandLineOptions() const override;
/**
* Updates |is_converged_| to reflect whether |benchmark_result| contains any score >0. Sets
* |is_doomed_| based whether the status in |benchmark_result| is OK; copies the status message
* into |doomed_reason_| only when the status is not OK.
* Updates |is_converged_| to reflect whether |benchmark_result| contains any score >0. Updates
* |is_doomed_| to reflect whether |benchmark_result| contains any score <0. A non-converged,
* non-doomed input has scores all equal to 0.
*
* @param benchmark_result A Nighthawk benchmark result proto.
*/
void
UpdateAndRecompute(const nighthawk::adaptive_load::BenchmarkResult& benchmark_result) override;

private:
// Counts down UpdateAndRecompute() calls. When this reaches zero, GetCurrentCommandLineOptions()
// starts to return an artificial input value setting failure if one is specified in the config.
int input_setting_failure_countdown_;
const nighthawk::adaptive_load::FakeStepControllerConfig config_;
bool is_converged_;
bool is_doomed_;
std::string doomed_reason_;
const int fixed_rps_value_;
const nighthawk::client::CommandLineOptions command_line_options_template_;
};

@@ -91,7 +96,8 @@ MakeFakeStepControllerPluginConfig(int fixed_rps_value);
* Creates a valid TypedExtensionConfig proto that activates a FakeStepController with a
* FakeStepControllerConfig that fails validation.
*
* @param artificial_validation_error An error status.
* @param artificial_validation_error An artificial error status to be returned by
* FakeStepControllerConfigFactory::ValidateConfig() when attempting LoadStepControllerPlugin().
*
* @return TypedExtensionConfig A proto that activates FakeStepController by name and includes
* a FakeStepControllerConfig proto wrapped in an Any. This proto will fail validation when
@@ -100,4 +106,21 @@ MakeFakeStepControllerPluginConfig(int fixed_rps_value);
envoy::config::core::v3::TypedExtensionConfig MakeFakeStepControllerPluginConfigWithValidationError(
const absl::Status& artificial_validation_error);

/**
* Creates a valid TypedExtensionConfig proto that activates a FakeStepController with a
* FakeStepControllerConfig that returns an error from GetCurrentCommandLineOptions().
*
* @param fixed_rps_value Value for RPS to set in the FakeStepControllerConfig proto until the
* countdown reaches zero.
* @param artificial_input_setting_failure An error status.
Collaborator

Can we expand this comment, explaining what is the meaning of the error status, i.e. what it is used for?

Contributor Author

Done

* @param countdown Number of times UpdateAndRecompute() must be called before
* GetCurrentCommandLineOptions() starts to return the input error status.
*
* @return TypedExtensionConfig A proto that activates FakeStepController by name and includes
* a FakeStepControllerConfig proto wrapped in an Any.
*/
envoy::config::core::v3::TypedExtensionConfig
MakeFakeStepControllerPluginConfigWithInputSettingError(
int fixed_rps_value, const absl::Status& artificial_input_setting_failure, int countdown);

} // namespace Nighthawk
Original file line number Diff line number Diff line change
@@ -5,12 +5,25 @@ package nighthawk.adaptive_load;
import "envoy/config/core/v3/extension.proto";
import "google/rpc/status.proto";

// Configuration for FakeStepController (plugin name: "nighthawk.fake_step_controller") that always
// returns a fixed RPS value and changes converged and doomed states based on the latest reported
// BenchmarkResult.
// Configuration for FakeStepController (plugin name: "nighthawk.fake_step_controller") that returns
// a fixed RPS value and changes converged and doomed states based on the latest reported
// BenchmarkResult. Can also be programmed to return a proto validation failure, return an error
// from input value setting every time, or return an error after some number of UpdateAndRecompute()
// iterations.
message FakeStepControllerConfig {
// RPS that should always be returned. Optional, default 0.
// RPS that should always be returned, except when artificial errors are configured. Optional,
// default 0.
int32 fixed_rps_value = 1;
// Artificial error that the plugin factory should return during validation. Optional.
google.rpc.Status artificial_validation_failure = 2;
// Artificial error that should be returned from GetCurrentCommandLineOptions(). Optional. May be
// used in conjunction with |artificial_input_setting_failure_countdown| to activate error
// behavior after a delay.
google.rpc.Status artificial_input_setting_failure = 3;
Contributor

I think I'm having trouble understanding the full use case here. If this is 3, then we are going to succeed 3 times, then fail on the 4th attempt. Makes sense, but I'm not sure I understand why that's useful. What is the test that you're supporting by creating it in this way?

Contributor Author

GetCurrentCommandLineOptions() can return an error status that should be handled cleanly by the main controller loop. The controller calls GetCurrentCommandLineOptions() repeatedly during the adjusting stage until the step controller reports convergence, and then in the testing stage it calls GetCurrentCommandLineOptions() one last time, reusing the last converged value.

In order to test handling of these errors, we need the FakeStepController to return successfully from GetCurrentCommandLineOptions() during the adjusting stage, but then start returning errors just in time for the testing stage. We don't have any way to update the FakeStepController during the run, so it has to be programmed up front to behave differently at different times.

An alternative would be for magic values in UpdateAndRecompute() to trigger GetCurrentCommandLineOptions() error behavior. We already use magic values to control convergence and doom, but there's only so much information we can encode in metric score doubles before it gets out of hand. Currently the UpdateAndRecompute() behavior is: all scores zero = neither converged nor doomed, any positive score = converged, any negative score = doomed.

This trick wouldn't be necessary if the step controller were aware of what stage it was operating in.
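
The countdown and score rules described above can be sketched without the protobuf and absl dependencies. This is a simplified illustration, not the code in the PR: `SketchConfig`, `SketchResult`, `SketchStepController`, and `GetCurrentInput()` are hypothetical names, a `std::optional<std::string>` stands in for the `google.rpc.Status` field, and a vector of doubles stands in for the metric evaluations in `BenchmarkResult`.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-struct stand-in for FakeStepControllerConfig.
struct SketchConfig {
  int fixed_rps_value = 0;
  std::optional<std::string> artificial_failure;  // Unset means "never fail".
  int failure_countdown = 0;  // Updates to absorb before failing.
};

// Hypothetical stand-in for absl::StatusOr<CommandLineOptions>.
struct SketchResult {
  bool ok = false;
  int rps = 0;        // Meaningful only when ok is true.
  std::string error;  // Meaningful only when ok is false.
};

class SketchStepController {
public:
  // countdown_ is declared before config_, so it is read before config is moved from.
  explicit SketchStepController(SketchConfig config)
      : countdown_{config.failure_countdown}, config_{std::move(config)} {}

  // Returns the fixed RPS until the countdown has run out, then the configured error.
  SketchResult GetCurrentInput() const {
    if (config_.artificial_failure.has_value() && countdown_ <= 0) {
      return {false, 0, *config_.artificial_failure};
    }
    return {true, config_.fixed_rps_value, ""};
  }

  // Consumes one countdown tick, then scans the scores in order: a negative
  // score sets doom; the first positive score sets convergence and stops the scan.
  void UpdateAndRecompute(const std::vector<double>& scores) {
    if (countdown_ > 0) {
      --countdown_;
    }
    is_converged_ = false;
    is_doomed_ = false;
    for (double score : scores) {
      if (score < 0.0) {
        is_doomed_ = true;
      } else if (score > 0.0) {
        is_converged_ = true;
        break;
      }
    }
  }

  bool IsConverged() const { return is_converged_; }
  bool IsDoomed() const { return is_doomed_; }

private:
  int countdown_;
  const SketchConfig config_;
  bool is_converged_ = false;
  bool is_doomed_ = false;
};
```

With a countdown of 2, the sketch succeeds before and after the first update and starts failing after the second, which is the "succeed during adjusting, fail just in time for testing" behavior the tests rely on.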

// Relevant only when |artificial_input_setting_failure| is set. Number of calls to
// UpdateAndRecompute() the controller must receive before it starts to return
// |artificial_input_setting_failure|. Before this total is reached, |fixed_rps_value| is
// returned. Optional, default 0, meaning the failure is returned regardless of calls to
// UpdateAndRecompute().
int32 artificial_input_setting_failure_countdown = 4;
}
Original file line number Diff line number Diff line change
@@ -111,19 +111,66 @@ TEST(FakeStepController, GetCurrentCommandLineOptionsReturnsRpsFromConfig) {
kExpectedValue);
}

TEST(FakeStepController, GetCurrentCommandLineOptionsReturnsArtificialErrorImmediately) {
FakeStepControllerConfig config;
const int kExpectedCode = ::grpc::DEADLINE_EXCEEDED;
const std::string kExpectedMessage = "artificial input setting error";
config.mutable_artificial_input_setting_failure()->set_code(kExpectedCode);
config.mutable_artificial_input_setting_failure()->set_message(kExpectedMessage);
// Not setting countdown.

FakeStepController step_controller(config, CommandLineOptions());
absl::StatusOr<nighthawk::client::CommandLineOptions> command_line_options_or =
step_controller.GetCurrentCommandLineOptions();
ASSERT_FALSE(command_line_options_or.ok());
EXPECT_EQ(static_cast<int>(command_line_options_or.status().code()), kExpectedCode);
EXPECT_EQ(command_line_options_or.status().message(), kExpectedMessage);
}

TEST(FakeStepController, GetCurrentCommandLineOptionsReturnsArtificialErrorAfterCountdown) {
FakeStepControllerConfig config;
const int kExpectedCode = ::grpc::DEADLINE_EXCEEDED;
const std::string kExpectedMessage = "artificial input setting error";
config.mutable_artificial_input_setting_failure()->set_code(kExpectedCode);
config.mutable_artificial_input_setting_failure()->set_message(kExpectedMessage);
config.set_artificial_input_setting_failure_countdown(2);

FakeStepController step_controller(config, CommandLineOptions());
absl::StatusOr<nighthawk::client::CommandLineOptions> command_line_options_or1 =
step_controller.GetCurrentCommandLineOptions();
EXPECT_TRUE(command_line_options_or1.ok());

step_controller.UpdateAndRecompute(nighthawk::adaptive_load::BenchmarkResult());
// Countdown should now be 1.

absl::StatusOr<nighthawk::client::CommandLineOptions> command_line_options_or2 =
step_controller.GetCurrentCommandLineOptions();
EXPECT_TRUE(command_line_options_or2.ok());

step_controller.UpdateAndRecompute(nighthawk::adaptive_load::BenchmarkResult());
// Countdown should now have reached 0.

// This should now return the artificial input setting failure:
absl::StatusOr<nighthawk::client::CommandLineOptions> command_line_options_or3 =
step_controller.GetCurrentCommandLineOptions();
ASSERT_FALSE(command_line_options_or3.ok());
EXPECT_EQ(static_cast<int>(command_line_options_or3.status().code()), kExpectedCode);
EXPECT_EQ(command_line_options_or3.status().message(), kExpectedMessage);
}

TEST(FakeStepController, IsConvergedInitiallyReturnsFalse) {
FakeStepController step_controller(FakeStepControllerConfig{}, CommandLineOptions{});
EXPECT_FALSE(step_controller.IsConverged());
}

TEST(FakeStepController, IsConvergedReturnsFalseAfterBenchmarkResultWithoutPositiveScore) {
TEST(FakeStepController, IsConvergedReturnsFalseAfterNeutralBenchmarkResult) {
FakeStepController step_controller(FakeStepControllerConfig{}, CommandLineOptions{});
BenchmarkResult benchmark_result;
step_controller.UpdateAndRecompute(benchmark_result);
EXPECT_FALSE(step_controller.IsConverged());
}

TEST(FakeStepController, IsConvergedReturnsTrueAfterBenchmarkResultWithPositiveScore) {
TEST(FakeStepController, IsConvergedReturnsTrueAfterPositiveBenchmarkResultScore) {
FakeStepController step_controller(FakeStepControllerConfig{}, CommandLineOptions{});
BenchmarkResult benchmark_result;
MetricEvaluation* evaluation = benchmark_result.mutable_metric_evaluations()->Add();
@@ -132,6 +179,35 @@ TEST(FakeStepController, IsConvergedReturnsTrueAfterBenchmarkResultWithPositiveS
EXPECT_TRUE(step_controller.IsConverged());
}

TEST(FakeStepController, IsDoomedReturnsFalseAfterNeutralBenchmarkResult) {
FakeStepController step_controller(FakeStepControllerConfig{}, CommandLineOptions{});
BenchmarkResult benchmark_result;
step_controller.UpdateAndRecompute(benchmark_result);
std::string doomed_reason;
EXPECT_FALSE(step_controller.IsDoomed(doomed_reason));
}

TEST(FakeStepController,
IsDoomedReturnsFalseAndLeavesDoomedReasonUntouchedAfterNeutralBenchmarkResult) {
FakeStepController step_controller(FakeStepControllerConfig{}, CommandLineOptions{});
BenchmarkResult benchmark_result;
step_controller.UpdateAndRecompute(benchmark_result);
std::string variable_that_should_not_be_written = "original value";
EXPECT_FALSE(step_controller.IsDoomed(variable_that_should_not_be_written));
EXPECT_EQ(variable_that_should_not_be_written, "original value");
}

TEST(FakeStepController, IsDoomedReturnsTrueAndSetsDoomedReasonAfterNegativeBenchmarkResultScore) {
FakeStepController step_controller(FakeStepControllerConfig{}, CommandLineOptions{});
BenchmarkResult benchmark_result;
MetricEvaluation* evaluation = benchmark_result.mutable_metric_evaluations()->Add();
evaluation->set_threshold_score(-1.0);
step_controller.UpdateAndRecompute(benchmark_result);
std::string doomed_reason;
EXPECT_TRUE(step_controller.IsDoomed(doomed_reason));
EXPECT_EQ(doomed_reason, "artificial doom triggered by negative score");
}

TEST(MakeFakeStepControllerPluginConfig, ActivatesFakeStepControllerPlugin) {
absl::StatusOr<StepControllerPtr> plugin_or = LoadStepControllerPlugin(
MakeFakeStepControllerPluginConfig(0), nighthawk::client::CommandLineOptions{});
@@ -153,7 +229,7 @@ TEST(MakeFakeStepControllerPluginConfig, ProducesFakeStepControllerPluginWithCon
}

TEST(MakeFakeStepControllerPluginConfigWithValidationError,
ProducesFakeStepControllerPluginWithConfiguredValue) {
ProducesFakeStepControllerPluginWithConfiguredError) {
std::string kValidationErrorMessage = "artificial validation error";
absl::StatusOr<StepControllerPtr> plugin_or =
LoadStepControllerPlugin(MakeFakeStepControllerPluginConfigWithValidationError(
@@ -163,5 +239,29 @@ TEST(MakeFakeStepControllerPluginConfigWithValidationError,
EXPECT_EQ(plugin_or.status().message(), kValidationErrorMessage);
}

TEST(MakeFakeStepControllerPluginConfigWithInputSettingError,
ProducesFakeStepControllerPluginWithConfiguredErrorAndCountdown) {
const int kExpectedRpsValue = 123;
const std::string kInputSettingErrorMessage = "artificial input setting error";
absl::StatusOr<StepControllerPtr> plugin_or = LoadStepControllerPlugin(
MakeFakeStepControllerPluginConfigWithInputSettingError(
kExpectedRpsValue, absl::DeadlineExceededError(kInputSettingErrorMessage),
/*countdown=*/1),
nighthawk::client::CommandLineOptions{});
ASSERT_TRUE(plugin_or.ok());
auto* plugin = dynamic_cast<FakeStepController*>(plugin_or.value().get());
ASSERT_NE(plugin, nullptr);
absl::StatusOr<nighthawk::client::CommandLineOptions> command_line_options_or1 =
plugin->GetCurrentCommandLineOptions();
ASSERT_TRUE(command_line_options_or1.ok());
EXPECT_EQ(command_line_options_or1.value().requests_per_second().value(), kExpectedRpsValue);
plugin->UpdateAndRecompute(BenchmarkResult());
absl::StatusOr<nighthawk::client::CommandLineOptions> command_line_options_or2 =
plugin->GetCurrentCommandLineOptions();
ASSERT_FALSE(command_line_options_or2.ok());
EXPECT_EQ(command_line_options_or2.status().code(), absl::StatusCode::kDeadlineExceeded);
EXPECT_EQ(command_line_options_or2.status().message(), kInputSettingErrorMessage);
}

} // namespace
} // namespace Nighthawk