-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drtprod: support for remote execution #140016
base: master
Are you sure you want to change the base?
Conversation
b7b9257
to
a9df52a
Compare
currently drtprod execute runs the execution of the YAML file locally. This is an issue in cases where we have long running deployments. So, to solve this problem, this change provides a new option to run the execution on a remote machine. This machine is created and all the dependent files are copied across to that machine. All executions are done in the remote machines. In addition, I have added another feature to define the dependent scripts. This becomes more important as the impact of failure in remote execution is more as compared to running locally. So, we have to ensure that the required files are present. Epic: None Release Note: None
a9df52a
to
2cf05dd
Compare
My understanding of the current implementation is as follows. Please correct me if there are any gaps: We are creating a YAML-like configuration for the remote executor in the code, then executing it to deploy the remote machine. After that, we copy all the relevant files to the remote machine. Finally, we create a systemd service to run the actual execution of drtprod for cluster provisioning using the following command: This approach adds a lot of unnecessary code, such as preparing the configuration for the remote deployer in the code, which is hardcoded and reduces the flexibility of choosing another cloud, machine type, etc. Additionally, the added complexity makes it harder to reason about the code. We can simplify the usage by leveraging existing code with some enhancements in the YAML configuration file. We can have options like
Sample YAML file for this approach: # YAML for creating and configuring the drt-chaos and workload-chaos clusters. This also configures Datadog.
environment:
ROACHPROD_GCE_DEFAULT_SERVICE_ACCOUNT: [email protected]
ROACHPROD_DNS: drt.crdb.io
ROACHPROD_GCE_DNS_DOMAIN: drt.crdb.io
ROACHPROD_GCE_DNS_ZONE: drt
ROACHPROD_GCE_DEFAULT_PROJECT: cockroach-drt
CLUSTER: drt-cluster-monitor
targets:
- target_name: $CLUSTER
is_remote_execution: true
yamlFile: pkg/cmd/drtprod/configs/drt_chaos.yaml
steps:
- command: create
args:
- $CLUSTER
flags:
clouds: gce
gce-zones: "us-central1-a"
nodes: 1
gce-machine-type: n2-standard-2
username: drt
lifetime: 8760h
gce-image: "ubuntu-2204-jammy-v20240319"
on_rollback:
- command: destroy
args:
- $CLUSTER
- command: sync
flags:
clouds: gce
dependent_file_locations:
- artifacts/roachprod
- artifacts/roachtest
- pkg/cmd/drtprod/scripts/setup_dmsetup_disk_staller
- pkg/cmd/drtprod/scripts/setup_datadog_cluster
- pkg/cmd/drtprod/scripts/setup_datadog_workload
- pkg/cmd/drtprod/scripts/tpcc_init.sh
- pkg/cmd/drtprod/scripts/generate_tpcc_run.sh
- pkg/cmd/drtprod/scripts/generate_kv_run.sh
- pkg/cmd/drtprod/scripts/generate_tpcc_drop.sh |
cobraCmd.Flags().BoolVarP(&useRemoteDeployer, | ||
"remote", "r", false, "runs the deployment remotely in the deployment node. This is "+ | ||
"useful for long running deployments as the local execution completes soon. A \"deployment_name\" must be mentioned "+ | ||
"in the YAML for running a deployment remotely.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we have a more precise description of the flag something like below
This flag enables the deployment of the drtprod cluster from a remote machine. It is useful for long-running deployments, as the local execution completes quickly. Ensure that a "deployment_name" is specified in the YAML file to enable remote deployment.
|
||
// GetYamlProcessor creates a new Cobra command for processing a YAML file. | ||
// The command expects a YAML file as an argument and runs the commands defined in it. | ||
func GetYamlProcessor(ctx context.Context) *cobra.Command { | ||
displayOnly := false | ||
useRemoteDeployer := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename variable.
useRemoteDeployment
if err != nil { | ||
// drtprod is needed in the path to run yaml commands | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we wrap the error to give more context to the user on failure.
return errors.Wrap(err, "drtprod executable is required in the path to run yaml commands")
Targets []target `yaml:"targets"` // List of target clusters with their steps | ||
DependentFileLocations []string `yaml:"dependent_file_locations,omitempty"` // location of all the dependent files - scripts/binaries | ||
DeploymentName string `yaml:"deployment_name,omitempty"` // name provided to the deployment | ||
Environment map[string]string `yaml:"environment"` // Environment variables to set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add more context around DeploymentName
in comment.
// either locally or in a remote VM based on the flag "useRemoteDeployer". | ||
// If the useRemoteDeployer is set to true, a new VM is created and the current YAML is copied across along | ||
// with the required scripts and binaries and execution is run there. This is useful for long-running deployments | ||
// where the local deployment just delegates the execution of the deployment to the remote machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: throughout can we follow this best practice for code commenting.
Use full sentences. Capital letter at begin, period at the end, with at least a subject and a verb.
if _, err := os.Stat(drtprodLocation); err != nil { | ||
// if drtprod binary is not available in artifacts, we cannot proceed as this is needed for executing | ||
// the YAML remotely | ||
return fmt.Errorf("%s must be available for executing remotely", drtprodLocation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use erros.Wrapf
currently drtprod execute runs the execution of the YAML file locally. This is an issue in cases where we have long running deployments. So, to solve this problem, this change provides a new option to run the execution on a remote machine. This machine is created and all the dependent files are copied across to that machine. All executions are done in the remote machines.
In addition, I have added another feature to define the dependent scripts. This becomes more important as the impact of failure in remote execution is more as compared to running locally. So, we have to ensure that the required files are present.
Epic: None
Release Note: None