-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
work in progress - remote run #3986
Conversation
def remote_run(args) -> int: | ||
"""Run stuff, but remotely!""" | ||
system_paasta_config = load_system_paasta_config( | ||
"/nail/home/qlo/paasta_config/paasta/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using this to override the API endpoint
paasta_tools/paasta_remote_run_2.py
Outdated
def wait_until_deployment_gone(kube_client, namespace, deployment_name): | ||
for retry in range(10): | ||
pod = find_pod(kube_client, namespace, deployment_name, 1) | ||
if not pod: | ||
return | ||
sleep(5) | ||
raise Exception("Pod still exists!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work. The request times out before the deployment terminates. I think we'd need to return and the client needs to wait and retry.
paasta_tools/paasta_remote_run_2.py
Outdated
|
||
# Create the app with a new name | ||
formatted_application = deployment.format_kubernetes_app() | ||
formatted_application.metadata.name += f"-remote-run-{user}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's the risk of the pod name being truncated, so we'll definitely also want to indicators of this being a remote-run in the pod's labels
paasta_tools/kubernetes_tools.py
Outdated
selector=V1LabelSelector( | ||
match_labels={ | ||
"paasta.yelp.com/service": self.get_service(), | ||
"paasta.yelp.com/instance": self.get_instance(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: This was copied over from the Deployment function below and is causing errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm, yeah, I don't see why you would need a select at this stage
kind="Job", | ||
metadata=self.get_kubernetes_metadata(git_sha), | ||
spec=V1JobSpec( | ||
active_deadline_seconds=3600, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we'll need to propagate this somehow from a parameter
paasta_tools/kubernetes_tools.py
Outdated
selector=V1LabelSelector( | ||
match_labels={ | ||
"paasta.yelp.com/service": self.get_service(), | ||
"paasta.yelp.com/instance": self.get_instance(), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm, yeah, I don't see why you would need a select at this stage
paasta_tools/kubernetes_tools.py
Outdated
# DO NOT ADD LABELS AFTER THIS LINE | ||
config_hash = get_config_hash( | ||
self.sanitize_for_config_hash(complete_config), | ||
force_bounce=self.get_force_bounce(), | ||
) | ||
complete_config.metadata.labels["yelp.com/paasta_config_sha"] = config_hash | ||
complete_config.metadata.labels["paasta.yelp.com/config_sha"] = config_hash | ||
|
||
complete_config.spec.template.metadata.labels[ | ||
"yelp.com/paasta_config_sha" | ||
] = config_hash | ||
complete_config.spec.template.metadata.labels[ | ||
"paasta.yelp.com/config_sha" | ||
] = config_hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this stuff is very related to deployment bounces, so I'd say we don't need it in this case
paasta_tools/kubernetes_tools.py
Outdated
audience = "remote-run-" + user | ||
service_account = "default" | ||
token_spec = V1TokenRequestSpec( | ||
expiration_seconds=600, audiences=[audience] # minimum allowed by k8s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The audience will need to be the kubernetes one, otherwise it won't be usable for doing execs. You should be able to just leave it unset to achieve that.
paasta_tools/kubernetes_tools.py
Outdated
def create_temp_exec_token(kube_client: KubeClient, namespace: str, user: str): | ||
"""Create a short lived token for exec""" | ||
audience = "remote-run-" + user | ||
service_account = "default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to come from the main paasta config. We'll have to create a user that can just exec in to places.
paasta_tools/kubernetes_tools.py
Outdated
@@ -4354,6 +4452,19 @@ def get_kubernetes_secret_env_variables( | |||
return decrypted_secrets | |||
|
|||
|
|||
def create_temp_exec_token(kube_client: KubeClient, namespace: str, user: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user
parameter is not used here. Is the plan to include it in the service account name if we go for creating ephemeral service accounts?
This has been fixed and refactored in more sensible chunks |
This is the largest portion of #3986, implementing the new API for remote-run (look at the second commit, the first is mostly codegen stuff). I deviated a bit from the original design poc code, as I saw that was waiting for the remote-run pods to become available as part of the `remote_run/.../start` endpoint, and that's not a great idea as it would just keep one API worker busy doing nothing. So I modified that endpoint to return as soon as the job resource is created, and then added another one to allow the CLI client to poll for updates until the pod becomes available. So in summary the logic is the following: * remote-run/start * load all the necessary config for a service instance, create a job resource, and launch it * return name of the job * remote-run/poll * look if a pod for a remote-run job is ready * remote-run/token * get temporary credentials to exec into the remote-run pod * remote-run/stop * explicitly end the remote-run job (which will eventually be anyway since we set a deadline) I also refused to add more stuff to the `kubernetes_tools` module as that already has **4500+ lines in it**, which is too much for me not to get triggered about, so I grouped all the new methods needed in a new module under `paasta_tools.kubernetes`
This was meant to be included in the last release but I (luisp) don't know how to read and merged things in the "wrong" order. This correctly merges in the following PRs: * cleanup for remote-run resources (#4024) Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022). * new remote-run cli (#4025) Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.
Work in progress, not fully functional!
Create a new API, /service/{service}/{instance}/remote_run
Currently accepts additional a username, whether it's interactive, and whether it should recreate the deployment
Launches a deployment with
-remote-run-{username}
. If interactive, replaces command with sleep. If deployment exists, maybe terminate it and relaunch. Returns the name of the pod and the namespace.Created CLI cmd to interact with this api. Gives you a kubectl command to run to get a shell in the container.