work in progress - remote run #3986

Closed
wants to merge 14 commits into from
Conversation

@Qmando (Member) commented Nov 2, 2024

Work in progress, not fully functional!

Create a new API, /service/{service}/{instance}/remote_run

Currently accepts a username, whether the run is interactive, and whether it should recreate the deployment

Launches a deployment suffixed with -remote-run-{username}. If interactive, replaces the command with sleep. If the deployment already exists, optionally terminates it and relaunches. Returns the name of the pod and the namespace.

Created a CLI command to interact with this API. It gives you a kubectl command to run to get a shell in the container.
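For illustration only, the printed command might look like the following; the namespace and pod name here are made-up placeholders, not values from the PR:

```shell
# Hypothetical values standing in for what the API would return
NAMESPACE="paasta-remote-run"
POD_NAME="example-service-remote-run-qlo-abc123"
# The kind of command the CLI could print for the user to run
echo "kubectl exec -it --namespace ${NAMESPACE} ${POD_NAME} -- /bin/bash"
```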

def remote_run(args) -> int:
    """Run stuff, but remotely!"""
    system_paasta_config = load_system_paasta_config(
        "/nail/home/qlo/paasta_config/paasta/"
    )
Qmando (Member, Author):

I was using this to override the API endpoint

Comment on lines 58 to 64
def wait_until_deployment_gone(kube_client, namespace, deployment_name):
    for retry in range(10):
        pod = find_pod(kube_client, namespace, deployment_name, 1)
        if not pod:
            return
        sleep(5)
    raise Exception("Pod still exists!")
Qmando (Member, Author):

This doesn't work. The request times out before the deployment terminates. I think we'd need to return early and have the client wait and retry.
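A client-side version of this wait could be as simple as the sketch below; `wait_for` is a hypothetical helper, not code from this PR:

```python
import time


def wait_for(check, attempts=10, delay=5, sleep=time.sleep):
    """Poll `check` until it returns truthy; give up after `attempts` tries."""
    for _ in range(attempts):
        if check():
            return True
        sleep(delay)
    return False
```

The CLI would pass a `check` that calls the API and reports whether the old deployment is gone, keeping the long wait out of the API worker.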


# Create the app with a new name
formatted_application = deployment.format_kubernetes_app()
formatted_application.metadata.name += f"-remote-run-{user}"
Contributor:

there's the risk of the pod name being truncated, so we'll definitely also want indicators of this being a remote-run in the pod's labels
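As a sketch of why labels are safer than name suffixes: Kubernetes caps label values and pod hostnames at 63 characters, so a long service name can push the suffix out. The label keys below are assumptions modeled on the existing paasta.yelp.com/service convention, not keys from this PR:

```python
MAX_LABEL_LEN = 63  # Kubernetes limit for label values (and pod hostnames)


def remote_run_labels(user: str) -> dict:
    """Labels that mark a pod as remote-run regardless of name truncation."""
    # Hypothetical keys; only the paasta.yelp.com prefix is taken from the PR.
    return {
        "paasta.yelp.com/remote_run": "true",
        "paasta.yelp.com/remote_run_user": user[:MAX_LABEL_LEN],
    }
```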

Comment on lines 2049 to 2053
selector=V1LabelSelector(
    match_labels={
        "paasta.yelp.com/service": self.get_service(),
        "paasta.yelp.com/instance": self.get_instance(),
    }
Qmando (Member, Author):

todo: This was copied over from the Deployment function below and is causing errors.

Contributor:

mmm, yeah, I don't see why you would need a selector at this stage

kind="Job",
metadata=self.get_kubernetes_metadata(git_sha),
spec=V1JobSpec(
    active_deadline_seconds=3600,
Contributor:

I guess we'll need to propagate this somehow from a parameter
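One way to propagate it, sketched with made-up names and limits (neither `resolve_deadline` nor the bounds appear in the PR):

```python
from typing import Optional

DEFAULT_DEADLINE_S = 3600   # hypothetical default, matching the hardcoded value above
MAX_DEADLINE_S = 8 * 3600   # hypothetical upper bound


def resolve_deadline(requested: Optional[int]) -> int:
    """Turn an optional API parameter into active_deadline_seconds."""
    if requested is None:
        return DEFAULT_DEADLINE_S
    return max(1, min(requested, MAX_DEADLINE_S))
```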


Comment on lines 2074 to 2087
# DO NOT ADD LABELS AFTER THIS LINE
config_hash = get_config_hash(
    self.sanitize_for_config_hash(complete_config),
    force_bounce=self.get_force_bounce(),
)
complete_config.metadata.labels["yelp.com/paasta_config_sha"] = config_hash
complete_config.metadata.labels["paasta.yelp.com/config_sha"] = config_hash

complete_config.spec.template.metadata.labels[
    "yelp.com/paasta_config_sha"
] = config_hash
complete_config.spec.template.metadata.labels[
    "paasta.yelp.com/config_sha"
] = config_hash
Contributor:

this stuff is very related to deployment bounces, so I'd say we don't need it in this case

audience = "remote-run-" + user
service_account = "default"
token_spec = V1TokenRequestSpec(
    expiration_seconds=600,  # minimum allowed by k8s
    audiences=[audience],
)
Contributor:

The audience will need to be the kubernetes one, otherwise it won't be usable for doing execs. You should be able to just leave it unset to achieve that.

def create_temp_exec_token(kube_client: KubeClient, namespace: str, user: str):
    """Create a short lived token for exec"""
    audience = "remote-run-" + user
    service_account = "default"
Contributor:

This will need to come from the main paasta config. We'll have to create a user that can just exec in to places.

@@ -4354,6 +4452,19 @@ def get_kubernetes_secret_env_variables(
return decrypted_secrets


def create_temp_exec_token(kube_client: KubeClient, namespace: str, user: str):
Contributor:

The user parameter is not used here. Is the plan to include it in the service account name if we go for creating ephemeral service accounts?
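If ephemeral per-user service accounts are the plan, the username would have to be folded into a DNS-1123-safe name; a hypothetical sketch (the naming scheme is an assumption, not from this PR):

```python
import re

SA_NAME_LIMIT = 63  # DNS-1123 label length limit


def remote_run_service_account(user: str) -> str:
    """Derive a per-user service account name for remote-run (hypothetical scheme)."""
    sanitized = re.sub(r"[^a-z0-9-]+", "-", user.lower()).strip("-")
    return f"remote-run-{sanitized}"[:SA_NAME_LIMIT].rstrip("-")
```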

@piax93 (Contributor) commented Mar 19, 2025

This has been fixed and refactored in more sensible chunks

@piax93 piax93 closed this Mar 19, 2025
@piax93 piax93 mentioned this pull request Mar 19, 2025
nemacysts pushed a commit that referenced this pull request Mar 24, 2025
This is the largest portion of #3986, implementing the new API for remote-run (look at the second commit, the first is mostly codegen stuff).

I deviated a bit from the original design PoC code: it waited for the remote-run pods to become available as part of the `remote_run/.../start` endpoint, which is not a great idea since it would keep one API worker busy doing nothing. So I modified that endpoint to return as soon as the job resource is created, and then added another one to allow the CLI client to poll for updates until the pod becomes available.

So in summary the logic is the following:
* remote-run/start
  * load all the necessary config for a service instance, create a job resource, and launch it
  * return name of the job
* remote-run/poll
  * look if a pod for a remote-run job is ready
* remote-run/token
  * get temporary credentials to exec into the remote-run pod
* remote-run/stop
  * explicitly end the remote-run job (which will eventually end anyway, since we set a deadline)
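Put together, a client could drive the four endpoints roughly like this sketch; the `api` callable, response keys, and timings are assumptions, not the actual CLI code:

```python
import time


def remote_run_session(api, service, instance, user,
                       max_polls=30, interval=2, sleep=time.sleep):
    """Start a remote-run job, wait for its pod, then fetch an exec token."""
    job = api("start", service=service, instance=instance, user=user)
    for _ in range(max_polls):
        status = api("poll", job=job["job_name"])
        if status.get("ready"):
            token = api("token", job=job["job_name"])
            return status["pod_name"], token["token"]
        sleep(interval)
    api("stop", job=job["job_name"])  # give up and clean up the job
    raise TimeoutError("remote-run pod never became ready")
```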

I also refused to add more stuff to the `kubernetes_tools` module as that already has **4500+ lines in it**, which is too much for me not to get triggered about, so I grouped all the new methods needed in a new module under `paasta_tools.kubernetes`
nemacysts pushed a commit that referenced this pull request Mar 24, 2025
Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.
piax93 added a commit that referenced this pull request Mar 25, 2025
Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.
nemacysts pushed a commit that referenced this pull request Mar 25, 2025
This was meant to be included in the last release but I (luisp) don't know how to read and merged things in the "wrong" order. 

This correctly merges in the following PRs:

* cleanup for remote-run resources (#4024)

Clean up logic for the (meant to be) ephemeral resources which are created in remote-run invocations (#4022).

* new remote-run cli (#4025)

Last portion of #3986, with the updated logic to reflect the updated API conventions introduced in #4022.