Load credentials from file #4037


Open

nurdann wants to merge 35 commits into master

Conversation

nurdann (Member) commented Apr 1, 2025

Problem

We want to allow reading IAM credentials from a root-protected file, so that there is a single point of entry; otherwise, many other roles can fetch an EKS token.

Changes

  • Read the root-protected credentials file
  • Re-run spark-run as sudo when --get-eks-token-via-iam-user is given
  • Mount the /etc/kubernetes directory, which will make switching between configs easier
  • Add a new paasta config key
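
The credential-loading step might look roughly like the following sketch (the function name, file path argument, and the access-key-ID variable name are assumptions for illustration; the file is assumed to be in AWS INI credentials format, and only GET_EKS_TOKEN_AWS_SECRET_ACCESS_KEY appears verbatim in the diff):

```python
import configparser


def load_iam_credentials(path: str, profile: str = "default") -> dict:
    """Parse an AWS-style INI credentials file (expected to be readable only
    by root) and map its keys to the env vars used to fetch an EKS token."""
    config = configparser.ConfigParser()
    config.read(path)
    return {
        # Variable names mirror the GET_EKS_TOKEN_* convention from the diff;
        # the access-key-ID name is an assumption.
        "GET_EKS_TOKEN_AWS_ACCESS_KEY_ID": config[profile]["aws_access_key_id"],
        "GET_EKS_TOKEN_AWS_SECRET_ACCESS_KEY": config[profile]["aws_secret_access_key"],
    }
```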

Mounting /etc/kubernetes

Currently, only a single file is mounted:

paasta/paasta_tools/utils.py

Lines 2736 to 2737 in 2e706a1

def get_spark_kubeconfig(self) -> str:
    return self.config_dict.get("spark_kubeconfig", "/etc/kubernetes/spark.conf")

Changing that requires changes to /etc/paasta/spark-run.json, which is loaded from

PATH_TO_SYSTEM_PAASTA_CONFIG_DIR = os.environ.get(
    "PAASTA_SYSTEM_CONFIG_DIR", "/etc/paasta/"
)

and populated via Puppet: https://sourcegraph.yelpcorp.com/sysgit/puppet@4c6d259bc8e5cf7c48e75358a010741933591191/-/blob/modules/paasta_tools/manifests/public_config.pp?L501-508
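
A simplified, self-contained sketch of the new accessor next to the existing one (the exact key name and default path are assumptions based on the spark2.conf naming in the review threads; the real class lives in paasta_tools/utils.py):

```python
class SystemPaastaConfig:
    """Minimal stand-in for the real SystemPaastaConfig in paasta_tools/utils.py."""

    def __init__(self, config_dict: dict) -> None:
        self.config_dict = config_dict

    def get_spark_kubeconfig(self) -> str:
        # Existing key: the single kubeconfig currently mounted.
        return self.config_dict.get("spark_kubeconfig", "/etc/kubernetes/spark.conf")

    def get_spark2_kubeconfig(self) -> str:
        # New key: falls back to the new kubeconfig when no override is set.
        return self.config_dict.get("spark2_kubeconfig", "/etc/kubernetes/spark2.conf")
```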

Verification (Updated Apr 10)

  • Can spark-submit using /etc/kubernetes/spark.conf; got the expected PATH_NOT_FOUND since the S3 object doesn't exist [cmd]
  • spark-submit with /etc/kubernetes/spark.conf fails to fetch credentials when the EC2 metadata service is disabled, as expected [cmd]
  • Can fetch an EKS token via the loaded env vars; got PATH_NOT_FOUND as well [cmd]
    Config changes are to come from https://github.yelpcorp.com/sysgit/puppet/pull/14234

@nurdann nurdann requested review from nemacysts and Qmando April 1, 2025 20:26
@nurdann nurdann marked this pull request as ready for review April 3, 2025 01:29
@nurdann nurdann requested a review from jfongatyelp April 4, 2025 23:46
Qmando (Member) left a comment:

Looks pretty good overall.

@nurdann nurdann requested a review from Qmando April 9, 2025 00:58
spark_env["GET_EKS_TOKEN_AWS_SECRET_ACCESS_KEY"] = config["default"][
    "aws_secret_access_key"
]

spark_env["KUBECONFIG"] = system_paasta_config.get_spark_kubeconfig()
Contributor:

Shouldn't we use spark2.conf instead of the default spark.conf for --get-eks-token-via-iam-user?

nurdann (Member Author):

For now, you need to export KUBECONFIG=... to change it.

nurdann (Member Author):

That's actually a good point! Otherwise it makes it annoying to tweak the command line.

chi-yelp previously approved these changes Apr 9, 2025
@nurdann nurdann requested a review from a team as a code owner April 10, 2025 21:40
spark_env["GET_EKS_TOKEN_AWS_SECRET_ACCESS_KEY"] = config["default"][
    "aws_secret_access_key"
]

spark_env["KUBECONFIG"] = system_paasta_config.get_spark2_kubeconfig()
Member:

how long do we anticipate the migration to this taking? i don't really love the spark2 naming filewise, but it'd be nice to avoid naming functions with spark2 :p

nurdann (Member Author):

Without a separate config key, the CLI flag cannot switch to the new kubeconfig unless a new KUBECONFIG is exported. Once we no longer see file access to spark.conf, we can make spark2 the default flow.

Member:

when that's done, will we swap the names back to spark.conf / remove all the spark2 naming?

Member:

(i mostly don't want the $FILENAME$N naming to stick around for long since it's not particularly descriptive)

nurdann (Member Author):

We'll remove spark2 and make spark use the new format.

Member:

and just for completeness: how long do we foresee this transition taking?

nurdann (Member Author):

Until we can get a representative sample using the new kubeconfig. I want to be optimistic, but I could see it staying in here until the end of the year.

nemacysts (Member) commented Apr 16, 2025:

oh my, if this is potentially sticking around that long, we probably want some better naming: spark2 (as a name) will be pretty meaningless to folks that aren't us several months into the future (unless they look through this PR/the tech spec/etc.)

nurdann (Member Author):

Renamed the variable.

if args.get_eks_token_via_iam_user and os.getuid() != 0:
    print("Re-executing paasta spark-run with sudo...", file=sys.stderr)
    # argv[0] is treated as the command name, so prepend "sudo"
    os.execvp("sudo", ["sudo"] + sys.argv)
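
For illustration, the re-exec guard can be wrapped as a small self-contained helper (the helper name is hypothetical; note that os.execvp replaces the current process image, so nothing after it runs when the re-exec branch is taken):

```python
import os
import sys


def maybe_reexec_with_sudo(require_root: bool) -> None:
    """Re-execute the current command under sudo when root is required
    but the process is not already running as root."""
    if require_root and os.getuid() != 0:
        print("Re-executing with sudo...", file=sys.stderr)
        # execvp treats argv[0] as the command name, so "sudo" is prepended.
        os.execvp("sudo", ["sudo"] + sys.argv)
    # Otherwise fall through and keep running as the current user.
```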
Member:

do we need the -H that we do in paasta_local_run?

nurdann (Member Author):

HOME is inherited when running locally:

$ sudo env | grep HOME
HOME=/nail/home/me

Member:

(-H changes that behavior :))

nurdann (Member Author):

It doesn't matter for reading a root-owned file, though.

nurdann (Member Author):

Let me check how AWS credentials are fetched.

nurdann (Member Author) commented Apr 11, 2025:

The user can still delete the file if the directory is user-owned. But if we set HOME to root's, then we cannot use the user's profile.

nurdann (Member Author):

So, the way the docker command is constructed is different: no matter whether the current UID is 0 or not, it switches based on DEFAULT_SPARK_DOCKER_REGISTRY: https://sourcegraph.yelpcorp.com/search?q=repo:%5EYelp/paasta%24+file:%5Epaasta_tools/cli/cmds/spark_run%5C.py%24+sudo&patternType=keyword&sm=0

Member:

hmm, we should probably try to make sure that we're not potentially leaving root-owned files around people's homedirs, since that will otherwise generate onpoint load to get these cleaned up.

i'm assuming this is a problem because there are other parts of spark-run that will try to use a profile and create the file if not found?

nurdann (Member Author):

Actually, I'm wrong. Then they would get an error that the profile doesn't exist:

botocore.exceptions.ProfileNotFound: The config profile (devc) could not be found

Contributor:

FYI - spark-run will create a temporary pod template file under /nail/tmp, but I think it's fine to have a root-owned file under that path

@nurdann nurdann requested a review from nemacysts April 14, 2025 17:39
@nurdann nurdann requested review from nemacysts and chi-yelp April 14, 2025 22:00
chi-yelp previously approved these changes Apr 15, 2025
Qmando
Qmando previously approved these changes Apr 15, 2025
@nurdann nurdann dismissed stale reviews from Qmando and chi-yelp via dcf17fe April 16, 2025 17:49
@nurdann nurdann requested a review from nemacysts April 16, 2025 17:49