[A2-798] split partial result update calls for v2 and v2.1 #467

srenatus · 2019-05-31T17:36:53Z

🔩 Description

authz-service uses OPA's partial eval feature to save time when authorizing requests by "pre-calculating" everything it can figure out without knowing the (variable) input data. However, that process itself takes time, and any change to the relevant data (policies, roles, projects) requires a rebuild of that entire state.

When introducing IAM v2.1, we took a bit of a shortcut: on every data change, we rebuild the partial eval state of both the v2 and the v2.1 queries. This PR is about fixing that.

The need to fix it arises from the rebuild time, which depends on the data it's built with and increases as the data grows (linearly? haven't checked). It has become noticeable, and annoying.
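To illustrate the trade-off described above, here is a toy sketch. It does not use OPA; `Policy`, `PartialResult`, `Rebuild` and `Authorize` are hypothetical names made up for this example. It only shows the shape of the problem: a cheap lookup per request, paid for by a rebuild that walks all the data whenever anything changes.

```go
package main

import "fmt"

// Policy is a hypothetical, heavily simplified stand-in for an IAM policy.
type Policy struct {
	Members []string
	Actions []string
}

// PartialResult is the "pre-calculated" state: everything derivable from the
// policy data alone, indexed as member -> action -> allowed.
type PartialResult map[string]map[string]bool

// Rebuild walks every policy, so its cost grows with the amount of data.
// It has to run again after any change to the policies.
func Rebuild(policies []Policy) PartialResult {
	pr := PartialResult{}
	for _, p := range policies {
		for _, m := range p.Members {
			if pr[m] == nil {
				pr[m] = map[string]bool{}
			}
			for _, a := range p.Actions {
				pr[m][a] = true
			}
		}
	}
	return pr
}

// Authorize answers a request from the precomputed state only.
func (pr PartialResult) Authorize(member, action string) bool {
	return pr[member][action]
}

func main() {
	pr := Rebuild([]Policy{{
		Members: []string{"team:local:viewers"},
		Actions: []string{"iam:teams:list"},
	}})
	fmt.Println(pr.Authorize("team:local:viewers", "iam:teams:list"))
	fmt.Println(pr.Authorize("team:local:viewers", "iam:teams:delete"))
}
```

Rebuilding two such states when only one of them is ever queried doubles the work for no benefit, which is what this PR removes.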

ℹ️ Notes

This PR also expands the current upgrade-to-v2-and-reset-to-v1 tests by going some extra loops through v2.1 as well, see integration/tests/upgrade_reset_iam_v2.sh
If you look at the commits, you'll find that I had to revert some of the cleanups I thought necessary. The reason was ingest-service: it seems to query the v2.1-related projects-rules stuff on ingest, regardless of whether A2 is using IAM v1, v2 or v2.1.

👍 Definition of Done

When using IAM v2.1, only the v2.1 partial eval result is rebuilt on updated data, and when using v2, only the v2 state is.
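A minimal sketch of that behaviour, assuming a hypothetical `Engine` that merely counts rebuilds per query flavour (the real code lives in `policy_refresher.go` and `engine/opa`, and looks different):

```go
package main

import "fmt"

// Minor is a hypothetical stand-in for the IAM v2 minor version.
type Minor int

const (
	V2   Minor = iota // IAM v2
	V2p1              // IAM v2.1
)

// Engine counts rebuilds per query flavour; a real engine would redo the
// partial evaluation for that flavour instead.
type Engine struct {
	V2Rebuilds, V2p1Rebuilds int
}

// UpdateStore rebuilds only the state that belongs to the active version.
func (e *Engine) UpdateStore(v Minor) error {
	switch v {
	case V2:
		e.V2Rebuilds++
	case V2p1:
		e.V2p1Rebuilds++
	default:
		return fmt.Errorf("unknown IAM minor version %d", v)
	}
	return nil
}

func main() {
	e := &Engine{}
	// Simulate 30 policy inserts while running IAM v2.1.
	for i := 0; i < 30; i++ {
		if err := e.UpdateStore(V2p1); err != nil {
			panic(err)
		}
	}
	fmt.Printf("v2 rebuilds: %d, v2.1 rebuilds: %d\n", e.V2Rebuilds, e.V2p1Rebuilds)
}
```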

👟 Demo Script / Repro Steps

rebuild components/authz-service
upgrade to IAM v2.1 (chef-automate iam upgrade-to-v2 --beta2.1)
create two projects and 30 policies, measuring the time it takes:
- projects foo and bar: curl -kH "api-token: $TOK" https://localhost/apis/iam/v2beta/projects -d "$(jq -n '{ id: "foo", name: "my foo project" }')" and curl -kH "api-token: $TOK" https://localhost/apis/iam/v2beta/projects -d "$(jq -n '{ id: "bar", name: "my bar project" }')"

[118][default:/src:0]# time for i in $(seq 1 30); do curl -kH "api-token: $TOK" https://localhost/apis/iam/v2beta/policies -d "$(sed "s/ID/$i/" pol.json)"; done
real    0m5.211s
user    0m0.252s
sys     0m0.060s

this is with pol.json containing

{
  "name": "testpolicy",
  "id": "testpolicy-id-ID",
  "members": [
    "team:local:viewers"
  ],
  "statements": [
    {
      "effect": "ALLOW",
      "actions": [],
      "role": "viewer",
      "projects": [
        "*"
      ]
    }
  ],
  "projects": [ "foo", "bar" ]
}

delete those policies,

[119][default:/src:2]# chef-automate dev psql chef_authz_service -- -c "delete from iam_policies where id ilike 'testpolicy-id-%'"
DELETE 30

downgrade to v2, redo timing the 30 inserts:

[121][default:/src:130]# chef-automate iam upgrade-to-v2

Upgrading to IAM v2
Migrating v1 policies...
Creating default teams Editors and Viewers...
Skipped: Editors team already exists
Skipped: Viewers team already exists

Migrating existing teams...

Success: Enabled IAM v2
[122][default:/src:0]# time for i in $(seq 1 30); do curl -kH "api-token: $TOK" https://localhost/apis/iam/v2beta/policies -d "$(sed "s/ID/$i/" _dump/pol.json)"; done
[output omitted]
real    8m41.688s
user    0m0.300s
sys     0m0.112s

⚠️ This is odd, isn't it? So, this doesn't solve the issue for IAM v2 customers; but it does promise us a bright and fast future in IAM v2.1. Now I don't really understand (yet) why v2.1 is so much faster to rebuild, but at least we don't drag v2 around 😅. Also, not updating v2.1 when updating v2 takes some of the work away; combined with not updating the engine store twice, this hopefully helps a little.

⛓️ Related Resources

A2-798 👈 this is about the perceivable delay 🐛 this hopes to improve.

✅ Checklist

Necessary tests added/updated?
Necessary docs added/updated?
Code actually executed?
Vetting performed (unit tests, lint, etc.)?

srenatus

Some reviewer notes inside

srenatus · 2019-06-03T14:12:07Z

components/authz-service/server/v2/policy.go

@@ -704,16 +705,12 @@ func (s *policyServer) EngineUpdateInterceptor() grpc.UnaryServerInterceptor {
 			// do nothing
 		}

-		s.log.Debugf("Initiating store update for %s", info.FullMethod)


🐛 drive-by bug fix

srenatus · 2019-06-03T14:12:11Z

components/authz-service/server/v2/policy.go

@@ -119,6 +116,10 @@ func NewPoliciesServer(
 	}
 	srv.setVersion(v)

+	if err := srv.updateEngineStore(ctx); err != nil {


ℹ️ this needs to happen after setVersion, so the proper store will be updated in the process (v2 vs v2.1). (v1 is done per-call in its corresponding handlers)

srenatus · 2019-06-03T14:15:05Z

components/authz-service/server/v2/policy_refresher.go

 	}
 	refresher.refreshRequests <- m
 	return m.Err()
 }

+// TODO: handle version: v2 vs v2.1


Let's fix that when we get there? 😉

srenatus · 2019-06-04T09:36:22Z

components/authz-service/server/v2/policy_refresher.go

@@ -185,10 +185,10 @@ func (refresher *policyRefresher) updateEngineStore(ctx context.Context, vsn api
 	}

 	switch {
-	case vsn.Minor == api.Version_V1: // v2


E_IMANIDIOT

tylercloke · 2019-06-04T17:36:04Z

components/authz-service/server/v2/policy_refresher.go

+		return refresher.engine.V2SetPolicies(ctx, policyMap, roleMap, ruleMap)
+	}
+	// Note 2019/06/04 (sr): v1?! Yes, IAM v1. Our POC code depends on this query to be
+	// answered regardless of whether IAM is v1, v2 or v2.1.


Helpful comment!

tylercloke · 2019-06-04T17:36:35Z

components/authz-service/server/v2/policy_refresher.go

@@ -295,3 +313,7 @@ func (refresher *policyRefresher) getRuleMap(_ context.Context) (map[string][]in
 	}
 	return data, nil
 }
+
+func pretty(vsn api.Version) string {


msorens

...and the world is just a little bit better now.

msorens · 2019-06-05T01:26:18Z

components/authz-service/engine/opa/opa.go

 	if err != nil {
 		return err
 	}
-	// Need a compiler for use by regular queries, too, so store it here.
-	s.compiler = compiler

 	// Partial eval for authzV2Query.


Suggest delete this comment, as well as L194-195; otherwise, make them more parallel to each other (on the same relative line, etc.).

msorens · 2019-06-05T01:38:40Z

components/authz-service/engine/opa/opa.go

+	return dumpData(ctx, s.store, s.log)
+}
+
+func dumpData(ctx context.Context, store storage.Store, l logger.Logger) error {


Suggest move dumpData below DumpDataV2p1 so DumpData* are all contiguous.

msorens · 2019-06-05T01:40:06Z

components/authz-service/engine/opa/opa.go

@@ -235,17 +223,28 @@ func (s *State) initCompiler() (*ast.Compiler, error) {
 // DumpData is a bit fast-and-loose when it comes to error checking; it's not meant
 // to be used in production


Could you add a further comment mentioning where you might inject any of the DumpData* calls?

msorens · 2019-06-05T01:41:24Z

components/authz-service/engine/opa/opa.go

@@ -636,6 +636,20 @@ func (s *State) V2SetPolicies(
 	return s.initPartialResultV2(ctx)
 }

+// V2p1SetPolicies replaces OPA's data with a new set of policies and roles,
+// and resets the partial evaluation cache for v2


msorens · 2019-06-05T01:42:15Z

components/authz-service/engine/opa/opa_internal_test.go

-				s.v2Store = store
+			s, err := New(ctx, l)
+			require.NoError(b, err, "init state")
+			s.v2Store = store


Why move the store outside the loop?

The intention was to only benchmark the relevant bits, not what happens in s.New. Setting the s.v2Store is not relevant, either, but also likely not a thing that influences measurements too much.
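For reference, the general pattern of keeping setup out of the measurement looks roughly like this. It is a generic sketch, not the benchmark from this PR: `buildState` and `sum` are hypothetical stand-ins for the expensive setup and the code under measurement.

```go
package main

import (
	"fmt"
	"testing"
)

// buildState stands in for expensive setup that should not be measured.
func buildState(n int) []int {
	state := make([]int, n)
	for i := range state {
		state[i] = i
	}
	return state
}

// sum stands in for the code under measurement.
func sum(state []int) int {
	total := 0
	for _, v := range state {
		total += v
	}
	return total
}

// BenchmarkSum does the setup once, outside the timed loop.
func BenchmarkSum(b *testing.B) {
	state := buildState(1000)
	b.ResetTimer() // discard the time spent in setup
	for i := 0; i < b.N; i++ {
		_ = sum(state)
	}
}

func main() {
	// testing.Init registers the testing flags; it is needed when calling
	// testing.Benchmark outside of "go test".
	testing.Init()
	res := testing.Benchmark(BenchmarkSum)
	fmt.Println(res)
}
```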

msorens · 2019-06-05T01:44:15Z

components/authz-service/server/v2/policy.go

+	if err := srv.updateEngineStore(ctx); err != nil {
+		return nil, errors.Wrapf(err, "initialize engine storage (%v)", v)
+	}
+


I infer it is better to updateEngineStore after ApplyV2DataMigrations...? If so, consider adding a brief in-code comment mentioning why.

msorens · 2019-06-05T01:50:25Z

components/authz-service/server/v2/policy_refresher.go

 	curPolicyID, err := refresher.store.GetPolicyChangeID(ctx)
 	if err != nil {
 		refresher.log.WithError(err).Warn("Failed to get current policy change ID")
 		return lastPolicyID, err
 	}
-	if curPolicyID != lastPolicyID {
+	if curPolicyID != lastPolicyID || forceUpdate {
 		refresher.log.WithFields(logrus.Fields{
 			"lastPolicyID": lastPolicyID,
 			"curPolicyID":  curPolicyID,


Not from your PR, but please rename curPolicyID and lastPolicyID to curPolicyChangeID and lastPolicyChangeID...
( I saw this code and was thinking "What the heck is a 'current policy' and why do we care?" 🤔 )

I'll touch this code soon to fix the multi-node issue mentioned in the comments. I'll probably do this, then.

msorens · 2019-06-05T01:59:07Z

components/authz-service/server/v2/policy_test.go

-	return setupV2(t, nil, writer, nil, nil)
+	writer engine.V2pXWriter) testSetup {
+	return setupV2WithMigrationState(t, nil, writer, nil, make(chan api_v2.Version, 1),
+		func(s storage.MigrationStatusProvider) error { return s.Success(context.Background()) }) // IAM v2


Not sure the comment on the line communicates anything additional here...?

I think it does. I'll elaborate.

msorens · 2019-06-05T02:01:11Z

components/authz-service/server/v2/policy_test.go

-	_ context.Context, policies map[string]interface{},
+	ctx context.Context, policies map[string]interface{},
+	roles map[string]interface{}, rules map[string][]interface{}) error {
+	return te.V2p1SetPolicies(ctx, policies, roles, rules)


Perhaps add a comment explaining why it is OK to use the same test code for v2 and v2p1 here.

msorens · 2019-06-05T02:04:02Z

integration/tests/upgrade_reset_iam_v2.sh

@@ -33,41 +33,28 @@ hab_curl() {
 do_test_deploy() {


srenatus · 2019-06-05T08:46:50Z

So this most likely has issues with switching between v2 and v2.1 in a multi-node setting. I'm not sure how this is currently tested, so thinking hard is the only method we have right now for figuring this out. I'm torn between adding a card outlining the problematic scenario; and adding some commits on top of this PR. I'd rather not do the latter, but since I have no idea who depends on this functionality when, I'm probably opting for that anyways.

This should allow us to shave a bit off our policy creation/update times, by only running what we should need. I'm not completely certain that this captures all issues potentially arising in the multi-node setup, but it should be fine for all-in-one. Signed-off-by: Stephan Renatus <[email protected]>

...as when the version changes: Without this flag, we'd get into the following situation when flipping from v2 to v2.1 or vice-versa:

1. the server's state would flip (say, v2 -> v2.1)
2. the policy refresher would notice that nothing changed in the policy data
3. and hence would not update the engine store

As a result, the v2.1 engine store would remain non-functional. This would lead to the `chef-automate iam upgrade-to-v2 --beta2.1` command failing to list teams:

APIError: An API error occurred during execution: Failed to retrieve team "editors": Failed to retrieve admins team: rpc error: code = PermissionDenied desc = error authorizing action "iam:teams:list" on resource "iam:teams" for subjects ["tls:service:deployment-service:40adf15d875d3190de6c24d0862804cf0d656be656f60234a8714a44563d5518"]: rpc error: code = Internal desc = error in query evaluation: cannot evaluate empty query

I'm not sure if this is the best way, or if the IAM version should be reflected in the last change ID...

Signed-off-by: Stephan Renatus <[email protected]>
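The scenario in this commit message can be sketched as follows. `Refresher` and its fields are hypothetical simplifications; only the condition itself mirrors the `curPolicyID != lastPolicyID || forceUpdate` check from the diff.

```go
package main

import "fmt"

// Refresher is a hypothetical, simplified policy refresher.
type Refresher struct {
	lastChangeID string
	Updates      int // number of engine store updates performed
}

// Refresh updates the engine store if the policy change ID moved, or if the
// caller forces it (for example because the IAM version just flipped).
// It reports whether an update happened.
func (r *Refresher) Refresh(curChangeID string, forceUpdate bool) bool {
	if curChangeID == r.lastChangeID && !forceUpdate {
		return false // nothing changed, skip the expensive rebuild
	}
	r.lastChangeID = curChangeID
	r.Updates++
	return true
}

func main() {
	r := &Refresher{}
	fmt.Println(r.Refresh("change-1", false)) // policy data changed
	fmt.Println(r.Refresh("change-1", false)) // unchanged data, no update
	fmt.Println(r.Refresh("change-1", true))  // version flip, forced update
}
```

Without the forced call, the last line would be skipped like the second one, and the store for the newly selected version would stay empty.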

This is a robustness measure -- without this, we might be answering the wrong versioned queries when we really shouldn't. Only relevant for scenarios where we upgrade, though, as usually, only one of the three should be initialized on startup. Signed-off-by: Stephan Renatus <[email protected]>

On v2 server init, the version wasn't picked up: it depended on both the version channel being non-nil, AND on the migration status NOT having transitioned to failure. In the v2 memstore implementation, that was possible. Signed-off-by: Stephan Renatus <[email protected]>

srenatus added the automate-auth, iamv2, and tech debt labels May 31, 2019

srenatus self-assigned this May 31, 2019

srenatus force-pushed the sr/a2-798/split-upgrade-calls-for-v2-and-v2.1 branch 5 times, most recently from 24cec04 to 25e452c Compare June 3, 2019 14:01

srenatus commented Jun 3, 2019

srenatus force-pushed the sr/a2-798/split-upgrade-calls-for-v2-and-v2.1 branch 2 times, most recently from 2bbafaf to f0b1792 Compare June 3, 2019 14:56

srenatus mentioned this pull request Jun 3, 2019

[A2-808] OPA update for project rules #273

Merged

4 tasks

srenatus added the WIP label Jun 3, 2019

srenatus force-pushed the sr/a2-798/split-upgrade-calls-for-v2-and-v2.1 branch 2 times, most recently from f334564 to dcf74c0 Compare June 3, 2019 18:12

srenatus commented Jun 4, 2019

srenatus force-pushed the sr/a2-798/split-upgrade-calls-for-v2-and-v2.1 branch 2 times, most recently from aa61614 to d46aaed Compare June 4, 2019 12:21

srenatus requested a review from a team as a code owner June 4, 2019 13:34

srenatus removed the WIP label Jun 4, 2019

tylercloke approved these changes Jun 4, 2019

msorens approved these changes Jun 5, 2019

srenatus added 6 commits June 5, 2019 10:59

opa: don't share a compiler

060f3d1

Also: simplify module override logic a bit. Signed-off-by: Stephan Renatus <[email protected]>

opa: don't include opa.New() in benchmark

4824bdd

Signed-off-by: Stephan Renatus <[email protected]>

opa: dedup DumpData{,V2}

659bfc6

Signed-off-by: Stephan Renatus <[email protected]>

debug

79fd4d4

Signed-off-by: Stephan Renatus <[email protected]>

conformance_tests: fix v2 vs v2.1 issues

279447b

Signed-off-by: Stephan Renatus <[email protected]>

srenatus added 18 commits June 5, 2019 10:59

opa: anything projects related is v2.1

17043f8

Signed-off-by: Stephan Renatus <[email protected]>

server/migration: update engine store AFTER version to update is dete…

ed1e53d

…rmined Signed-off-by: Stephan Renatus <[email protected]>

engine/opa: remove partial result init on startup

9224390

The stores are empty -- any responses based on these will be wrong. So, don't bother initializing. Signed-off-by: Stephan Renatus <[email protected]>

engine/opa: cleanup V2SetPolicies vs V2p1SetPolicies

ef5736d

Plain v2 knows nothing about rules, and I think that's how it should stay. Signed-off-by: Stephan Renatus <[email protected]>

authz-service/policy: move engine update below data migrations

e6efef3

Signed-off-by: Stephan Renatus <[email protected]>

policy_refresher: skip IAM v1 requests

1b81f53

Signed-off-by: Stephan Renatus <[email protected]>

policy_refresher: fix v2 vs v2.1 switch

30cd18f

yeah well if it's wrong it doesn't work. Signed-off-by: Stephan Renatus <[email protected]>

policy_test: fix migration status change impact

81006b5

Signed-off-by: Stephan Renatus <[email protected]>

Revert "engine/opa: cleanup V2SetPolicies vs V2p1SetPolicies"

4e306a6

This reverts commit 6f92330. Turns out the project rules stuff depends on this to work with IAM v2.

Revert "opa: anything projects related is v2.1"

60aa1ed

This reverts commit 54ca7bc.

policy_refresher: un-skip IAM v1 requests

b92bd51

Signed-off-by: Stephan Renatus <[email protected]>

Revert "engine/opa: reset x and y if z has just been updated"

5cf1b37

This reverts commit f32a476.

opa: remove debug data dump

bfc7d6c

Signed-off-by: Stephan Renatus <[email protected]>

[integration]: update_reset_iam_v2: include v2.1, dedup script

b0c9288

Signed-off-by: Stephan Renatus <[email protected]>

ci: use --skip-policy-migration flag, not studio helper

29e9509

Signed-off-by: Stephan Renatus <[email protected]>

srenatus force-pushed the sr/a2-798/split-upgrade-calls-for-v2-and-v2.1 branch from 1cecfc9 to 36c4481 Compare June 5, 2019 08:59

add code comments per review comments

069a19d

Signed-off-by: Stephan Renatus <[email protected]>

srenatus force-pushed the sr/a2-798/split-upgrade-calls-for-v2-and-v2.1 branch from 36c4481 to 069a19d Compare June 5, 2019 09:06

authz-service: add note re: v2->v2.1 in multi-node

f06496e

Signed-off-by: Stephan Renatus <[email protected]>

srenatus merged commit 8df4b34 into master Jun 5, 2019

chef-ci deleted the sr/a2-798/split-upgrade-calls-for-v2-and-v2.1 branch June 5, 2019 10:16

srenatus mentioned this pull request Jun 5, 2019

[A2-798] fix v2.1 upgrade issue for multi-node settings #501

Merged

4 tasks

susanev added the auth-team label Jul 20, 2019

snyk-bot mentioned this pull request Nov 5, 2022

[Snyk] Fix for 1 vulnerabilities ekmixon/automate#77

Open

ekmixon mentioned this pull request Dec 26, 2022

[Snyk] Fix for 1 vulnerabilities ekmixon/automate#100

Open

ekmixon mentioned this pull request Mar 23, 2024

[Snyk] Fix for 2 vulnerabilities ekmixon/automate#164

Open
