-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[A2-798] split partial result update calls for v2 and v2.1 #467
[A2-798] split partial result update calls for v2 and v2.1 #467
Conversation
24cec04
to
25e452c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some reviewer notes inside
@@ -704,16 +705,12 @@ func (s *policyServer) EngineUpdateInterceptor() grpc.UnaryServerInterceptor { | |||
// do nothing | |||
} | |||
|
|||
s.log.Debugf("Initiating store update for %s", info.FullMethod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐛 drive-by bug fix
@@ -119,6 +116,10 @@ func NewPoliciesServer( | |||
} | |||
srv.setVersion(v) | |||
|
|||
if err := srv.updateEngineStore(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ℹ️ this needs to happen after setVersion
, so the proper store will be updated in the process (v2 vs v2.1). (v1 is done per-call in its corresponding handlers)
} | ||
refresher.refreshRequests <- m | ||
return m.Err() | ||
} | ||
|
||
// TODO: handle version: v2 vs v2.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's fix that when we get there? 😉
2bbafaf
to
f0b1792
Compare
f334564
to
dcf74c0
Compare
@@ -185,10 +185,10 @@ func (refresher *policyRefresher) updateEngineStore(ctx context.Context, vsn api | |||
} | |||
|
|||
switch { | |||
case vsn.Minor == api.Version_V1: // v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E_IMANIDIOT
aa61614
to
d46aaed
Compare
return refresher.engine.V2SetPolicies(ctx, policyMap, roleMap, ruleMap) | ||
} | ||
// Note 2019/06/04 (sr): v1?! Yes, IAM v1. Our POC code depends on this query to be | ||
// answered regardless of whether IAM is v1, v2 or v2.1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Helpful comment!
@@ -295,3 +313,7 @@ func (refresher *policyRefresher) getRuleMap(_ context.Context) (map[string][]in | |||
} | |||
return data, nil | |||
} | |||
|
|||
func pretty(vsn api.Version) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and the world is just a little bit better now.
if err != nil { | ||
return err | ||
} | ||
// Need a compiler for use by regular queries, too, so store it here. | ||
s.compiler = compiler | ||
|
||
// Partial eval for authzV2Query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest delete this comment, as well as L194-195; otherwise, make them more parallel to each other (on the same relative line, etc.).
return dumpData(ctx, s.store, s.log) | ||
} | ||
|
||
func dumpData(ctx context.Context, store storage.Store, l logger.Logger) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest move dumpData
below DumpDataV2p1
so DumpData*
are all contiguous.
@@ -235,17 +223,28 @@ func (s *State) initCompiler() (*ast.Compiler, error) { | |||
// DumpData is a bit fast-and-loose when it comes to error checking; it's not meant | |||
// to be used in production |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a further comment mentioning where you might inject any of the DumpData*
calls?
@@ -636,6 +636,20 @@ func (s *State) V2SetPolicies( | |||
return s.initPartialResultV2(ctx) | |||
} | |||
|
|||
// V2p1SetPolicies replaces OPA's data with a new set of policies and roles, | |||
// and resets the partial evaluation cache for v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/v2/v2p1/
s.v2Store = store | ||
s, err := New(ctx, l) | ||
require.NoError(b, err, "init state") | ||
s.v2Store = store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move the store outside the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to only benchmark the relevant bits, not what happens in s.New
. Setting the s.v2Store
is not relevant, either, but also likely not a thing that influences measurements too much.
if err := srv.updateEngineStore(ctx); err != nil { | ||
return nil, errors.Wrapf(err, "initialize engine storage (%v)", v) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I infer it is better to updateEngineStore
after ApplyV2DataMigrations
...? If so, consider adding a brief in-code comment mentioning why.
curPolicyID, err := refresher.store.GetPolicyChangeID(ctx) | ||
if err != nil { | ||
refresher.log.WithError(err).Warn("Failed to get current policy change ID") | ||
return lastPolicyID, err | ||
} | ||
if curPolicyID != lastPolicyID { | ||
if curPolicyID != lastPolicyID || forceUpdate { | ||
refresher.log.WithFields(logrus.Fields{ | ||
"lastPolicyID": lastPolicyID, | ||
"curPolicyID": curPolicyID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not from your PR, but please rename curPolicyID and lastPolicyID to curPolicyChangeID and lastPolicyChangeID...
( I saw this code and was thinking "What the heck is a 'current policy' and why do we care?" 🤔 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll touch this code soon to fix the multi-node issue mentioned in the comments. I'll probably do this, then.
return setupV2(t, nil, writer, nil, nil) | ||
writer engine.V2pXWriter) testSetup { | ||
return setupV2WithMigrationState(t, nil, writer, nil, make(chan api_v2.Version, 1), | ||
func(s storage.MigrationStatusProvider) error { return s.Success(context.Background()) }) // IAM v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure the comment on the line communicates anything additional here...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does. I'll elaborate.
_ context.Context, policies map[string]interface{}, | ||
ctx context.Context, policies map[string]interface{}, | ||
roles map[string]interface{}, rules map[string][]interface{}) error { | ||
return te.V2p1SetPolicies(ctx, policies, roles, rules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add a comment explaining why it is OK to use the same test code for v2 and v2p1 here.
@@ -33,41 +33,28 @@ hab_curl() { | |||
do_test_deploy() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
So this most likely has issues with switching between v2 and v2.1 in a multi-node setting. I'm not sure how this is currently tested, so thinking hard is the only method we have right now for figuring this out. I'm torn between adding a card outlining the problematic scenario; and adding some commits on top of this PR. I'd rather not do the latter, but since I have no idea who depends on this functionality when, I'm probably opting for that anyways. |
Also: simplify module override logic a bit. Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
This should allows us to shave a bit off our policy creation/update times, by only running what we should need. I'm not completely certain that this captures all issues potentially arising in the multi-node setup, but it should be fine for all-in-one. Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
…rmined Signed-off-by: Stephan Renatus <[email protected]>
The stores are empty -- any responses based on these will be wrong. So, don't bother initializing. Signed-off-by: Stephan Renatus <[email protected]>
...as when the version changes: Without this flag, we'd get into the following situation when flipping from v2 to v2.1 or vice-versa: 1. the server's state would flip (say, v2 -> v2.1) 2. the policy refresher would notice that nothing changed in the policy data 3. and hence would not update the engine store as a result, the v2.1 engine store would remain non-functional. This would lead to the `chef-automate iam upgrade-to-v2 --beta2.1` command failing to list teams, APIError: An API error occurred during execution: Failed to retrieve team "editors": Failed to retrieve admins team: rpc error: code = PermissionDenied desc = error authorizing action "iam:teams:list" on resource "iam:teams" for subjects ["tls:service:deployment-service:40adf15d875d3190de6c24d0862804cf0d656be656f60234a8714a44563d5518"]: rpc error: code = Internal desc = error in query evaluation: cannot evaluate empty query I'm not sure if this is the best way, or if the IAM version should be reflected in the last change ID... Signed-off-by: Stephan Renatus <[email protected]>
This is a robustness measure -- without this, we might be answering the wrong versioned queries when we really shouldn't. Only relevant for scenarios where we upgrade, though, as usually, only one of the three should be initialized on startup. Signed-off-by: Stephan Renatus <[email protected]>
Plain v2 knows nothing about rules, and I think that's how it should stay. Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
yeah well if it's wrong it doesn't work. Signed-off-by: Stephan Renatus <[email protected]>
On v2 server init, the version wasn't picked up: it depended on both the version channel being non-nil, AND on the migration status NOT having transitioned to failure. In the v2 memstore implementation, that was possible. Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
This reverts commit 6f92330. Turns out the project rules stuff depends on this to work with IAM v2.
This reverts commit 54ca7bc.
Signed-off-by: Stephan Renatus <[email protected]>
This reverts commit f32a476.
Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
Signed-off-by: Stephan Renatus <[email protected]>
1cecfc9
to
36c4481
Compare
Signed-off-by: Stephan Renatus <[email protected]>
36c4481
to
069a19d
Compare
Signed-off-by: Stephan Renatus <[email protected]>
🔩 Description
authz-service
uses OPA's partial eval feature to save time when authorizing requests by "pre-calculating" everything it can figure out without knowing the (variable) input data. However, that process itself takes time, any change to the relevant data (policies, roles, projects) requires a rebuild of that entire state.When introducing IAM v2.1, we've taken a bit of a shortcut, and have rebuilt both the v2 and the v2.1 queries' partial eval state. This PR is about fixing that.
The need to fix it arises from the rebuild time, which is dependent on the data it's built with, increasing when the data increases (linearly? haven't checked) -- and it's become noticable, and annoying.
ℹ️ Notes
integration/tests/upgrade_reset_iam_v2.sh
ingest-service
, it seems to query the v2.1-related projects-rules stuff on ingest, regardless of whether A2 is using IAM v1, v2 or v2.1.👍 Definition of Done
When using IAM v2.1, only the v2.1 partial eval result is rebuilt on updated data, and when using v2, only the v2 state is.
👟 Demo Script / Repro Steps
rebuild components/authz-service
chef-automate iam upgrade-to-v2 --beta2.1
)curl -kH "api-token: $TOK" https://localhost/apis/iam/v2beta/projects -d "$(jq -n '{ id: "foo", name: "my foo project" }')"
andcurl -kH "api-token: $TOK" https://localhost/apis/iam/v2beta/projects -d "$(jq -n '{ id: "bar", name: "my bar project" }')"
this is with
pol.json
containing⛓️ Related Resources
✅ Checklist