[RHCLOUD-37152] Add locks on Roles, CAR and groups in migrator #1441
Conversation
rbac/migration_tool/migrate.py
Outdated
with transaction.atomic():
    # Lock group
    Group.objects.select_for_update().get(pk=group.pk)
I think we may want to use the group state as of this query, rather than relying on the group state prior to it (which is what the view does). I'm wondering, for example, about the case where principals are removed between the time the principals are queried and the time the group is locked. That removal replicates first, and then the add happens, which adds them back.
It's also overall just easier to reason about.
To do that, it is probably as simple as moving the with transaction.atomic() to the top of, and just inside, the for loop. It will potentially lock more groups, because it will end up locking groups that we don't have anything to replicate for, but that's probably fine.
e.g.
for group in groups:
    with transaction.atomic():
        principals: list[Principal] = []
        system_roles: list[Role] = []
        if not group.platform_default:
            # ... etc
Actually, sorry, that example is misleading I think; I was multitasking. I didn't include requerying the group with the lock, which was the point.
for group in groups:
    with transaction.atomic():
        # Requery the group with a lock
        group = Group.objects.select_for_update().get(pk=group.pk)
        principals: list[Principal] = []
        system_roles: list[Role] = []
        if not group.platform_default:
            # ... etc
(You could optimize this a little by fetching just the group PKs in the prior tenant.group_set query; see the sketch below.)
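For illustration, a minimal sketch of that optimization, assuming the same loop shape as the example above:

group_pks = list(tenant.group_set.values_list("pk", flat=True))
for pk in group_pks:
    with transaction.atomic():
        # The locked re-query is the authoritative state to replicate from.
        group = Group.objects.select_for_update().get(pk=pk)
        # ... gather principals / system roles and replicate as before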
Also, I think we may need to modify the group queryset in the view to lock the group on any principal modification, not just adds.
The reason is that in this case the migrator assumes an "add principals" operation, but that's not what is actually happening. Normally, when an add and a remove interleave, it should be fine, because both also modify the database. They may interleave, but regardless the outbox is always consistent with the database, so if a principal is there (or not), the same will be reflected in relations.
In this case, the migrator is not actually modifying the group-principal data; it is just producing an outbox message. If it also did group.principals.add(...), then the outbox and relations would be consistent even if this interleaved with a removal, but that is probably inappropriate because it would "undo" the removal operation. (In the case of normal dual write, another user "undid" the operation, so it's not as big of a deal if they interleave.)
The upside is that making the lock more general makes the code a little easier to reason about. The downside is that it can hurt performance, but I think that is not really a big concern here, since it is really unlikely that there is contention over these specific records.
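For illustration, the more general lock might look roughly like this. This is a sketch only; the class name, import path, and method check are assumptions, not the actual RBAC view code:

from rest_framework import viewsets

from management.models import Group  # import path assumed for illustration

class GroupViewSet(viewsets.ModelViewSet):
    """Sketch: lock the group row for any principal modification."""

    def get_queryset(self):
        queryset = Group.objects.all()
        # Locking on removals as well as adds means migrator replication
        # cannot interleave with any concurrent principal change.
        if self.request.method in ("POST", "DELETE"):
            queryset = queryset.select_for_update()
        return queryset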
I added those suggestions.
rbac/migration_tool/migrate.py
Outdated
role = Role.objects.select_for_update().get(pk=role.pk)
dual_write_handler = RelationApiDualWriteHandler(
    role, ReplicationEventType.MIGRATE_CUSTOM_ROLE, replicator
)

dual_write_handler.replicate_new_or_updated_role(role)
Here I think we need two adjustments:
- We need dual_write_handler.prepare_for_update() before the replicate method (see the sketch after this list).
- We might have a similar problem here with role deletion as with the group principal removal above. Normally, dual write shouldn't need to lock the role on deletion, because foreign key constraints on modifications to the role will prevent inconsistency. But this is another case where the migrator is not adding the data to the DB at the same time. So I think we could fix this one either by re-running .save() on a role serializer here, so that access FKs serialize concurrent removal, or by also locking the role in the role view queryset on role deletion.
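For example, the ordering suggested in the first point would look roughly like this (a sketch based on the snippet above; only the prepare_for_update call is new):

with transaction.atomic():
    # Lock the role, then capture its current relations before replicating,
    # so the update replaces exactly the state seen under the lock.
    role = Role.objects.select_for_update().get(pk=role.pk)
    dual_write_handler = RelationApiDualWriteHandler(
        role, ReplicationEventType.MIGRATE_CUSTOM_ROLE, replicator
    )
    dual_write_handler.prepare_for_update()
    dual_write_handler.replicate_new_or_updated_role(role)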
- I added prepare_for_update (I thought that removal of current relations is not needed for roles).
- I think the lock on the role in the delete action is already there.
/retest
rbac/migration_tool/migrate.py
Outdated
- # if we don't write relationships (testing out the migration and clean up the created bindingmappings)
- if not settings.READ_ONLY_API_MODE and write_relationships != "False":
+ # Run this if we don't write relationships (testing out the migration and clean up the created bindingmappings)
+ if write_relationships != "False":
If we get rid of the settings.READ_ONLY_API_MODE condition, we can never set write_relationships to True?
Yes, good point. I think we don't need this condition; I am removing it.
rbac/migration_tool/migrate.py
Outdated
def migrate_roles_for_tenant(tenant, exclude_apps, replicator):
    """Migrate all roles for a given tenant."""
    default_workspace = Workspace.objects.get(type=Workspace.Types.DEFAULT, tenant=tenant)

    roles = tenant.role_set.all()
    if exclude_apps:
        roles = roles.exclude(access__permission__application__in=exclude_apps)

    for role in roles:
If you want, I guess we could also just get the Role PKs here, but it's not a big deal. A sketch follows below.
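For completeness, a hypothetical sketch of that, reusing the filtered roles queryset from the snippet above:

role_pks = list(roles.values_list("pk", flat=True))
for pk in role_pks:
    with transaction.atomic():
        role = Role.objects.select_for_update().get(pk=pk)
        # ... build the dual write handler and replicate as before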
with transaction.atomic():
    # Lock cross account request
    cross_account_request = CrossAccountRequest.objects.select_for_update().get(pk=cross_account_request.pk)
This is good, I think. I can't think of a situation where this has a problem. When any CAR attribute changes (role, status, etc.), it is locked, so those changes can't happen concurrently. The CAR used for replication is always the one queried with the lock, as is done here, so by the time a select returns, it will always be the latest state. So if dual write and the migrator interleave, I don't think there should be a problem.
Looks good!
I am going to merge, as I added the last suggestions and got an LGTM.
Link(s) to Jira
Description of Intent of Change(s)
This PR adds locks on Roles, CARs, and groups in the migrator, which solves concurrency control during migration and dual writes.
Local Testing
..
Checklist
Secure Coding Practices Checklist Link
Secure Coding Practices Checklist