Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make canViewerDeleteAsync recursive #224

Merged
merged 4 commits into from
May 31, 2024
Merged

Conversation

wschurman
Copy link
Member

@wschurman wschurman commented May 28, 2024

Why

canViewerDeleteAsync historically just checked the privacy policy on the single entity, but didn't check the cascading deletion behavior of all entities that were being deleted. This could result in false-positives and was more an approximation of whether an entity could be deleted.

This PR fixes this bug by recursively traversing the deletion cascades in the same way that EntityMutator does and returning the conjunction of the can deletes.

How

In addition to updating the behavior, this moves the functions out of the entity this type for class inference and makes the invocation explicit. This is so the function can be used recursively on a unknown entity type (whereas calling functions with explicit this parameter type need to be called on a concrete type to typecheck correctly).

  1. Add tests
  2. Update utility method to support edge behavior.

Test Plan

Ensure tests pass.

Next steps

One pattern we're seeing more of is needing to process large amounts of data during deletions. I'm still figuring out how to codify this in the entity library, but likely we'll want something like the following:

In background job (not in web request since it processes too much data):

  1. Call canViewerDeleteAsync for object X which we want to delete asynchronously. W may need some sort of memory limiter like promise-limit.
  2. Manually delete objects that would be deleted in a cascade off of X. Do this in batches to avoid overloading DB/memory/etc.
  3. Then call delete on X, which no longer has as many cascades to process and therefore can be done in a single call.

@wschurman wschurman force-pushed the @wschurman/can-delete branch from 680e771 to 49ec9b7 Compare May 28, 2024 18:50
Copy link

codecov bot commented May 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (91d46ba) to head (8fb6b43).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main      #224      +/-   ##
===========================================
+ Coverage   99.89%   100.00%   +0.10%     
===========================================
  Files          68        69       +1     
  Lines        1831      1892      +61     
  Branches      244       265      +21     
===========================================
+ Hits         1829      1892      +63     
+ Misses          2         0       -2     
Flag Coverage Δ
integration 100.00% <100.00%> (+0.10%) ⬆️
unittest 100.00% <100.00%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wschurman wschurman force-pushed the @wschurman/can-delete branch 4 times, most recently from 6a13c6f to 93a1807 Compare May 28, 2024 20:14
@wschurman wschurman marked this pull request as ready for review May 28, 2024 20:18
@wschurman wschurman requested review from ide and quinlanj May 28, 2024 20:18
@wschurman wschurman force-pushed the @wschurman/can-delete branch from 93a1807 to 529741d Compare May 30, 2024 17:12
Copy link
Member

@ide ide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to block this PR but I think future use cases we need to be careful about are:

  • entities that have large numbers of references, like when many build entities reference the same account or the same user. Possible solutions are: chunking, and allowing some entities to bypass needing to be loaded to perform these checks. For instance, some entities like update-asset edges might have privacy policies that don't depend on both the viewer and the specific entity instance being deleted -- we don't need to load the actual update-asset edge entity to know its privacy policy outcome. Or phrased differently, what if privacy policy methods were split into InstancePrivacyPolicies and ClassPrivacyPolicies? And possibly ViewerAgnostic versions that don't get a viewer context? Can explain in person if this is confusing.

  • entities that reference multiple other entities - builds reference both accounts and users, which means that if we delete a user and their primary account, we might check 2x whether we can delete a build, one time when we delete the user and one time when we delete the account. Memoizing the can delete/can update results might be the solution, but comes with the messiness of introducing a cache.

packages/entity/src/utils/EntityPrivacyUtils.ts Outdated Show resolved Hide resolved

// Take entity X which is proposed to be deleted, look at inbound edges (entities that reference X).
// These inbound edges are the entities that will either get deleted
// or updated with null based on the edge definiton when entity X is deleted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// or updated with null based on the edge definiton when entity X is deleted.
// or updated with null foreign keys based on the edge definition when entity X is deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to roughly what is suggested, though foreign keys are a relational concept while entity is theoretically db-agnostic so I used a different phrasing. Let me know if you think it still needs alteration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I wanted to clarify is that the FK fields of the referencing entities will be set to null. Basically, for "Updated with null" to be more precise and say what exactly will be set to null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated again. Thanks for the clarification.

packages/entity/src/utils/EntityPrivacyUtils.ts Outdated Show resolved Hide resolved
Copy link
Member Author

@wschurman wschurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: future use cases we need to be careful about

Yep, these are definitely considerations (mentioned them in the summary as well).

Agree that for certain classes of entities, it's sufficient to check permission on a single instance to know whether a group of entities loaded via an edge are authorized, but it's somewhat difficult to express that in the framework. For the update-asset edge example, the case is we just need to check if they all have the same owning app ID and then infer that they can be accessed. But this is a very application-specific constraint. Expressing it more generally would still require checking at least one instance and then also providing a uniformity function. Though this would still require loading all the entities and would only bypass extra privacy policy checks. Authorizing all nodes on the other end of an edge without loading the edges themselves is hard to express outside of application code. I assume this is the ClassPrivacyPolicies concept you were ideating.

For entities that reference multiple other entities, I think it's best to not memoize since
a) the policies are called with different arguments (different cascading delete reasons in particular)
b) the dataloader hopefully makes authorization-through-edges fairly fast

My plan is to implement useful utilities for doing these batch deletions here in the entity repo and then do the application-specific logic in the expo server code. I'm not too sure where the boundary between the two will be yet.


// Take entity X which is proposed to be deleted, look at inbound edges (entities that reference X).
// These inbound edges are the entities that will either get deleted
// or updated with null based on the edge definiton when entity X is deleted.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to roughly what is suggested, though foreign keys are a relational concept while entity is theoretically db-agnostic so I used a different phrasing. Let me know if you think it still needs alteration.

@wschurman wschurman requested a review from ide May 31, 2024 17:23
@ide
Copy link
Member

ide commented May 31, 2024

Loadless privacy policy evaluation: I was thinking about entities where the privacy policy doesn't depend on the entity itself ("only super users can edit this") or the doesn't depend on the viewer ("everyone can view this public object"). In the former case, we don't need to load the entity (hence a class method instead of instance method). And we'd be able to coalesce privacy policy checks (if we have 100 entities of the same type and the privacy policy is a class method, we can just evaluate the privacy policy once and know the answer for all 100 entities).

Memoization: That's a good point about cascading delete reasons being different.

@wschurman wschurman merged commit 60fc9a4 into main May 31, 2024
3 checks passed
@wschurman wschurman deleted the @wschurman/can-delete branch May 31, 2024 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants