Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ActionPolicy does not work well with graphql-ruby dataloaders (not thread/fiber safe) #47

Closed
Bertg opened this issue Apr 19, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@Bertg
Copy link

Bertg commented Apr 19, 2023

Tell us about your environment

Ruby Version: ruby 3.2.2

Framework Version (Rails, whatever): rails (7.0.4.3), graphql (2.0.21)

Action Policy Version: (0.6.5)

Action Policy GraphQL Version: (0.5.3)

What did you do?

We started using dataloader in our policies to help reduce the amount of queries.

We pass the graphql context as part of the policy context, which allows us to use our dataloaders in the database.

What did you expect to happen?

We expected everything to keep working as it did, but with less or equal amount of queries.

What actually happened?

We started getting completely wrong results when using expose_authorization_rules.

During our investigation we monkey patched the allowance_to method like this:

def allowance_to(rule, record = :__undef__, **options)
  policy = lookup_authorization_policy(record, **options)

  policy.apply(authorization_rule_for(policy, rule))
  policy.result.tap { |a| puts "policy_result: #{rule} #{a.inspect}" }
end

We got following (sample) output.

policy_result: publish? <PublicGraph::Policies::FolderNodePolicy#publish?: false>
policy_result: move_into? <PublicGraph::Policies::FolderNodePolicy#move_into?: false>
policy_result: create_item? <PublicGraph::Policies::FolderNodePolicy#move_into?: false>
policy_result: create_item_in? <PublicGraph::Policies::FolderNodePolicy#move_into?: false>
policy_result: create_folder_in? <PublicGraph::Policies::FolderNodePolicy#move_into?: false>

As you can see a discrepancy on the rule is exposed. The result that is returned is not that of the rule requested.

In order to understand what is happening it is important to know that resolving of these policies have (somewhere in the stack) a call to dataloader. Dataloader uses fibers to do its "magic", basically stopping execution of functions as long as it can.

This seems to be incompatible with the (default on) ActionPolicy::Behaviours::Memoized and ActionPolicy::Behaviours::ThreadMemoized. When calling the apply method (in allowance_to) the result is stored in the instance. When the thread is yielded (by the fiber scheduler) other threads can replace this result with an other call to apply.

As I see it this implementation can not be considered thread safe, especially in its default configuration.

@palkan
Copy link
Owner

palkan commented Apr 19, 2023

Thanks for the report!

Just to confirm that it relates to the ThreadMemoized, can you please try to turn it off and see if it helps:

ActionPolicy::PerThreadCache.enabled = false

(Though, I think, Memoized may cause the same problems, depending on whether the same authorization behaviour is reused by loaders)

@palkan palkan added the bug Something isn't working label Apr 19, 2023
@Bertg
Copy link
Author

Bertg commented Apr 20, 2023

@palkan We have, through monkey patching disabled ActionPolicy::Behaviours::Memoized and the issue resolved itself. In test mode the ThreadMemoised is not active anyway (in default setup) so it does not affect the outcome.

According to us the sequence of events is something like this:

The graph tries to return an authorisation rule `create_item' on object `FolderNode#foo`
  ActionPolicy `allowance_to` is invoked and caches the policy object with cache_key `FolderNode/foo`
    ActionPolicy invokes the `apply` method with `create_item` on the policy object
      Dataloader yields the execution back to the Fiber scheduler
The graph tries to return an authorisation rule `move_into' on object `FolderNode#foo`
  ActionPolicy is invoked and finds the policy object with cache_key `FolderNode/foo`
    ActionPolicy invokes the `apply` method with `move_into ` on the policy object
      Dataloader yields the execution back to the Fiber scheduler
The graph has no other fields to run, and yields back to dataloader
Dataloader loads its data
`allowance_to` for `create_item` can continue after the `apply` method, and returns the cached policy object
👆 here is the issue. The policy object result is now `move_into` as that was the last apply that was called
`allowance_to` for `move_into` can continue after the `apply` method, and returns the cached policy object

@palkan
Copy link
Owner

palkan commented May 18, 2023

Thanks for the details! That will help with reproduction in tests.

Also, I think it's worth extracting the minimal behaviour implementation into a separate module, so it's possible to opt-out from extensions (such as memoization) without monkey-patching.

@Bertg
Copy link
Author

Bertg commented Jun 12, 2023

Hi @palkan I was wondering if you had any idea on how to proceed with this. I saw you started a branch some time ago, but it hasn't moved much (publicly) since then.

I was looking into the code, and from where I'm sitting, it looks like some major overhaul of the internal implementation is required. I'd love to help where I could , but as it looks like a big change, some guidance on how you'd like to do this would be needed.

@npupko
Copy link

npupko commented Jun 27, 2023

@Bertg Could you add some context, please.
Is it causes problems with expose_authorization_rules only or any usage of dataloader inside the policy will cause this bug?
I am currently adding policies to the application and the approach I use for now is:

class RenewalPolicy < BasePolicy
  authorize :dataloader, optional: true

  def update?
    scopes = dataloader.with(Sources::ScopesForUser).load(user.id)
    global_scopes = scopes.include?("...")
    # CODE
  end
end

@Bertg
Copy link
Author

Bertg commented Jun 28, 2023

We have only discovered it in expose_authorization_rules. When actually checking the policies, the scheduler behaves in a sequential way, so the issue would not happen there. (Although, I guess it could in some other situations).

The issue here is how graphql-ruby resolves the values when returning the graph results, combined with the not parameterised memorisation that action_policy does.

To be clear: In the following bot of code, from action_policy the @result value is memorised, and not "scoped" to the rule argument. Subsequent methods relying on @result are thus not thread safe.

      # Returns a result of applying the specified rule (true of false).
      # Unlike simply calling a predicate rule (`policy.manage?`),
      # `apply` also calls pre-checks.
      def apply(rule)
        @result = self.class.result_class.new(self.class, rule)

        catch :policy_fulfilled do
          result.load __apply__(resolve_rule(rule))
        end

        result.value
      end

I mentioned before, that this is exactly what happens during the response generation phase in graphql-ruby. Several fibres call the apply rule on the same policy (and thus the @result value keeps changing). allowance_to then just returns the last @result instance set.

So, in short ActionPolicy:: Policy#apply has internal caching/memoisation/state that makes ActionPolicy::Behaviour#allowance_to (last 2 lines) not thread safe.

@palkan
Copy link
Owner

palkan commented Jul 25, 2023

@Bertg Thanks a lot for the details.

Sorry, hadn't had enough time to proceed with this issue: still don't have a reproduction setup :(

it looks like some major overhaul of the internal implementation is required

That's true. I plan to start working on v1.0 release soon, and this particular refactoring is on my list.

@bmulholland
Copy link

bmulholland commented Dec 18, 2024

Hi folks, hate to dig up an old thread here. I came across this while starting to switch from pundit to Action Policy and need to know the latest.

We use GraphQL and Dataloaders, and so I'm wondering what the current state of this is. In particular: since we use dataloaders, should we avoid using Action Policy at all? Are the issues with Action Policy itself (my understanding of the above) or only with the GraphQL integration (since this is an issue on the GraphQL gem)? Is there a way to configure Action Policy to work around the issues above -- does disabling the thread cache do enough? If so, what are the tradeoffs?

Is there any ongoing or partial, past work on this subject? What's left? Is there something we could do to help move along better support?

palkan added a commit to palkan/action_policy that referenced this issue Dec 18, 2024
Store the result object in the execution context instead of the ivar to avoid problems in fiber-concurrent execution environments

This is a temporary workaroudn with some preparation for 1.0; ideally, we should get rid of the local-global state completely

Ref palkan/action_policy-graphql#47
palkan added a commit to palkan/action_policy that referenced this issue Dec 18, 2024
Store the result object in the execution context instead of the ivar to avoid problems in fiber-concurrent execution environments

This is a temporary workaroudn with some preparation for 1.0; ideally, we should get rid of the local-global state completely

Ref palkan/action_policy-graphql#47
@palkan
Copy link
Owner

palkan commented Dec 18, 2024

Is there any ongoing or partial, past work on this subject? What's left? Is there something we could do to help move along better support?

Pushing me in the right moment is enough 😁 Looks like I finally got my mind around it.

The biggest blocker was the lack of a reproduction script. So, I wrote a synthetic one just to demonstrate (and fix) the problem: palkan/action_policy@29af4bc#diff-4ddbfab2d660b8eec0da3c00e45d1daaa23a08c83c914086ef3b148459d2ce43

The fix is available in action_policy 0.7.3.

@palkan palkan closed this as completed Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants