Allow Overriding Object Store Credential Provider #18979
Comments
FWIW the usecase we are hearing in
(you can see from the linked tickets this often comes down to us in object_store as a request to implement the various access control methods directly) We are hoping that exposing access to the general purpose mechanism in polars would allow users to access their data using polars directly |
This sounds like a way to enable a lot of users, which is great. I don't really know how this would work (as I don't know enough about this topic), so I need some help in understanding what is requested from us. How would we enable this? I see that there is a trait; can object_store be instantiated with that? Or can it be passed as a dynamic argument? What would this look like on the Python side? |
The various store builders allow providing a custom credential provider at construction time. I don't know enough about polars to know precisely what this might look like when hooked up, especially via Python. Changes may be needed on the object_store side to facilitate this, but I wanted to start the discussion. I suspect it will be necessary to use https://docs.rs/object_store/latest/object_store/enum.ObjectStoreScheme.html directly, as opposed to the type-erased parse_url method. TBC, I don't have capacity to implement this, but I'm happy to assist. |
What I personally suggest is adding a way in polars for users to call out to a separate process to retrieve credentials when needed. Here is how this works with AWS, though we don't yet support this via object_store (tracked by apache/arrow-rs#6422). There are similar mechanisms for Azure and GCP, for example: https://docs.rs/object_store/latest/object_store/azure/struct.MicrosoftAzureBuilder.html#method.with_use_azure_cli So from polars this could look like
|
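For illustration only (this is not the snippet from the comment above): a minimal Python sketch of the "call out to a separate process" idea, assuming the helper prints JSON in the shape used by AWS's `credential_process` setting (`Version`, `AccessKeyId`, `SecretAccessKey`, optional `SessionToken` and `Expiration`). The function name `credentials_from_process`, the returned dictionary keys, and the stand-in helper command are all hypothetical and not part of polars or object_store.

```python
import json
import subprocess
import sys
from datetime import datetime


def credentials_from_process(command):
    """Run `command` (a list of argv strings) and parse the JSON it prints.

    Hypothetical helper: returns (credentials dict, expiry datetime or None).
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    payload = json.loads(result.stdout)
    if payload.get("Version") != 1:
        raise ValueError("unsupported credential process output version")

    # Key names below are an assumption chosen for this sketch.
    creds = {
        "aws_access_key_id": payload["AccessKeyId"],
        "aws_secret_access_key": payload["SecretAccessKey"],
    }
    if "SessionToken" in payload:
        creds["aws_session_token"] = payload["SessionToken"]

    expiry = None
    if "Expiration" in payload:
        # ISO 8601 timestamp; rewrite a trailing "Z" so older Pythons parse it.
        expiry = datetime.fromisoformat(payload["Expiration"].replace("Z", "+00:00"))
    return creds, expiry


# Stand-in for a real helper such as aws-vault: a child Python process that
# prints fixed, fake credentials.
FAKE_HELPER = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"Version": 1, "AccessKeyId": "AKIAEXAMPLE", '
    '"SecretAccessKey": "secret", "SessionToken": "token", '
    '"Expiration": "2030-01-01T00:00:00Z"}))',
]

creds, expiry = credentials_from_process(FAKE_HELPER)
```

The command is passed as an argument list rather than a shell string, so no shell quoting or injection concerns arise from the helper's path.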
FWIW I view calling out to a separate process as strictly less flexible than what I propose here, limited to AWS, and tbh a bit of a hack. Providing a way for users to provide this within the Polars process would be cleaner, could work out of the box (e.g. using the cloud provider's SDK if available), and be more secure. Tbh if we can make traction here I'd be tempted to not do apache/arrow-rs#6422 and instead fix the issue properly |
I don't know what you mean by "fix it properly". Do you mean somehow having an API in polars that provides credentials via arbitrary Python code passed to some polars API? |
Precisely, this would not only solve this for AWS but also any of the other stores we support. We expose this API for a reason 😄 |
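As a rough sketch of what "credentials via arbitrary Python code" could mean in practice: a user-supplied callable that returns credentials plus an expiry, wrapped so that it is only re-invoked shortly before the credentials expire. The class name `CachingCredentialProvider`, the return shape (a dict plus an optional expiry in epoch seconds), and the refresh margin are assumptions made for this example; the actual contract polars or object_store expects should be checked against their documentation.

```python
import time


class CachingCredentialProvider:
    """Hypothetical wrapper: calls `fetch` only when cached credentials are
    missing or about to expire."""

    def __init__(self, fetch, refresh_margin_s=60, clock=time.time):
        self._fetch = fetch              # user code returning (dict, expiry or None)
        self._margin = refresh_margin_s  # refresh this many seconds before expiry
        self._clock = clock              # injectable for testing
        self._cached = None

    def __call__(self):
        if self._cached is not None:
            _, expiry = self._cached
            # No expiry means the credentials are treated as long-lived.
            if expiry is None or self._clock() < expiry - self._margin:
                return self._cached
        self._cached = self._fetch()
        return self._cached


# Usage with a fake clock and a fake fetch function (no cloud SDK involved).
now = [1000.0]
fetch_calls = []


def fetch_fake_credentials():
    fetch_calls.append(now[0])
    creds = {"aws_access_key_id": "AKIAEXAMPLE", "aws_secret_access_key": "secret"}
    return creds, now[0] + 300  # valid for five minutes


provider = CachingCredentialProvider(fetch_fake_credentials, clock=lambda: now[0])
first = provider()
second = provider()   # served from cache
now[0] += 250         # inside the 60 s refresh margin of the 300 s lifetime
third = provider()    # triggers a refresh
```

In a real setup the fetch function would call the cloud provider's SDK (for example to resolve a profile or SSO session), which is the flexibility the in-process approach is meant to offer over a fixed set of built-in mechanisms.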
Alright, I still didn't have time to research yet, but just know that we are willing to help and implement here. I will come back once I have more knowledge and sensible input. ;) |
FWIW I've also filed a simpler proposal in #19022 that might be more immediately actionable if you can tolerate its compromises. |
@ritchie46 you might be able to take inspiration from how it's done in delta-rs: https://github.com/delta-io/delta-rs/blob/main/crates/aws/src/credentials.rs |
FYI: I asked @nameexhaustion to look into this. |
Should |
We shouldn't. I think they are being loaded from the environment; I will make a PR to fix this. |
Thank you so much @tustvold @alamb @nameexhaustion and @ritchie46. This is huge |
Can confirm that this now works on 1.13.0. Great stuff! |
Description
Problem
object_store provides a mechanism to provide a custom way to source credentials by providing a custom CredentialProvider. This is an important capability for supporting authentication schemes we don't natively support, such as AWS_PROFILE (#18757) and SSO, etc... The object_store crate aims to support most common authentication mechanisms, but is not aiming to be a full re-implementation of all the authentication functionality of the various cloud providers.
Proposal
I would like a way to override the credential provider used by object stores, in particular to allow:
Alternatives Considered
Users could use software like aws-vault to generate session credentials. Whilst this has other security benefits, people may not wish to do this for various reasons.
Related Context
AWS_PROFILE should be supported in cloud storage I/O config #18757