-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Presto User Impersonation #4807
Conversation
… Presto. Upgrading PyHive to 0.6.2
@susodapop - |
redash/query_runner/presto.py
Outdated
enable_user_impersonation = self.configuration.get("user_impersonation") | ||
principle_username = None | ||
if enable_user_impersonation and user is not None: | ||
principle_username = user.email.split('@')[0] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I misunderstand the use-case here. Would it not make sense to allow arbitrary user selection in the data source setup screen? Since DS setup is restricted to account admins.
enable_user_impersonation = self.configuration.get("user_impersonation") | |
principle_username = None | |
if enable_user_impersonation and user is not None: | |
principle_username = user.email.split('@')[0] | |
enable_user_impersonation = ( | |
user.email.split("@")[0] | |
if user is not None and self.configuration.get("user_impersonation", False) | |
else None | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are a number of parts for this process.
The primary use case for this is the need to have the user executing the query be passed through to Presto, so that Authorization rules and auditing can be done against the person actually running the query.
Internally Presto separates Authentication from Authorization. It also implements the ability for User Impersonation so that a service account with the right access configured in Presto can authenticate then tell Presto to run the query as the principal user.
The internal behavior within Presto in this situation is that the query will be executed using the defined Resource groups for the principal user, and if they are using Starburst Presto leverage the Ranger Security Access control to determine if the principal user even has access to the resources they are attempting to query.
Because of these two parts, the general expectation is that the user who is logged in, is the person who should be executing the query. That is why it is a boolean setting to turn on User impersonation, and why we are extracting the username from the email address.
If the desire was to run as a single user, then there would be no need for User Impersonation, as you could just leverage the account configured in the configuration page for authentication.
I hope this helps clarify the use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. This is gold that we can add to the documentation. Thank you for explaining.
redash/query_runner/presto.py
Outdated
connection = presto.connect( | ||
host=self.configuration.get("host", ""), | ||
port=self.configuration.get("port", 8080), | ||
protocol=self.configuration.get("protocol", "http"), | ||
username=self.configuration.get("username", "redash"), | ||
principle_username=principle_username, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed: shouldn't principle
be named principal
? The presto docs name it PRINCIPAL
.
I understand you wrote the support for this into pyhive
as well so it might be an easy fix in both places. Might be more trouble than it's worth since changing it now might break other implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea. This is an autocorrect error that unfortunately no one caught until after it was released in pyhive. (I only just noticed it while I was working on this PR and types in principal
and it threw an error.)
Making the update in PyHive would be difficult now that it is released, though I can at least implement the correct variable name in the code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am reaching out there to work on getting that fix in place.
dropbox/PyHive#326
Worse case, I'll implement that PR to support both the old and new argument. Lets hold on this PR until a decision is made in PyHive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So they have already merged the correction PR and it will be in the next PyHive release. I'll update this PR with the next release that has that code change in it and update the code to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this code to match the changes in the future release of PyHive. This PR will definitely fail checks until that release is out. I will clean up the PR in the requirements_all_ds.txt
once the new version of pyhive with the change is released.
…eve those changes should be in version 0.6.3 when it is released.
Is anyone looking at the failure on Datree Smart Policy? |
@Ralnoc , thanks for the PR! We've updated a lot of things now that we're Community-driven so - if you're still interested in getting this merged - would you mind rebasing off master to re-run the CI, as well as updating merge conflicts? We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that. |
What type of PR is this? (check all applicable)
Description
The newest version of PyHive now supports enabling User Impersonation for Presto. This allows you to configure an authentication user for the Presto Configuration, but still execute the actual queries in Presto as the user logged into Re:Dash. This is a critical component to be able to have user level authorization in Presto, while also communicating over the SSL interface which requires authentication.
In order to leverage this setting. you must have User Impersonation configured in Presto. This is the documentation for doing so: https://docs.starburstdata.com/latest/security/impersonation.html
Upgrading PyHive to 0.6.2 to gain access to the new User Impersonation code.
Related Tickets & Documents
https://docs.starburstdata.com/latest/security/impersonation.html
dropbox/PyHive#309