Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Presto User Impersonation #4807

Closed
wants to merge 2 commits into from

Conversation

Ralnoc
Copy link
Contributor

@Ralnoc Ralnoc commented Apr 16, 2020

What type of PR is this? (check all applicable)

  • Feature

Description

The newest version of PyHive now supports enabling User Impersonation for Presto. This allows you to configure an authentication user for the Presto Configuration, but still execute the actual queries in Presto as the user logged into Re:Dash. This is a critical component to be able to have user level authorization in Presto, while also communicating over the SSL interface which requires authentication.

In order to leverage this setting. you must have User Impersonation configured in Presto. This is the documentation for doing so: https://docs.starburstdata.com/latest/security/impersonation.html

Upgrading PyHive to 0.6.2 to gain access to the new User Impersonation code.

Related Tickets & Documents

https://docs.starburstdata.com/latest/security/impersonation.html
dropbox/PyHive#309

@Ralnoc
Copy link
Contributor Author

Ralnoc commented Apr 16, 2020

@susodapop -
While I was at it, I raised this PR to add User Impersonation support for Presto. This will let Presto authenticate with a service account, and as long as PrestoSQL is configured correctly, execute the query as the logged in user, while still authenticating as the service account. This is a big one for my company which is currently having to modify the query runner after the fact to support individual user authorization in Presto because we leverage SAML.

Comment on lines 101 to 105
enable_user_impersonation = self.configuration.get("user_impersonation")
principle_username = None
if enable_user_impersonation and user is not None:
principle_username = user.email.split('@')[0]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I misunderstand the use-case here. Would it not make sense to allow arbitrary user selection in the data source setup screen? Since DS setup is restricted to account admins.

Suggested change
enable_user_impersonation = self.configuration.get("user_impersonation")
principle_username = None
if enable_user_impersonation and user is not None:
principle_username = user.email.split('@')[0]
enable_user_impersonation = (
user.email.split("@")[0]
if user is not None and self.configuration.get("user_impersonation", False)
else None
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are a number of parts for this process.

The primary use case for this is the need to have the user executing the query be passed through to Presto, so that Authorization rules and auditing can be done against the person actually running the query.

Internally Presto separates Authentication from Authorization. It also implements the ability for User Impersonation so that a service account with the right access configured in Presto can authenticate then tell Presto to run the query as the principal user.

The internal behavior within Presto in this situation is that the query will be executed using the defined Resource groups for the principal user, and if they are using Starburst Presto leverage the Ranger Security Access control to determine if the principal user even has access to the resources they are attempting to query.

Because of these two parts, the general expectation is that the user who is logged in, is the person who should be executing the query. That is why it is a boolean setting to turn on User impersonation, and why we are extracting the username from the email address.

If the desire was to run as a single user, then there would be no need for User Impersonation, as you could just leverage the account configured in the configuration page for authentication.

I hope this helps clarify the use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. This is gold that we can add to the documentation. Thank you for explaining.

connection = presto.connect(
host=self.configuration.get("host", ""),
port=self.configuration.get("port", 8080),
protocol=self.configuration.get("protocol", "http"),
username=self.configuration.get("username", "redash"),
principle_username=principle_username,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed: shouldn't principle be named principal? The presto docs name it PRINCIPAL.

I understand you wrote the support for this into pyhive as well so it might be an easy fix in both places. Might be more trouble than it's worth since changing it now might break other implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea. This is an autocorrect error that unfortunately no one caught until after it was released in pyhive. (I only just noticed it while I was working on this PR and types in principal and it threw an error.)

Making the update in PyHive would be difficult now that it is released, though I can at least implement the correct variable name in the code here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reaching out there to work on getting that fix in place.
dropbox/PyHive#326

Worse case, I'll implement that PR to support both the old and new argument. Lets hold on this PR until a decision is made in PyHive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So they have already merged the correction PR and it will be in the next PyHive release. I'll update this PR with the next release that has that code change in it and update the code to match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this code to match the changes in the future release of PyHive. This PR will definitely fail checks until that release is out. I will clean up the PR in the requirements_all_ds.txt once the new version of pyhive with the change is released.

…eve those changes should be in version 0.6.3 when it is released.
@edvinas-maciulis
Copy link

Is anyone looking at the failure on Datree Smart Policy?

@guidopetri
Copy link
Contributor

@Ralnoc , thanks for the PR! We've updated a lot of things now that we're Community-driven so - if you're still interested in getting this merged - would you mind rebasing off master to re-run the CI, as well as updating merge conflicts?

We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants