-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more integrated support for Re:Dash users against the presto query_runner #3723
Closed
Closed
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
ca66668
- Changed the source passed from the presto pyhive client from declar…
Ralnoc b2b94d4
- Added password configuration for Presto query runner. This will def…
Ralnoc a062370
- Added missing element for the Presto config page for password.
Ralnoc 2c389d0
Merge branch 'master' into add_user_integration_with_presto
Ralnoc File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were previous attempts as similar functionality, but it conflicts with how Redash caches data. Redash caches the last query execution result. This means it will cache the last execution of whatever user who executed it last, and hence might leak data to other users while giving the admin a false sense of protection.
I feel that we can't introduce this change as is without addressing the caching issue.
But I think you already use this in your environment, so maybe you can share how it works for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are definitely leveraging this implementation in our environment.
Authentication behavior
The behavior of this code in practice allows for the username configuration for the Presto data source to override the individual user logins. (In the case of a situation where Presto is configured with single user access.)
Aside from that, it will attempt to leverage the email address as the basis for the username to pass to Presto. I would much rather prefer to update the SAML code so that we can pass the username as a unique attribute which updates the username attribute of the user object which is passed to the Query runner. That would allow for a consistent method of finding the username which it supposed to be leveraged against Presto. However that was a much more involved change to get in place. This allows a consistent behavior (In most environments, email matches username.) and gets this initial process in place.
I do acknowledge that using a SAML configured Re:Dash cluster against a authentication enforced Presto endpoints does not work right now. There is no way to obtain the specific user's password. However this implementation does help when auditing and investigating the Presto cluster itself, as you know the specific user running the queries in question.
Cached Results Scenario
The cached results is a known scenario and your concerns are understandable. However they are concerns with a slightly inaccurate assumption.
Presto, in general, does not offer a robust authorization solution internally. There is the ability to do static controls through files on disk, and there is one vendor who has added a much more robust support for Apache Ranger (Starburst).
However it is understood and accepted that until Re:Dash supports the ability to control access to queries based on group membership, that any data queried through Re:Dash is visible to all users within Re:Dash. If there is a need to segment what previously queried data people have access to, for example secure financial data, we would stand up a separate Re:Dash cluster environment which only grants access through SAML for the Finance users who need access to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you use this mostly for auditing/monitoring purposes?
You can control access to data sources (and the queries resulting from them) using groups. It doesn't cover all the use cases groups to queries model would cover, but it's still easier than setting up a separate Redash cluster :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely! Control to the sources itself is supported internally, but not the cached results of the Query. It is simply recognized that the only way to prevent anyone with access to Re:Dash from viewing the cached results is by that larger scale separation.
The primary purpose of having these users getting passed through to Presto is for the purpose of controlling the user's ability to leverage the server resources within Presto through Presto Resource Groups. Previously any Presto Resource Group tunings apply to anyone on Re:Dash which made it difficult to tune resource utilizations within Presto for different users leveraging Re:Dash.
For users leveraging Starburst Presto, by having usernames getting passed through, we gain the ability to supporting Ranger for authorization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this sort of makes sense. I'm still worried that for people without the full context this will cause more harm than good. How about we put this behind an environment variable?
Similar to how we do with Athena:
redash/redash/query_runner/athena.py
Lines 9 to 11 in b09ae46
This way you can easily use this without a fork, while it's only enabled for people who fully understand the implication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is totally reasonable. Let me do a refactor on the code to implement this.