-
Notifications
You must be signed in to change notification settings - Fork 930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support to run HiveSQLEngine on kerberized YARN #6199
Support to run HiveSQLEngine on kerberized YARN #6199
Conversation
48241bb
to
cb06025
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6199 +/- ##
============================================
- Coverage 58.58% 58.38% -0.20%
Complexity 24 24
============================================
Files 649 649
Lines 39379 39468 +89
Branches 5415 5425 +10
============================================
- Hits 23070 23044 -26
- Misses 13841 13949 +108
- Partials 2468 2475 +7 ☔ View full report in Codecov by Sentry. |
yarnConf = KyuubiHadoopUtils.newYarnConfiguration(kyuubiConf) | ||
hadoopConf = KyuubiHadoopUtils.newHadoopConf(kyuubiConf) | ||
appUser = kyuubiConf.getOption(KYUUBI_SESSION_USER_KEY).orNull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this correct for GROUP and SERVER share level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KYUUBI_SESSION_USER_KEY
is set to EngineRef#appUser
when submit Engines. So it's correct for all share levels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird. I remember we use this key to distinguish the session user and app user in some cases ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is confusing because we currently use KYUUBI_SESSION_USER_KEY
in both ProcBuilder
and Engine side Operation
.
Replacing KYUUBI_SESSION_USER_KEY
with another config when used in ProcBuilder
can eliminate confusion.
Let's do it in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
|
||
// Transfer the original user's tokens to the new user, since it may contain needed tokens | ||
// (such as those user to connect to YARN). | ||
// TODO originalCreds may have been expired if Application retries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we have disabled the YARN app retry in the Spark engine explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found we also had disabled retry in EngineYarnModeSubmitter
. This comment can be removed.
|
||
private def obtainHadoopFsDelegationToken(): Credentials = { | ||
val tokenRenewer = Master.getMasterPrincipal(hadoopConf) | ||
info("Delegation token renewer is: " + tokenRenewer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info("Delegation token renewer is: " + tokenRenewer) | |
info("Delegation token renewer is: $tokenRenewer") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
If possible, would you mind leaving some notes about the invocation chain? It may be asked by future exploreres or contributors |
Agreed, many years ago we removed the |
// If we passed in a keytab, make sure we copy the keytab to the staging directory on | ||
// HDFS, and setup the relevant environment vars, so the AM can login again. | ||
amKeytabFileName.foreach { kt => | ||
info("To enable the AM to login from keytab, credentials are being copied over to the AM" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log seems a bit redundant, because in the Kerberos environment, we explicitly require users to configure this configuration item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keytabs are security-sensitive files. I think it's better to tell user why we upload keytabs in place.
newUGI | ||
case _ => | ||
val appUser = kyuubiConf.getOption(KyuubiReservedKeys.KYUUBI_SESSION_USER_KEY) | ||
require(appUser.isDefined) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be nice if add error message for the require.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
val ugi = if (UserGroupInformation.getCurrentUser.getShortUserName == appUser) { | ||
UserGroupInformation.getCurrentUser | ||
} else { | ||
UserGroupInformation.createProxyUser(appUser, UserGroupInformation.getCurrentUser) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also supports proxy users, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should only happen when using TGT cache with YARN mode, though it does not work ... it's good to keep those code path in case some one modifies the Hive source code to make it workable, but we'd better add a warning message to explicitly tell users this is not likely to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More precisely speaking, proxy user mechanism works fine during the submitting progress. AM can be started successfully.
But HiveSQLEngine currently only works with principal & keytab provided.
I think we should check this prerequisite in HiveYarnModeProcessBuilder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, exactly
if (UserGroupInformation.isSecurityEnabled) { | ||
val credentials = obtainHadoopFsDelegationToken() | ||
val serializedCreds = KyuubiHadoopUtils.serializeCredentials(credentials) | ||
amContainer.setTokens(ByteBuffer.wrap(serializedCreds)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the ApplicationMaster need to login even if the token is set on this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WithoutUserGroupInformation.loginUserFromKeytab
, the login user can only authenticate by tokens.
But HiveSQLEngine requires the login user to authenticate by kerberos tgt to obtain new tokens.
@@ -17,10 +17,12 @@ | |||
package org.apache.kyuubi.engine.deploy.yarn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The submitter process determine whether to login as well in kerberized environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what "determine whether to login as well" means. Can you elaborate it?
9d05fa7
to
5d3013a
Compare
kyuubi-server/src/main/scala/org/apache/kyuubi/engine/hive/HiveProcessBuilder.scala
Outdated
Show resolved
Hide resolved
…eProcessBuilder.scala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the failed tests should be easy to fix
OK, all Hive tests passed, I'm going to merge this PR. @zhouyifan279 thanks for your excellent work, and welcome to becoming a Kyuubi committer. |
🔍 Description
Issue References 🔗
This pull request implement a feature - Run HiveSQLEngine on kerberized YARN
Describe Your Solution 🔧
Introduced two configs:
When do submit to a kerberized YARN, submitter uploads
kyuubi.engine.keytab
to application's staging dir.YARN NodeManager downloads keytab to AM's working directory. AM logins to Kerberos using the principal and keytab
Note
I've tried to run HiveSQLEngine with only DelegationTokens but failed.
Take SQL
SELECT * FROM a
as an example:Hive handles this simple TableScan SQL by reading directly from table's hdfs file.
When Hive invokes
FileInputFormat.getSplits
during reading,java.io.IOException: Delegation Token can be issued only with kerberos or web authentication
will be thrown.The simplified stacktrace from IDEA is as below:
Theoretically, it can be solved by add AM DelegationTokens into
org.apache.hadoop.hive.ql.exec.FetchOperator.job.credentials
.But actually, it is impossible without modifying Hive's source code.
Types of changes 🔖
Test Plan 🧪
Behavior Without This Pull Request ⚰️
HiveSQLEngine can not run on a kerberized YARN
Behavior With This Pull Request 🎉
HiveSQLEngine can run on a kerberized YARN
Related Unit Tests
Checklist 📝
Be nice. Be informative.