[SPARK-33084][CORE][SQL] Add jar support ivy path #29966
cc @dongjoon-hyun As I mentioned in https://issues.apache.org/jira/browse/SPARK-29288, this PR makes add jar support ivy paths, like https://issues.apache.org/jira/browse/HIVE-9664
Got it. Thank you, @AngersZhuuuu
+1, LGTM for Apache Spark 3.2.0. Thank you, @AngersZhuuuu and all.
Feel free to merge, @maropu .
BTW, Merry Christmas and Happy New Year, all!
Thanks, @AngersZhuuuu @dongjoon-hyun ! Merged to master. Happy Merry Christmas, too!
FYI: @gatorsmile @cloud-fan
Merry Christmas! Thanks all for your patient reviews.
…() 's return parameter

### What changes were proposed in this pull request?
Per the discussion in #29966 (comment), we'd better change the return value of `SparkSubmitUtils.resolveMavenCoordinates()` to `Seq[String]`.

### Why are the changes needed?
Code refactoring.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #30922 from AngersZhuuuu/SPARK-33908.

Authored-by: angerszhu <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
please note this breaks scala 2.13
I defined it myself. I will fix this issue. Can you show how to reproduce this error? The source code is
Can you show how you reproduce this error? |
|
### What changes were proposed in this pull request?
Fix the UT according to #29966 (comment)

Change the StructType construction from
```
def inputSchema: StructType = StructType(StructField("inputColumn", LongType) :: Nil)
```
to
```
def inputSchema: StructType = new StructType().add("inputColumn", LongType)
```
The whole udf class is:
```
package org.apache.spark.examples.sql

import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

class Spark33084 extends UserDefinedAggregateFunction {
  // Data types of input arguments of this aggregate function
  def inputSchema: StructType = new StructType().add("inputColumn", LongType)

  // Data types of values in the aggregation buffer
  def bufferSchema: StructType =
    new StructType().add("sum", LongType).add("count", LongType)

  // The data type of the returned value
  def dataType: DataType = DoubleType

  // Whether this function always returns the same output on the identical input
  def deterministic: Boolean = true

  // Initializes the given aggregation buffer. The buffer itself is a `Row` that in addition to
  // standard methods like retrieving a value at an index (e.g., get(), getBoolean()), provides
  // the opportunity to update its values. Note that arrays and maps inside the buffer are still
  // immutable.
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 0L
  }

  // Updates the given aggregation buffer `buffer` with new input data from `input`
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + input.getLong(0)
      buffer(1) = buffer.getLong(1) + 1
    }
  }

  // Merges two aggregation buffers and stores the updated buffer values back to `buffer1`
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  // Calculates the final result
  def evaluate(buffer: Row): Double = buffer.getLong(0).toDouble / buffer.getLong(1)
}
```

### Why are the changes needed?
Fix UT for scala 2.13

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing UT

Closes #30980 from AngersZhuuuu/spark-33084-followup.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
import org.apache.spark.SparkFunSuite

class DependencyUtilsSuite extends SparkFunSuite {
Rename the file to DependencyUtilsSuite.scala?
val e3 = intercept[IllegalArgumentException] {
  DependencyUtils.resolveMavenDependencies(
    URI.create("ivy://org.apache.hive:hive-contrib:2.3.7?foo="))
Do we need to keep 2.3.7 consistent with the built-in Hive version?
> Do we need to keep 2.3.7 consistent with the built-in Hive version?
Emmm, is there any concern about this?
> Do we need to keep 2.3.7 consistent with the built-in Hive version?
How about #31118
### What changes were proposed in this pull request?
Per #29966 (comment), the suite file was given the wrong name; this PR fixes that. It also changes this test to use a fake ivy link.

### Why are the changes needed?
Follow the file naming rule.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No

Closes #31118 from AngersZhuuuu/SPARK-33084-FOLLOW-UP.

Authored-by: angerszhu <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
… with Hive transitive behavior

### What changes were proposed in this pull request?
SPARK-33084 added the ability to use ivy coordinates with `SparkContext.addJar`. PR #29966 claims to mimic Hive behavior, although I found a few cases where it doesn't:

1) The default value of the transitive parameter is false, both when the parameter is not specified in the coordinate and when the parameter value is invalid. The Hive behavior is that transitive is [true if not specified](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L169) in the coordinate and [false for invalid values](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L124). Also, regardless of Hive, I think a default of true for the transitive parameter matches [ivy's own defaults](https://ant.apache.org/ivy/history/2.5.0/ivyfile/dependency.html#_attributes).
2) The value of the transitive parameter is treated as case-sensitive [based on the understanding](#29966 (comment)) that Hive behavior is case-sensitive. However, this is not correct: Hive [treats the parameter value case-insensitively](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L122).

I propose that we be compatible with Hive for these behaviors.

### Why are the changes needed?
To make `ADD JAR` with ivy coordinates compatible with Hive's transitive behavior

### Does this PR introduce _any_ user-facing change?
The user-facing changes here are within master, as the feature introduced in SPARK-33084 has not been released yet.

1. Previously an ivy coordinate without the `transitive` parameter specified did not resolve transitive dependencies; now it does.
2. Previously a `transitive` parameter value was treated case-sensitively, e.g. `transitive=TRUE` would be treated as false as it did not exactly match `true`. Now it is treated case-insensitively.

### How was this patch tested?
Modified existing unit tests to test the new behavior. Added a new unit test to cover usage of `exclude` with unspecified `transitive`.

Closes #31623 from shardulm94/spark-34506.

Authored-by: Shardul Mahadik <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
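The `transitive` rules described above (true when unspecified, false for an invalid value, value compared case-insensitively) can be sketched as follows. This is a minimal illustration, not Spark's or Hive's actual code: the object `TransitiveParam`, the method `parseTransitive`, the exact-match handling of the key name, and the choice to let the last occurrence win are all assumptions made for the example.

```scala
object TransitiveParam {
  // Hypothetical helper, not part of Spark. `query` is the raw query string
  // of an ivy URI, e.g. "exclude=a:b&transitive=TRUE", or None if absent.
  def parseTransitive(query: Option[String]): Boolean = {
    val values = query.toSeq
      .flatMap(_.split("&").toSeq)
      .map(_.split("=", 2))
      // Keep only the values given for the `transitive` key (assumed exact match).
      .collect { case Array(key, value) if key == "transitive" => value }
    // Unspecified: default to true.
    // Specified: true only when the value is "true" in any letter case;
    // any other value is treated as false. Last occurrence wins (assumption).
    values.lastOption.forall(_.equalsIgnoreCase("true"))
  }
}
```

With this sketch, `parseTransitive(Some("transitive=TRUE"))` and `parseTransitive(Some("exclude=a:b"))` both evaluate to true, while `parseTransitive(Some("transitive=foo"))` evaluates to false.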
… SQLQuerySuite to avoid clearing ivy.home

### What changes were proposed in this pull request?
Add the `ResetSystemProperties` trait to `SQLQuerySuite` so that system property changes made by any of the tests will not affect other suites/tests. Specifically, the system property changes made by `SPARK-33084: Add jar support Ivy URI in SQL -- jar contains udf class` are targeted here (which sets and then clears `ivy.home`).

### Why are the changes needed?
PR #29966 added a new test case that adjusts the `ivy.home` system property to force Ivy to resolve an artifact from a custom location. At the end of the test, the value is cleared. Clearing the value meant that, if a custom value of `ivy.home` was configured externally, it would not apply for tests run after this test case.

### Does this PR introduce _any_ user-facing change?
No, this is only in tests.

### How was this patch tested?
Existing unit tests continue to pass, whether or not `spark.jars.ivySettings` is configured (which adjusts the behavior of Ivy w.r.t. handling of `ivy.home` and `ivy.default.ivy.user.dir` properties).

Closes #31694 from xkrogen/xkrogen-SPARK-33084-ivyhome-sysprop-followon.

Authored-by: Erik Krogen <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
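The problem described above, where a test clears `ivy.home` and thereby loses an externally configured value, can be avoided by snapshotting the system properties before the test body and restoring the snapshot afterwards. The sketch below shows that idea in plain Scala. It is not the `ResetSystemProperties` trait itself; the object `SystemPropsSketch` and the method `withRestoredSystemProperties` are hypothetical names used only for illustration.

```scala
import java.util.Properties

object SystemPropsSketch {
  // Hypothetical helper: runs `body`, then restores the system properties
  // to the snapshot taken before `body` started, even if `body` throws.
  def withRestoredSystemProperties[T](body: => T): T = {
    // Copy the current properties so later mutations do not touch the snapshot.
    val saved = System.getProperties.clone().asInstanceOf[Properties]
    try body
    finally System.setProperties(saved)
  }
}
```

A test that sets and then clears `ivy.home` inside `withRestoredSystemProperties { ... }` would leave any externally configured `ivy.home` value in place for the tests that run after it.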
Support add jar with ivy path

Since submitting an app already supports ivy, add jar can support ivy now as well.

Users can add a jar with SQL like
```
add jar ivy://group:artifact:version?exclude=xxx,xxx&transitive=true
add jar ivy://group:artifact:version?exclude=xxx,xxx&transitive=false
```

core api
```
sparkContext.addJar("ivy://group:artifact:version?exclude=xxx,xxx&transitive=true")
sparkContext.addJar("ivy://group:artifact:version?exclude=xxx,xxx&transitive=false")
```

Added UT

Closes apache#29966 from AngersZhuuuu/support-add-jar-ivy.

Lead-authored-by: angerszhu <[email protected]>
Co-authored-by: AngersZhuuuu <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
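To make the URI shape above concrete, here is a rough sketch of how an `ivy://group:artifact:version?key=value&...` string could be split into its parts with `java.net.URI`. It is only an illustration under stated assumptions, not the code merged in this PR: `IvyUriSketch`, `IvyCoordinate`, `parseCoordinate` and `parseQuery` are hypothetical names, and rejecting parameters with an empty value (such as `foo=`) is an assumption modeled on the `IllegalArgumentException` test discussed in the review.

```scala
import java.net.URI

object IvyUriSketch {
  // Hypothetical container for the three coordinate parts.
  final case class IvyCoordinate(group: String, artifact: String, version: String)

  // Hypothetical helper: the text between "ivy://" and "?" is exposed by
  // java.net.URI as the authority, e.g. "org.apache.hive:hive-contrib:2.3.7".
  def parseCoordinate(uri: URI): IvyCoordinate = {
    require(uri.getScheme == "ivy", s"Expected an ivy:// URI but got: $uri")
    Option(uri.getAuthority).map(_.split(":")) match {
      case Some(Array(group, artifact, version)) =>
        IvyCoordinate(group, artifact, version)
      case _ =>
        throw new IllegalArgumentException(
          s"Expected group:artifact:version in $uri")
    }
  }

  // Hypothetical helper: every query parameter must look like key=value with
  // both sides non-empty; anything else is rejected (assumption).
  def parseQuery(uri: URI): Map[String, String] =
    Option(uri.getQuery).toSeq
      .flatMap(_.split("&").toSeq)
      .map { param =>
        param.split("=", -1) match {
          case Array(key, value) if key.nonEmpty && value.nonEmpty => key -> value
          case _ =>
            throw new IllegalArgumentException(
              s"Invalid query parameter '$param' in $uri")
        }
      }
      .toMap
}
```

Under these assumptions, a URI ending in `?foo=` is rejected by `parseQuery`, while `?exclude=xxx,xxx&transitive=true` yields a two-entry map whose `exclude` value can then be split on commas.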
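### What changes were proposed in this pull request?
Support add jar with ivy path

### Why are the changes needed?
Since submitting an app already supports ivy, add jar can support ivy now as well.

### Does this PR introduce _any_ user-facing change?
Users can add a jar by ivy URI from SQL (`add jar ivy://group:artifact:version?exclude=xxx,xxx&transitive=true`) or through the core API (`sparkContext.addJar("ivy://group:artifact:version?exclude=xxx,xxx&transitive=true")`).

Doc update snapshot

### How was this patch tested?
Added UT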