ORC-1430: Use Hadoop 3.3.5 shaded clients
### What changes were proposed in this pull request?

Currently, the Apache ORC project uses three Hadoop version properties:
```
    <hadoop.version>2.7.3</hadoop.version>
    <min.hadoop.version>2.7.3</min.hadoop.version>
    <tools.hadoop.version>2.7.3</tools.hadoop.version>
```

This PR aims to do the following for Apache ORC 2.0.0:
1. Use Hadoop 3.3.5 shaded clients.
2. Remove `min.hadoop.version` and `tools.hadoop.version` in favor of `hadoop.version`
3. Ban non-shaded clients from now on with a Maven Enforcer `bannedDependencies` rule that also checks transitive dependencies:
```
<bannedDependencies>
  <excludes>
    <exclude>org.apache.hadoop:hadoop-common</exclude>
    <exclude>org.apache.hadoop:hadoop-hdfs-client</exclude>
    <exclude>org.apache.hadoop:hadoop-mapreduce-client-core</exclude>
    <exclude>org.apache.hadoop:hadoop-mapreduce-client-jobclient</exclude>
  </excludes>
  <searchTransitive>true</searchTransitive>
</bannedDependencies>
```
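A `bannedDependencies` rule only takes effect inside a Maven Enforcer Plugin execution. The fragment below is a minimal sketch of how the single `hadoop.version` property, the managed shaded-client versions, and the ban could fit together in a parent POM. The placement, the execution id `ban-non-shaded-hadoop-clients`, and the assumption that the plugin version is pinned in `<pluginManagement>` are illustrative, not taken from this commit.

```xml
<!-- Hypothetical parent-POM fragment (children of <project>); not copied from this commit. -->
<properties>
  <!-- One property replaces hadoop.version, min.hadoop.version and tools.hadoop.version -->
  <hadoop.version>3.3.5</hadoop.version>
</properties>

<dependencyManagement>
  <dependencies>
    <!-- Hadoop client API classes, with third-party dependencies relocated -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-api</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <!-- Shaded third-party classes that hadoop-client-api needs at run time -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-runtime</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-enforcer-plugin</artifactId>
      <!-- Plugin version assumed to be pinned in <pluginManagement> -->
      <executions>
        <execution>
          <!-- Hypothetical execution id -->
          <id>ban-non-shaded-hadoop-clients</id>
          <goals>
            <goal>enforce</goal>
          </goals>
          <configuration>
            <rules>
              <bannedDependencies>
                <excludes>
                  <exclude>org.apache.hadoop:hadoop-common</exclude>
                  <exclude>org.apache.hadoop:hadoop-hdfs-client</exclude>
                  <exclude>org.apache.hadoop:hadoop-mapreduce-client-core</exclude>
                  <exclude>org.apache.hadoop:hadoop-mapreduce-client-jobclient</exclude>
                </excludes>
                <!-- Also fail the build if a banned artifact arrives transitively -->
                <searchTransitive>true</searchTransitive>
              </bannedDependencies>
            </rules>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Because `enforcer:enforce` binds to the `validate` phase by default, a setup like this would make any build (for example `mvn validate`) fail as soon as a module pulls in one of the non-shaded artifacts, directly or transitively.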

Note that all changes are in `pom.xml` files; there is no source code change.

### Why are the changes needed?

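- Hadoop 3's shaded clients (`hadoop-client-api` and `hadoop-client-runtime`) remove a lot of complexity for downstream clients: Hadoop's third-party dependencies are relocated inside the shaded artifacts, so modules no longer need long exclusion lists to avoid classpath conflicts.
- They are stable: the Apache Spark community has been using Hadoop 3's shaded client since Apache Spark 3.2.0 (October 13, 2021) via https://issues.apache.org/jira/browse/SPARK-33212.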
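As an illustration of the reduced complexity, the sketch below shows roughly how a module's dependency list can look with the shaded clients, modeled on the `java/core/pom.xml` change in this commit: it compiles against `hadoop-client-api` and adds `hadoop-client-runtime` only at run time. Versions are assumed to come from the parent POM's `<dependencyManagement>`.

```xml
<dependencies>
  <!-- Compile against Hadoop's client API; its third-party dependencies are relocated -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-api</artifactId>
  </dependency>
  <!-- Runtime scope: on the runtime and test classpaths, but kept off the compile classpath -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-runtime</artifactId>
    <scope>runtime</scope>
  </dependency>
</dependencies>
```

By contrast, the removed `hadoop-common` and `hadoop-hdfs` declarations in `java/bench/pom.xml` each carried several exclusions (Jersey, `servlet-api`, and so on); in this commit the shaded pair is declared without any.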

### How was this patch tested?

Passed the CIs.

Also, I validated that there is no side effect on the Apache Spark side. The following is the dependency change set when Apache Spark upgrades from Apache ORC 1.8.3 (as-is) to Apache ORC 2.0.0-SNAPSHOT:

```
-aircompressor/0.21//aircompressor-0.21.jar
+aircompressor/0.24//aircompressor-0.24.jar
-orc-core/1.8.3/shaded-protobuf/orc-core-1.8.3-shaded-protobuf.jar
-orc-mapreduce/1.8.3/shaded-protobuf/orc-mapreduce-1.8.3-shaded-protobuf.jar
-orc-shims/1.8.3//orc-shims-1.8.3.jar
+orc-core/2.0.0-SNAPSHOT/shaded-protobuf/orc-core-2.0.0-SNAPSHOT-shaded-protobuf.jar
+orc-mapreduce/2.0.0-SNAPSHOT/shaded-protobuf/orc-mapreduce-2.0.0-SNAPSHOT-shaded-protobuf.jar
+orc-shims/2.0.0-SNAPSHOT//orc-shims-2.0.0-SNAPSHOT.jar
```

Closes apache#1509 from dongjoon-hyun/ORC-1430.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun authored and cxzl25 committed Jan 11, 2024
1 parent 1b1366d commit ca22c15
Showing 10 changed files with 34 additions and 447 deletions.
`java/bench/core/pom.xml` (12 changes: 0 additions, 12 deletions)
```diff
@@ -55,18 +55,6 @@
 <groupId>org.apache.commons</groupId>
 <artifactId>commons-csv</artifactId>
 </dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-common</artifactId>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-hdfs</artifactId>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-mapreduce-client-core</artifactId>
-</dependency>
 <dependency>
 <groupId>org.apache.hive</groupId>
 <artifactId>hive-storage-api</artifactId>
```
`java/bench/hive/pom.xml` (24 changes: 12 additions, 12 deletions)
```diff
@@ -47,26 +47,26 @@
 <groupId>org.apache.avro</groupId>
 <artifactId>avro-mapred</artifactId>
 </dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-common</artifactId>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-mapreduce-client-core</artifactId>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-hdfs</artifactId>
-</dependency>
 <dependency>
 <groupId>org.apache.hive</groupId>
 <artifactId>hive-exec</artifactId>
 <classifier>core</classifier>
+<exclusions>
+<exclusion>
+<groupId>org.apache.hadoop</groupId>
+<artifactId>hadoop-yarn-registry</artifactId>
+</exclusion>
+</exclusions>
 </dependency>
 <dependency>
 <groupId>org.apache.hive</groupId>
 <artifactId>hive-serde</artifactId>
+<exclusions>
+<exclusion>
+<groupId>org.apache.hadoop</groupId>
+<artifactId>hadoop-common</artifactId>
+</exclusion>
+</exclusions>
 </dependency>
 <dependency>
 <groupId>org.apache.hive</groupId>
```
`java/bench/pom.xml` (69 changes: 0 additions, 69 deletions)
```diff
@@ -34,7 +34,6 @@

 <properties>
 <avro.version>1.11.1</avro.version>
-<hadoop.version>3.3.5</hadoop.version>
 <hive.version>3.1.3</hive.version>
 <jmh.version>1.20</jmh.version>
 <junit.version>5.9.3</junit.version>
@@ -109,74 +108,6 @@
 <artifactId>commons-csv</artifactId>
 <version>1.10.0</version>
 </dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-common</artifactId>
-<version>${hadoop.version}</version>
-<exclusions>
-<exclusion>
-<groupId>com.sun.jersey</groupId>
-<artifactId>jersey-server</artifactId>
-</exclusion>
-<exclusion>
-<groupId>com.sun.jersey</groupId>
-<artifactId>jersey-core</artifactId>
-</exclusion>
-<exclusion>
-<groupId>commons-beanutils</groupId>
-<artifactId>commons-beanutils</artifactId>
-</exclusion>
-<exclusion>
-<groupId>commons-beanutils</groupId>
-<artifactId>commons-beanutils-core</artifactId>
-</exclusion>
-<exclusion>
-<groupId>javax.servlet</groupId>
-<artifactId>servlet-api</artifactId>
-</exclusion>
-<exclusion>
-<groupId>jdk.tools</groupId>
-<artifactId>jdk.tools</artifactId>
-</exclusion>
-<exclusion>
-<groupId>org.mortbay.jetty</groupId>
-<artifactId>servlet-api</artifactId>
-</exclusion>
-</exclusions>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-client</artifactId>
-<version>${hadoop.version}</version>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-hdfs</artifactId>
-<version>${hadoop.version}</version>
-<scope>runtime</scope>
-<exclusions>
-<exclusion>
-<groupId>com.sun.jersey</groupId>
-<artifactId>jersey-core</artifactId>
-</exclusion>
-<exclusion>
-<groupId>com.sun.jersey</groupId>
-<artifactId>jersey-server</artifactId>
-</exclusion>
-<exclusion>
-<groupId>javax.servlet</groupId>
-<artifactId>servlet-api</artifactId>
-</exclusion>
-<exclusion>
-<groupId>org.fusesource.leveldbjni</groupId>
-<artifactId>leveldbjni-all</artifactId>
-</exclusion>
-<exclusion>
-<groupId>org.mortbay.jetty</groupId>
-<artifactId>servlet-api</artifactId>
-</exclusion>
-</exclusions>
-</dependency>
 <dependency>
 <groupId>org.apache.hadoop</groupId>
 <artifactId>hadoop-yarn-common</artifactId>
```
`java/bench/spark/pom.xml` (12 changes: 0 additions, 12 deletions)
```diff
@@ -52,18 +52,6 @@
 <groupId>org.apache.commons</groupId>
 <artifactId>commons-lang3</artifactId>
 </dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-common</artifactId>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-hdfs</artifactId>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-yarn-common</artifactId>
-</dependency>
 <dependency>
 <groupId>org.apache.hive</groupId>
 <artifactId>hive-storage-api</artifactId>
```
`java/core/pom.xml` (30 changes: 3 additions, 27 deletions)
```diff
@@ -53,11 +53,12 @@
 </dependency>
 <dependency>
 <groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-common</artifactId>
+<artifactId>hadoop-client-api</artifactId>
 </dependency>
 <dependency>
 <groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-hdfs</artifactId>
+<artifactId>hadoop-client-runtime</artifactId>
+<scope>runtime</scope>
 </dependency>
 <dependency>
 <groupId>org.apache.hive</groupId>
@@ -156,16 +157,11 @@
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-dependency-plugin</artifactId>
 <configuration>
-<ignoredUnusedDeclaredDependencies>
-<ignoredUnusedDeclaredDependency>org.apache.hadoop:hadoop-hdfs</ignoredUnusedDeclaredDependency>
-</ignoredUnusedDeclaredDependencies>
 <ignoredUsedUndeclaredDependencies>
 <ignoredUsedUndeclaredDependency>com.google.auto.service:auto-service-annotations</ignoredUsedUndeclaredDependency>
 </ignoredUsedUndeclaredDependencies>
 <ignoredDependencies>
 <ignoredDependency>org.apache.hive:hive-storage-api</ignoredDependency>
-<ignoredDependency>org.apache.hadoop:hadoop-client-api</ignoredDependency>
-<ignoredDependency>org.apache.hadoop:hadoop-client-runtime</ignoredDependency>
 <ignoredDependency>com.google.auto.service:auto-service</ignoredDependency>
 </ignoredDependencies>
 </configuration>
@@ -182,25 +178,5 @@
 <directory>${build.dir}/core</directory>
 </build>
 </profile>
-<profile>
-<id>java17</id>
-<activation>
-<jdk>[17,)</jdk>
-</activation>
-<dependencies>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-client-api</artifactId>
-<version>${hadoop.version}</version>
-<scope>test</scope>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-client-runtime</artifactId>
-<version>${hadoop.version}</version>
-<scope>test</scope>
-</dependency>
-</dependencies>
-</profile>
 </profiles>
 </project>
```
`java/examples/pom.xml` (17 changes: 1 addition, 16 deletions)
```diff
@@ -51,12 +51,7 @@
 </dependency>
 <dependency>
 <groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-common</artifactId>
-<scope>compile</scope>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-hdfs</artifactId>
+<artifactId>hadoop-client-api</artifactId>
 <scope>compile</scope>
 </dependency>
 <dependency>
@@ -111,16 +106,6 @@
 <plugin>
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-dependency-plugin</artifactId>
-<configuration>
-<ignoredUnusedDeclaredDependencies>
-<ignoredUnusedDeclaredDependency>com.google.guava:guava</ignoredUnusedDeclaredDependency>
-<ignoredUnusedDeclaredDependency>org.apache.hadoop:hadoop-hdfs</ignoredUnusedDeclaredDependency>
-<ignoredUnusedDeclaredDependency>org.apache.hadoop:hadoop-common</ignoredUnusedDeclaredDependency>
-</ignoredUnusedDeclaredDependencies>
-<ignoredDependencies>
-<ignoredDependency>org.apache.hadoop:hadoop-client-api</ignoredDependency>
-</ignoredDependencies>
-</configuration>
 </plugin>
 </plugins>
 <sourceDirectory>${basedir}/src/java</sourceDirectory>
```
`java/mapreduce/pom.xml` (45 changes: 2 additions, 43 deletions)
```diff
@@ -49,11 +49,11 @@
 </dependency>
 <dependency>
 <groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-common</artifactId>
+<artifactId>hadoop-client-api</artifactId>
 </dependency>
 <dependency>
 <groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-mapreduce-client-core</artifactId>
+<artifactId>hadoop-client-runtime</artifactId>
 </dependency>
 <dependency>
 <groupId>org.apache.hive</groupId>
@@ -66,17 +66,6 @@
 <artifactId>slf4j-api</artifactId>
 <scope>test</scope>
 </dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
-<scope>test</scope>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-hdfs</artifactId>
-<version>${min.hadoop.version}</version>
-<scope>test</scope>
-</dependency>
 <dependency>
 <groupId>org.junit.jupiter</groupId>
 <artifactId>junit-jupiter-api</artifactId>
@@ -119,16 +108,6 @@
 <plugin>
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-dependency-plugin</artifactId>
-<configuration>
-<ignoredUnusedDeclaredDependencies>
-<ignoredUnusedDeclaredDependency>org.apache.hadoop:hadoop-hdfs</ignoredUnusedDeclaredDependency>
-<ignoredUnusedDeclaredDependency>org.apache.hadoop:hadoop-mapreduce-client-jobclient</ignoredUnusedDeclaredDependency>
-</ignoredUnusedDeclaredDependencies>
-<ignoredDependencies>
-<ignoredDependency>org.apache.hadoop:hadoop-client-api</ignoredDependency>
-<ignoredDependency>org.apache.hadoop:hadoop-client-runtime</ignoredDependency>
-</ignoredDependencies>
-</configuration>
 </plugin>
 </plugins>
 <sourceDirectory>${basedir}/src/java</sourceDirectory>
@@ -142,25 +121,5 @@
 <directory>${build.dir}/mapreduce</directory>
 </build>
 </profile>
-<profile>
-<id>java17</id>
-<activation>
-<jdk>[17,)</jdk>
-</activation>
-<dependencies>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-client-api</artifactId>
-<version>${hadoop.version}</version>
-<scope>test</scope>
-</dependency>
-<dependency>
-<groupId>org.apache.hadoop</groupId>
-<artifactId>hadoop-client-runtime</artifactId>
-<version>${hadoop.version}</version>
-<scope>test</scope>
-</dependency>
-</dependencies>
-</profile>
 </profiles>
 </project>
```