feat(hive): add apache hive package #29412

Closed
wants to merge 15 commits into from

maxgio92
Member

@maxgio92 maxgio92 commented Sep 27, 2024

This PR introduces the package for Apache Hive.

Related: #29167

Pre-review Checklist

For new package PRs only

  • REQUIRED - The package is available under an OSI-approved or FSF-approved license
  • REQUIRED - The version of the package is still receiving security updates
  • This PR links to the upstream project's support policy (e.g. endoflife.date):
    There's no support policy available.

@maxgio92 maxgio92 force-pushed the apache-hive branch 19 times, most recently from 71958ea to 625509d October 3, 2024 17:52

- name: Prepare HDFS directories
  runs: |
    echo "test:x:$(id -u):$(id -g):test user:/:/bin/sh:" >> /etc/passwd
Member Author

@maxgio92 maxgio92 Oct 3, 2024

This tweak is needed because Hive looks up the user name in the passwd database to set HDFS ownership.

hive.yaml Outdated
LANG: en_US.UTF-8
JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk
HIVE_VERSION: 4.0.0
HADOOP_VERSION: 3.3.6
Member Author

Hadoop 3.4.0 is not yet supported. Work to support it is in progress (apache/hive#5187).

Signed-off-by: Massimiliano Giovagnoli <[email protected]>
This reverts commit c759e38.
@maxgio92 maxgio92 marked this pull request as ready for review October 3, 2024 18:22
hive.yaml Outdated
LANG: en_US.UTF-8
JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk
HIVE_VERSION: 4.0.0
HADOOP_VERSION: 3.3.6
Member

Since we need to set the version explicitly here, could we make sure that, when we package Hadoop 3.3, Hive is pinned to this exact version so that the two don't drift apart?

For example:

dependencies:
  runtime:
    - hadoop=3.3.6

The same goes for Tez.

Member Author

Totally makes sense, thanks @EyeCantCU.

Member

@smoser smoser left a comment

Some minor things.

hive.yaml Outdated
LANG: en_US.UTF-8
JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk
HIVE_VERSION: 4.0.0
HADOOP_VERSION: 3.3.6
Member

You can replace the HADOOP_VERSION environment variable with a melange variable:

vars:
  hadoop-version: 3.3.6

Then use ${{vars.hadoop-version}} wherever the version is needed, and you don't have to repeat it in the runtime environment below (if the two are intended to be the same thing).
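
As a rough sketch of that suggestion: the version is declared once under vars and referenced from a pipeline step. The full hive.yaml is not shown in this thread, so the placement under test and the step name below are assumptions made for illustration only.

```yaml
# Sketch only: the step name and its placement under `test` are hypothetical.
vars:
  hadoop-version: 3.3.6

test:
  pipeline:
    - name: Show the Hadoop version in use
      runs: |
        # melange substitutes the variable before the script runs,
        # so no separate HADOOP_VERSION export is needed here.
        echo "Testing against Hadoop ${{vars.hadoop-version}}"
```

With this layout, bumping Hadoop means changing the value in one place.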

hive.yaml Outdated
pipeline:
  - name: Download Hadoop
    runs: |
      gpg --keyserver hkps://keyserver.ubuntu.com --recv-key CD32D773FF41C3F9E74BDB7FB362E1C021854B9D
Member

I think using the fetch pipeline would be preferable here.

The fetch step (its expected checksum) will need updating any time HADOOP_VERSION changes, whereas the gpg path would accept any release as long as it is signed. But fetch is used in many more places, so any improvement we make to that pipeline benefits everything that uses it.

You can then drop gpg, gpg-agent, gnupg-dirmngr and curl from above.

test:
  pipeline:
    - uses: fetch
      with:
        uri: https://dlcdn.apache.org/hadoop/common/hadoop-${{vars.hadoop-version}}/hadoop-${{vars.hadoop-version}}.tar.gz
        expected-sha256: f5195059c0d4102adaa7fff17f7b2a85df906bcb6e19948716319f9978641a04

Member Author

Love it! Thank you @smoser.
