Making AL base image updates to pelican export image #88

Merged 12 commits on Dec 5, 2024
88 changes: 54 additions & 34 deletions export.Dockerfile
@@ -1,23 +1,54 @@
FROM quay.io/cdis/python:python3.9-buster-2.0.0
ARG AZLINUX_BASE_VERSION=master

# Base stage with python-build-base
FROM quay.io/cdis/python-build-base:${AZLINUX_BASE_VERSION} AS base

ENV appname=pelican

ENV DEBIAN_FRONTEND=noninteractive
# create gen3 user
# Create a group 'gen3' with GID 1000 and a user 'gen3' with UID 1000
RUN groupadd -g 1000 gen3 && \
useradd -m -s /bin/bash -u 1000 -g gen3 gen3

# Install pipx
RUN python3 -m pip install pipx && \
python3 -m pipx ensurepath

USER gen3
# Install Poetry via pipx
RUN pipx install poetry
ENV PATH="/home/gen3/.local/bin:${PATH}"
USER root

WORKDIR /${appname}

# Builder stage
FROM base AS builder

RUN dnf update && dnf install -y \
python3-devel \
gcc \
postgresql-devel

COPY . /${appname}

# cache so that poetry install will run if these files change
COPY poetry.lock pyproject.toml /${appname}/

RUN poetry install -vv --no-interaction --without dev

# Final stage
FROM base

#RUN mkdir -p /usr/share/man/man1
#RUN mkdir -p /usr/share/man/man7
COPY --from=builder /venv /venv
COPY --from=builder /${appname} /${appname}

RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
libgnutls30 \
openjdk-11-jre-headless \
# dependency for psycopg2
libpq-dev \
postgresql-client \
RUN dnf update && dnf install -y \
wget \
unzip \
g++ \
&& rm -rf /var/lib/apt/lists/*
tar \
java-11-amazon-corretto \
gnutls \
&& rm -rf /var/cache/yum

ENV HADOOP_VERSION="3.2.1"
ENV HADOOP_HOME="/hadoop" \
@@ -27,7 +58,8 @@
&& mkdir -p $HADOOP_HOME \
&& tar -xvf hadoop-${HADOOP_VERSION}.tar.gz -C ${HADOOP_HOME} --strip-components 1 \
&& rm hadoop-${HADOOP_VERSION}.tar.gz \
&& rm -rf $HADOOP_HOME/share/doc
&& rm -rf $HADOOP_HOME/share/doc \
&& chown -R gen3:gen3 $HADOOP_HOME

ENV SQOOP_VERSION="1.4.7"
ENV SQOOP_HOME="/sqoop" \
@@ -39,16 +71,17 @@
&& mkdir -p $SQOOP_HOME \
&& tar -xvf sqoop-${SQOOP_VERSION}.bin__hadoop-2.6.0.tar.gz -C ${SQOOP_HOME} --strip-components 1 \
&& rm sqoop-${SQOOP_VERSION}.bin__hadoop-2.6.0.tar.gz \
&& rm -rf $SQOOP_HOME/docs
&& rm -rf $SQOOP_HOME/docs \
&& chown -R gen3:gen3 $SQOOP_HOME

ENV POSTGRES_JAR_VERSION="42.2.9"
ENV POSTGRES_JAR_URL="https://jdbc.postgresql.org/download/postgresql-${POSTGRES_JAR_VERSION}.jar" \
POSTGRES_JAR_PATH=$SQOOP_HOME/lib/postgresql-${POSTGRES_JAR_VERSION}.jar \
JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
JAVA_HOME="/usr/lib/jvm/java-11-amazon-corretto"

RUN wget ${POSTGRES_JAR_URL} -O ${POSTGRES_JAR_PATH}

ENV HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop" \

Check warning on line 84 in export.Dockerfile

GitHub Actions / Build and Push Pelican Export Image / Build Image and Push

Variables should be defined before their use

UndefinedVar: Usage of undefined variable '$LD_LIBRARY_PATH' More info: https://docs.docker.com/go/dockerfile/rule/undefined-var/
HADOOP_MAPRED_HOME="${HADOOP_HOME}" \
HADOOP_COMMON_HOME="${HADOOP_HOME}" \
HADOOP_HDFS_HOME="${HADOOP_HOME}" \
@@ -63,26 +96,13 @@

RUN mkdir -p $ACCUMULO_HOME $HIVE_HOME $HBASE_HOME $HCAT_HOME $ZOOKEEPER_HOME

ENV PATH=${SQOOP_HOME}/bin:${HADOOP_HOME}/sbin:$HADOOP_HOME/bin:${JAVA_HOME}/bin:${PATH}

WORKDIR /pelican

RUN pip install --upgrade pip
RUN chown -R gen3:gen3 $ACCUMULO_HOME $HIVE_HOME $HBASE_HOME $HCAT_HOME $ZOOKEEPER_HOME $JAVA_HOME $POSTGRES_JAR_PATH

# install poetry
RUN pip install --upgrade "poetry<1.2"

COPY . /$appname
WORKDIR /$appname

# cache so that poetry install will run if these files change
COPY poetry.lock pyproject.toml /$appname/
ENV PATH=${SQOOP_HOME}/bin:${HADOOP_HOME}/sbin:$HADOOP_HOME/bin:${JAVA_HOME}/bin:${PATH}

# install package and dependencies via poetry
RUN poetry config virtualenvs.create false \
&& poetry install -vv --no-dev --no-interaction \
&& poetry show -v
# Switch to non-root user 'gen3' for the serving process
USER gen3

ENV PYTHONUNBUFFERED=1

ENTRYPOINT poetry run python job_export.py

Check warning on line 108 in export.Dockerfile

GitHub Actions / Build and Push Pelican Export Image / Build Image and Push

JSON arguments recommended for ENTRYPOINT/CMD to prevent unintended behavior related to OS signals

JSONArgsRecommended: JSON arguments recommended for ENTRYPOINT to prevent unintended behavior related to OS signals More info: https://docs.docker.com/go/dockerfile/rule/json-args-recommended/
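
The warning above asks for the exec (JSON array) form of `ENTRYPOINT` instead of the shell form. As a rough illustration, the helper below (a hypothetical name, not part of this repository) shows what the JSON form of the flagged command would look like. It assumes the command uses no shell features such as variable expansion or pipes, since those are not interpreted in exec form.

```python
import json
import shlex


def to_exec_form(shell_command):
    """Split a shell-form command line and render it as a JSON array."""
    # shlex.split applies POSIX shell quoting rules, so quoted arguments
    # stay together as a single array element.
    return json.dumps(shlex.split(shell_command))


print("ENTRYPOINT " + to_exec_form("poetry run python job_export.py"))
# ENTRYPOINT ["poetry", "run", "python", "job_export.py"]
```

In exec form the command runs directly rather than under `/bin/sh -c`, which is what lets signals such as SIGTERM reach the process.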
89 changes: 54 additions & 35 deletions import.Dockerfile
@@ -1,23 +1,54 @@
FROM quay.io/cdis/python:python3.9-buster-2.0.0
ARG AZLINUX_BASE_VERSION=master

# Base stage with python-build-base
FROM quay.io/cdis/python-build-base:${AZLINUX_BASE_VERSION} AS base

ENV appname=pelican

ENV DEBIAN_FRONTEND=noninteractive
# create gen3 user
# Create a group 'gen3' with GID 1000 and a user 'gen3' with UID 1000
RUN groupadd -g 1000 gen3 && \
useradd -m -s /bin/bash -u 1000 -g gen3 gen3

# Install pipx
RUN python3 -m pip install pipx && \
python3 -m pipx ensurepath

USER gen3
# Install Poetry via pipx
RUN pipx install poetry
ENV PATH="/home/gen3/.local/bin:${PATH}"
USER root

WORKDIR /${appname}

# Builder stage
FROM base AS builder

RUN dnf update && dnf install -y \
python3-devel \
gcc \
postgresql-devel

COPY . /${appname}

# cache so that poetry install will run if these files change
COPY poetry.lock pyproject.toml /${appname}/

#RUN mkdir -p /usr/share/man/man1
#RUN mkdir -p /usr/share/man/man7
RUN poetry install -vv --no-interaction --without dev
Member:

any reason we went from

RUN poetry config virtualenvs.create false \
    && poetry install -vv --no-dev --no-interaction \
    && poetry show -v

to this line?

Contributor Author:

We are trying to create consistency across all of our services and use Poetry to manage virtual environments instead of Python. With the new AL base image PRs, we thought it would be a good time to do this!

Member:

sounds good to me!
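
For context on the exchange above: with `virtualenvs.create false`, Poetry installs packages into the interpreter it runs under, whereas by default it installs into a virtual environment. A minimal sketch for checking which case a given interpreter is in follows; the function name is hypothetical and nothing here is taken from the Pelican code.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("virtualenv" if in_virtualenv() else "system interpreter", sys.prefix)
```

Running this through `poetry run python` inside the image would be one way to see which environment the job actually uses.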


RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
libgnutls30 \
openjdk-11-jre-headless \
# dependency for psycopg2
libpq-dev \
postgresql-client \
# Final stage
FROM base

COPY --from=builder /venv /venv
COPY --from=builder /${appname} /${appname}

RUN dnf update && dnf install -y \
wget \
unzip \
g++ \
&& rm -rf /var/lib/apt/lists/*
tar \
java-11-amazon-corretto \
gnutls \
&& rm -rf /var/cache/yum

ENV HADOOP_VERSION="3.2.1"
ENV HADOOP_HOME="/hadoop" \
@@ -27,7 +58,8 @@
&& mkdir -p $HADOOP_HOME \
&& tar -xvf hadoop-${HADOOP_VERSION}.tar.gz -C ${HADOOP_HOME} --strip-components 1 \
&& rm hadoop-${HADOOP_VERSION}.tar.gz \
&& rm -rf $HADOOP_HOME/share/doc
&& rm -rf $HADOOP_HOME/share/doc \
&& chown -R gen3:gen3 $HADOOP_HOME

ENV SQOOP_VERSION="1.4.7"
ENV SQOOP_HOME="/sqoop" \
@@ -39,16 +71,17 @@
&& mkdir -p $SQOOP_HOME \
&& tar -xvf sqoop-${SQOOP_VERSION}.bin__hadoop-2.6.0.tar.gz -C ${SQOOP_HOME} --strip-components 1 \
&& rm sqoop-${SQOOP_VERSION}.bin__hadoop-2.6.0.tar.gz \
&& rm -rf $SQOOP_HOME/docs
&& rm -rf $SQOOP_HOME/docs \
&& chown -R gen3:gen3 $SQOOP_HOME

ENV POSTGRES_JAR_VERSION="42.2.9"
ENV POSTGRES_JAR_URL="https://jdbc.postgresql.org/download/postgresql-${POSTGRES_JAR_VERSION}.jar" \
POSTGRES_JAR_PATH=$SQOOP_HOME/lib/postgresql-${POSTGRES_JAR_VERSION}.jar \
JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
JAVA_HOME="/usr/lib/jvm/java-11-amazon-corretto"

RUN wget ${POSTGRES_JAR_URL} -O ${POSTGRES_JAR_PATH}

ENV HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop" \

Check warning on line 84 in import.Dockerfile

GitHub Actions / Build and Push Pelican Import Image / Build Image and Push

Variables should be defined before their use

UndefinedVar: Usage of undefined variable '$LD_LIBRARY_PATH' More info: https://docs.docker.com/go/dockerfile/rule/undefined-var/
HADOOP_MAPRED_HOME="${HADOOP_HOME}" \
HADOOP_COMMON_HOME="${HADOOP_HOME}" \
HADOOP_HDFS_HOME="${HADOOP_HOME}" \
@@ -63,27 +96,13 @@

RUN mkdir -p $ACCUMULO_HOME $HIVE_HOME $HBASE_HOME $HCAT_HOME $ZOOKEEPER_HOME

ENV PATH=${SQOOP_HOME}/bin:${HADOOP_HOME}/sbin:$HADOOP_HOME/bin:${JAVA_HOME}/bin:${PATH}

WORKDIR /pelican
RUN chown -R gen3:gen3 $ACCUMULO_HOME $HIVE_HOME $HBASE_HOME $HCAT_HOME $ZOOKEEPER_HOME $JAVA_HOME $POSTGRES_JAR_PATH

RUN pip install --upgrade pip

# install poetry
RUN pip install --upgrade "poetry<1.2"

COPY . /$appname
WORKDIR /$appname

# copy ONLY poetry artifact, install the dependencies but not fence
# this will make sure that the dependencies are cached
COPY poetry.lock pyproject.toml /$appname/
ENV PATH=${SQOOP_HOME}/bin:${HADOOP_HOME}/sbin:$HADOOP_HOME/bin:${JAVA_HOME}/bin:${PATH}

# install package and dependencies via poetry
RUN poetry config virtualenvs.create false \
&& poetry install -vv --no-dev --no-interaction \
&& poetry show -v
# Switch to non-root user 'gen3' for the serving process
USER gen3

ENV PYTHONUNBUFFERED=1

ENTRYPOINT poetry run python job_import.py

Check warning on line 108 in import.Dockerfile

GitHub Actions / Build and Push Pelican Import Image / Build Image and Push

JSON arguments recommended for ENTRYPOINT/CMD to prevent unintended behavior related to OS signals

JSONArgsRecommended: JSON arguments recommended for ENTRYPOINT to prevent unintended behavior related to OS signals More info: https://docs.docker.com/go/dockerfile/rule/json-args-recommended/
6 changes: 5 additions & 1 deletion job_export.py
@@ -146,7 +146,11 @@

if access_format == "guid":
# calculate md5 sum
md5_sum = hashlib.md5()
md5 = (
hashlib.md5()
if sys.version_info < (3, 9)
else hashlib.md5(usedforsecurity=False)
) # nosec
chunk_size = 8192
with open(fname, "rb") as f:
while True:
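
The hunk above stops at the start of the read loop. The sketch below shows how a chunked MD5 with the same version guard might look end to end. The loop body, the function name, and the temporary file are assumptions for illustration, not the actual contents of `job_export.py`.

```python
import hashlib
import sys
import tempfile


def md5_of_file(path, chunk_size=8192):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    # `usedforsecurity` was added in Python 3.9; older interpreters reject it.
    if sys.version_info < (3, 9):
        digest = hashlib.md5()  # nosec
    else:
        digest = hashlib.md5(usedforsecurity=False)  # nosec
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()


# Usage: checksum a small temporary file (assumed data, for illustration).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world" * 1000)
print(md5_of_file(tmp.name))
```

Passing `usedforsecurity=False` marks the hash as a plain checksum, which is presumably why the change pairs it with `# nosec` to quiet security linters.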
4 changes: 2 additions & 2 deletions pelican/dictionary.py
@@ -7,9 +7,9 @@
def init_dictionary(url):
d = DataDictionary(url=url)
dictionary.init(d)
# the gdcdatamodel expects dictionary initiated on load, so this can't be
# the gen3datamodel expects dictionary initiated on load, so this can't be
# imported on module level
from gdcdatamodel import models as md
from gen3datamodel import models as md

return d, md
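
The comment in this hunk describes an import-order constraint: the models module reads the dictionary when it is first imported, so the import has to wait until the dictionary is initialised. The sketch below reproduces that pattern with two throwaway modules written to a temporary directory. `demo_registry`, `demo_models`, and `init_registry` are hypothetical stand-ins, not the real `dictionary` or `gen3datamodel` packages.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Two throwaway modules stand in for the real packages (hypothetical names).
_tmp = Path(tempfile.mkdtemp())
(_tmp / "demo_registry.py").write_text("schema = None\n")
(_tmp / "demo_models.py").write_text(textwrap.dedent("""
    import demo_registry

    # Top-level code: runs once, at first import, and needs the schema.
    if demo_registry.schema is None:
        raise RuntimeError("registry must be initialised before import")
    NODE_TYPES = sorted(demo_registry.schema)
"""))
sys.path.insert(0, str(_tmp))

import demo_registry  # safe at module level: no import-time requirements


def init_registry(schema):
    """Initialise the registry first, then import the module that reads it."""
    demo_registry.schema = schema
    import demo_models  # deferred: importing at module level would raise

    return demo_registry, demo_models


registry, models = init_registry({"case": {}, "aliquot": {}})
print(models.NODE_TYPES)
```

Because Python caches modules after the first import, the deferred import only runs the module's top-level code once, on the first call.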
