Commit

Merge pull request #41 from ucsd-ets/tf+tensorboard
Merge tf+tensorboard branch.

1. tf & keras version issue
2. LD_LIBRARY_PATH
3. manifest bug fixed
Thomaswang0822 authored Dec 23, 2022
2 parents 5198d3e + 3aabaad commit c11a915
Showing 20 changed files with 238 additions and 65 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
Original file line number Diff line number Diff line change
@@ -103,7 +103,7 @@ jobs:
DOCKERHUB_USER: ${{ secrets.DOCKERHUB_USER }}
run: |
[ -f artifacts/IMAGES_BUILT ] && [ -s artifacts/IMAGES_BUILT ] && doit manifest || echo "no image updated"
[ -f manifests/*.md ] && cp manifests/*.md wiki/ || echo "no image updated"
ls manifests/*.md &> /dev/null && cp manifests/*.md wiki/ && echo "*.md copied to wiki" || echo "no *.md"
# if there is no image updated, skip doit
# if there is no *.md (no image updated), skip cp command
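The two guarded lines above rely on shell short-circuiting: `A && B || C` runs `B` only when `A` succeeds and falls through to `C` when either fails. A self-contained sketch of the same pattern, using hypothetical `/tmp` paths in place of `manifests/` and `wiki/`:

```shell
# Stand-ins for manifests/ and wiki/ (hypothetical paths, not the repo layout)
mkdir -p /tmp/manifests_demo /tmp/wiki_demo
touch /tmp/manifests_demo/datahub-base-notebook.md

# Same guard as the workflow step: copy only if at least one *.md exists
ls /tmp/manifests_demo/*.md &> /dev/null \
  && cp /tmp/manifests_demo/*.md /tmp/wiki_demo/ \
  && echo "*.md copied to wiki" \
  || echo "no *.md"
```

This also shows why the `ls` guard replaced `[ -f manifests/*.md ]`: `[ -f ... ]` errors out when the glob expands to more than one file, while `ls` handles zero, one, or many matches.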

7 changes: 7 additions & 0 deletions README.md
@@ -32,6 +32,13 @@ For people who are trying to modify the image stack, here are some scenarios and
5. If the workflow completed successfully (a green check to the left of the commit), you can now safely merge the pull request to the `main` branch.
6. The workflow will now run again and push the images to Dockerhub.

### Running individual tests for images

1. Activate the virtual environment: `source bin/activate`
2. `cd images`
3. `export TEST_IMAGE=MYIMAGE`, replacing `MYIMAGE` with your locally built image name
4. Run the tests, e.g. `pytest tests_common`
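Step 3 in shell form, as a sketch: the tag below is a hypothetical example (not a tag this repo produces), and the `${VAR:-default}` expansion supplies a fallback only when `TEST_IMAGE` is not already set:

```shell
# Hypothetical local tag; substitute whatever you built with `docker build -t ...`
export TEST_IMAGE="${TEST_IMAGE:-datahub-base-notebook:local-test}"
echo "pytest will run against: $TEST_IMAGE"
```

The fallback form is convenient in CI, where the variable may already be exported by an earlier step.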


### Overview of the Repository
We use a GitHub workflow to build new images whenever existing images change or additional images are added.
24 changes: 14 additions & 10 deletions images/datahub-base-notebook/Dockerfile
@@ -47,22 +47,26 @@ USER jovyan
COPY start-code-server.sh /usr/local/bin/start-notebook.d/

ARG PY_VER_SHORT=3.9
ARG JUPYTERHUB_VERSION=1.4.2
# nbconvert downgrade needed for nbgrader to work
ARG JUPYTERHUB_VERSION=1.5.0

# mistune added for nbgrader issues

RUN /usr/share/datahub/scripts/install-python-all.sh && \
pip install pandas --upgrade && \
pip install nltk && \
pip install nbconvert==5.6.1 && \
pip install jupyterhub==$JUPYTERHUB_VERSION && \
pip install nbgrader==0.6.2 && \
pip install traitlets==5.1.0 && \
conda install -c conda-forge rise && \
cat /usr/share/datahub/scripts/canvas_exporter.py > /opt/conda/lib/python$PY_VER_SHORT/site-packages/nbgrader/plugins/export.py && \
pip install pandas 'mistune>=2' --upgrade && \
pip install nltk \
nbconvert==7.2.1 \
jupyterhub==$JUPYTERHUB_VERSION && \
conda install -c conda-forge rise -y && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER && \
conda clean --all && \
chown -R jovyan:users /opt/conda/etc/jupyter/nbconfig && \
chmod -R +r /opt/conda/etc/jupyter/nbconfig

# nbgrader requires some variables set to just run the notebook server
ENV NBGRADER_COURSEID="NA"
ENV JUPYTERHUB_USER=${NB_USER}

# Install jupyterlab extensions
RUN pip install jupyterlab-github jupyterlab-latex jupyterlab-git \
jupyterlab-fasta jupyterlab-pullrequests jupyterlab-geojson

This file was deleted.

11 changes: 5 additions & 6 deletions images/datahub-base-notebook/scripts/install-python/install-nbgrader.sh
100644 → 100755
@@ -1,11 +1,10 @@
#!/bin/sh -x
pip install nbgrader==0.6.1
#!/bin/bash

pip install nbgrader==0.8.1

jupyter nbextension install --symlink --sys-prefix --py nbgrader
jupyter nbextension enable --sys-prefix --py nbgrader
jupyter serverextension enable --sys-prefix --py nbgrader

# Disable formgrader for default case, re-enable if instructor
jupyter nbextension disable --sys-prefix formgrader/main --section=tree
jupyter serverextension disable --sys-prefix nbgrader.server_extensions.formgrader
jupyter nbextension disable --sys-prefix create_assignment/main
jupyter labextension enable --level=system nbgrader
jupyter server extension enable --system --py nbgrader

This file was deleted.

7 changes: 0 additions & 7 deletions images/datascience-notebook/Dockerfile
@@ -12,11 +12,4 @@ RUN pip install dpkt \

RUN conda clean -tipsy

# import integration tests
# ENV TESTDIR=/usr/share/datahub/tests
# ARG DATASCIENCE_TESTDIR=${TESTDIR}/datascience-notebook
# COPY tests ${DATASCIENCE_TESTDIR}
# RUN chmod -R +rwx ${DATASCIENCE_TESTDIR}
# RUN chown 1000:1000 ${DATASCIENCE_TESTDIR}

ENV SHELL=/bin/bash
15 changes: 11 additions & 4 deletions images/scipy-ml-notebook/Dockerfile
@@ -18,6 +18,12 @@ RUN ln -s libncurses.so.6 /usr/lib/x86_64-linux-gnu/libncurses.so.5
COPY run_jupyter.sh /
RUN chmod +x /run_jupyter.sh

COPY cudatoolkit_env_vars.sh /etc/datahub-profile.d/cudatoolkit_env_vars.sh
COPY cudnn_env_vars.sh /etc/datahub-profile.d/cudnn_env_vars.sh
COPY activate.sh /tmp/activate.sh

RUN chmod 777 /etc/datahub-profile.d/*.sh /tmp/activate.sh

USER jovyan

# CUDA 11
@@ -39,7 +45,7 @@ RUN pip install --no-cache-dir datascience \
opencv-python \
pycocotools \
pillow \
tensorflow && \
tensorflow==2.11.0 && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER

@@ -57,6 +63,7 @@ RUN pip install torch==${TORCH_VER} torchvision==${TORCH_VIS_VER} torchaudio==${
RUN ln -s /usr/local/nvidia/bin/nvidia-smi /opt/conda/bin/nvidia-smi

USER $NB_UID:$NB_GID
ENV PATH=${PATH}:/usr/local/nvidia/bin
ENV LD_LIBRARY_PATH=/opt/conda/pkgs/cudatoolkit-11.2.2-he111cf0_8/lib/:${LD_LIBRARY_PATH}
RUN echo 'here'
ENV PATH=${PATH}:/usr/local/nvidia/bin:/opt/conda/bin

RUN . /tmp/activate.sh

5 changes: 5 additions & 0 deletions images/scipy-ml-notebook/activate.sh
@@ -0,0 +1,5 @@
#!/bin/bash

for script in /etc/datahub-profile.d/*.sh; do
. "$script"
done
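A self-contained sketch of how this loop behaves, using a temporary directory in place of `/etc/datahub-profile.d` (the directory and variable names below are illustrative):

```shell
# Stand-in for /etc/datahub-profile.d
mkdir -p /tmp/demo-profile.d
echo 'export DEMO_CUDA_VAR=from_a'  > /tmp/demo-profile.d/a.sh
echo 'export DEMO_CUDNN_VAR=from_b' > /tmp/demo-profile.d/b.sh

# Sourcing (rather than executing) each script keeps its exports
# in the current shell, which is what the env-var scripts require
for script in /tmp/demo-profile.d/*.sh; do
    . "$script"
done

echo "$DEMO_CUDA_VAR $DEMO_CUDNN_VAR"   # prints: from_a from_b
```

Sourcing is the point of the `. "$script"` syntax: running `sh "$script"` instead would set the variables in a subshell, and they would vanish when it exits.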
6 changes: 6 additions & 0 deletions images/scipy-ml-notebook/cudatoolkit_env_vars.sh
@@ -0,0 +1,6 @@
#!/bin/bash

# set systemwide env variables for Conda package: cudatoolkit
cudatoolkit=$(conda list 2> /dev/null | grep cudatoolkit | awk -F' ' '{ print $2 "-" $3 }')

export LD_LIBRARY_PATH=/opt/conda/pkgs/cudatoolkit-$cudatoolkit/lib:${LD_LIBRARY_PATH}
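The extraction pipeline above can be exercised without a live conda environment by feeding it a canned `conda list`-style line (the version and build strings below are samples, not pinned requirements):

```shell
# A canned line in `conda list` output format: name, version, build, channel
sample="cudatoolkit               11.2.2          he111cf0_8    conda-forge"

# Same field extraction as the script: $2 is the version, $3 the build string
cudatoolkit=$(echo "$sample" | grep cudatoolkit | awk -F' ' '{ print $2 "-" $3 }')
echo "$cudatoolkit"   # prints: 11.2.2-he111cf0_8

export LD_LIBRARY_PATH="/opt/conda/pkgs/cudatoolkit-$cudatoolkit/lib:${LD_LIBRARY_PATH}"
```

Note that `awk -F' '` (a single space) is treated by awk as the default field separator, so runs of whitespace collapse and the fields line up regardless of column padding.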
6 changes: 6 additions & 0 deletions images/scipy-ml-notebook/cudnn_env_vars.sh
@@ -0,0 +1,6 @@
#!/bin/bash

# set systemwide env variables for Conda package: cudnn
cudnn_version=$(conda list 2> /dev/null | grep cudnn | awk -F' ' '{ print $2 "-" $3 }')

export LD_LIBRARY_PATH=/opt/conda/pkgs/cudnn-$cudnn_version/lib/:${LD_LIBRARY_PATH}
39 changes: 39 additions & 0 deletions images/scipy-ml-notebook/grpc_files/job-template.yaml
@@ -0,0 +1,39 @@
apiVersion: batch/v1
kind: Job
metadata:
name: gpu-test-job
namespace: test-server
spec:
ttlSecondsAfterFinished: 0
template:
spec:
containers:
- name: tensorflow-pytorch-tester
image:
resources:
limits:
cpu: "2"
memory: 8Gi
nvidia.com/gpu: "1"
requests:
cpu: "1"
memory: 8Gi
nvidia.com/gpu: "1"
# command: ["python", "/job-script/script.py"]
command: ["/bin/sh", "-c"]
args: ["for script in /etc/datahub-profile.d/*.sh; do . \"$script\"; done; python /job-script/script.py"]
env:
- name: TF_CPP_MIN_LOG_LEVEL
value: '3'
volumeMounts:
- mountPath: /job-script
name: job-script
volumes:
- name: job-script
configMap:
name: config
items:
- key: "script.py"
path: "script.py"
restartPolicy: Never
backoffLimit: 4
@@ -19,7 +19,9 @@ spec:
cpu: "1"
memory: 8Gi
nvidia.com/gpu: "1"
command: ["python", "/job-script/script.py"]
# command: ["python", "/job-script/script.py"]
command: ["/bin/sh", "-c"]
args: ["for script in /etc/datahub-profile.d/*.sh; do . \"$script\"; done; python /job-script/script.py"]
env:
- name: TF_CPP_MIN_LOG_LEVEL
value: '3'
77 changes: 77 additions & 0 deletions images/scipy-ml-notebook/test/data/test_tf.ipynb
@@ -0,0 +1,77 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "SB93Ge748VQs"
},
"source": [
"##### Check:\n",
"1. tensorboard extension can be loaded.\n",
"2. tf version and tf.keras version match"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6B95Hb6YVgPZ"
},
"outputs": [],
"source": [
"# Load the TensorBoard notebook extension\n",
"%load_ext tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"assert tf.__version__ == tf.keras.__version__, \\\n",
"f\"tensorflow {tf.__version__} and keras {tf.keras.__version__} versions don't match\""
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "get_started.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3.8.8 ('base')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"vscode": {
"interpreter": {
"hash": "d06a35f6432bcea124c520d36814be75a6dd5ed4335e0c829924d510b7f0b7dd"
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}
17 changes: 7 additions & 10 deletions images/scipy-ml-notebook/test/test_gpu.py
@@ -14,19 +14,16 @@ def untested_image():
tag = tag+'-untested'
return f'{repo}:{tag}'


# @pytest.mark.skip(reason="Skipping test_gpu_valid() to debug")
def test_gpu_valid(untested_image):
print("Skipping test_gpu_valid(untested_image) of scipy-ml.")
return
assert run(create_client(test_image=untested_image,cer_path=None)) == EXAMPLE_GOOD_OUT
assert run(create_client(test_image=untested_image,cer_path=None, timeout=3000)) == EXAMPLE_GOOD_OUT, \
f"Image name: {untested_image}, gpu is not available for torch or tensorflow"

# @pytest.mark.skip(reason="Skipping test_gpu_nonexistent_image() to debug")
def test_gpu_nonexistent_image():
print("Skipping test_gpu_nonexistent_image() of scipy-ml.")
return
response = json.loads(run(create_client(test_image='invalid_image_for_test',cer_path=None,timeout=20)))
assert response['test_output'] == EXAMPLE_TIME_OUT


@pytest.mark.skip(reason="Skipping test_gpu_lacking_tools() to debug; not necessary")
def test_gpu_lacking_tools():
print("Skipping test_gpu_lacking_tools() of scipy-ml.")
return
assert run(create_client(test_image='python',cer_path=None)) == EXAMPLE_MISSING_PACKAGE
46 changes: 46 additions & 0 deletions images/scipy-ml-notebook/test/test_tf.py
@@ -0,0 +1,46 @@
# Adapted from datascience-notebook/test/test_notebooks.py
import logging

import pytest
import os

LOGGER = logging.getLogger(__name__)
THIS_DIR = os.path.dirname(os.path.realpath(__file__))


@pytest.mark.parametrize(
"test_file",
["test_tf"],
)
def test_tf(container, test_file):
""" test_tf.ipynb checks
1. tensorboard extension can be loaded.
2. tf version and tf.keras version match
"""
host_data_dir = os.path.join(THIS_DIR, "data")
cont_data_dir = "/home/jovyan/data"
output_dir = "/tmp"
timeout_sec = 600
LOGGER.info(f"Test that {test_file} notebook can be executed ...")
command = (
"jupyter nbconvert --to markdown "
+ f"--ExecutePreprocessor.timeout={timeout_sec} "
+ f"--output-dir {output_dir} "
+ f"--execute {cont_data_dir}/{test_file}.ipynb"
)

""" container.ports.update({
"5132/tcp": 5132
}) """

c = container.run(
volumes={host_data_dir: {"bind": cont_data_dir, "mode": "ro"}},
tty=True,
command=["start.sh", "bash", "-c", command],
)

rv = c.wait(timeout=timeout_sec//10 + 10)
logs = c.logs(stdout=True).decode("utf-8")
LOGGER.debug(logs)
print(logs)
assert rv == 0 or rv["StatusCode"] == 0, f"Command {command} failed"
2 changes: 1 addition & 1 deletion images/spec.yml
@@ -42,7 +42,7 @@ images:

plans:
PYTHON39:
tag_prefix: '2022.3'
tag_prefix: '2023.1'
tag_stable_postfix: -stable

manifests:
7 changes: 4 additions & 3 deletions images/tests_common/test_packages.py
@@ -138,11 +138,12 @@ def _import_packages(package_helper, filtered_packages, check_function, max_fail
LOGGER.info("Testing the import of packages ...")
for package in filtered_packages:
LOGGER.info(f"Trying to import {package}")
all_packages[package]=''
try:
status_code = check_function(package_helper, package)
all_packages[package] = status_code
assert (
check_function(package_helper, package) == 0
), f"Package [{package}] import failed"
status_code == 0
), f"Package [{package}] import failed. status_code = {status_code}"
except AssertionError as err:
failures[package] = err
if len(failures) > max_failures: