Improve MacOS support and pin tensorflow version during testing (#383)

* Improve MacOS support
* Conditionally import tensorflow_text everywhere
* Use requirements files for continuous testing
* Fix logs
* Bug fixes and improvement for linux testing
* Typo fix
* Address review comments
keras-team · Oct 12, 2022 · b197f85
1 parent f43039d
Showing 24 changed files with 228 additions and 75 deletions.
diff --git a/.cloudbuild/requirements.txt b/.cloudbuild/requirements.txt
diff --git a/.github/workflows/actions.yml b/.github/workflows/actions.yml
@@ -29,7 +29,8 @@ jobs:
             ${{ runner.os }}-pip-
       - name: Install dependencies
         run: |
-          pip install -e ".[tests]" --progress-bar off --upgrade
+          pip install -r requirements.txt --progress-bar off
+          pip install -e "." --progress-bar off
       - name: Test with pytest
         run: |
           pytest --cov=keras_nlp --cov-report xml:coverage.xml
@@ -56,6 +57,7 @@ jobs:
             ${{ runner.os }}-pip-
       - name: Install dependencies
         run: |
-          pip install -e ".[tests]" --progress-bar off --upgrade
+          pip install -r requirements.txt --progress-bar off
+          pip install -e "." --progress-bar off
       - name: Lint
         run: bash shell/lint.sh
diff --git a/.github/workflows/nightly.yml b/.github/workflows/nightly.yml
@@ -30,12 +30,8 @@ jobs:
             ${{ runner.os }}-pip-
       - name: Install dependencies
         run: |
-          pip install -e ".[tests]" --progress-bar off --upgrade
-          pip uninstall keras -y
-          pip uninstall tensorflow -y
-          pip uninstall tensorflow_text -y
-          pip install tf-nightly --progress-bar off --upgrade
-          pip install tensorflow-text-nightly --progress-bar off --upgrade
+          pip install -r requirements-nightly.txt --progress-bar off
+          pip install -e "." --progress-bar off
       - name: Test with pytest
         run: |
           pytest --cov=keras_nlp --cov-report xml:coverage.xml
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -84,25 +84,90 @@ Once the pull request is approved, a team member will take care of merging.
 Python 3.7 or later is required.
 
 Setting up your KerasNLP development environment requires you to fork the
-KerasNLP repository, clone the repository, create a virtual environment, and
-install dependencies.
-
-You can achieve this by running the following commands:
+KerasNLP repository and clone it locally. With the
+[GitHub CLI](https://github.com/cli/cli) installed, you can do this as follows:
 
 ```shell
 gh repo fork keras-team/keras-nlp --clone --remote
 cd keras-nlp
-python -m venv ~/keras-nlp-venv
-source ~/keras-nlp-venv/bin/activate
-pip install -e ".[tests]"
 ```
 
-The first line relies on having an installation of
-[the GitHub CLI](https://github.com/cli/cli).
+Next, we must set up a Python environment with the correct dependencies. We
+recommend using `conda` to install TensorFlow dependencies (such as CUDA), and
+`pip` to install Python packages from PyPI. The exact method will depend on your
+OS.
+
+### Linux (recommended)
+
+To set up a complete environment with TensorFlow, a local install of keras-nlp,
+and all development tools, run the following or adapt it to suit your needs.
+
+```shell
+# Create and activate conda environment.
+conda create -n keras-nlp python=3.9
+conda activate keras-nlp
+
+# The following can be omitted if GPU support is not required.
+conda install -c conda-forge cudatoolkit-dev=11.2 cudnn=8.1.0
+mkdir -p $CONDA_PREFIX/etc/conda/activate.d/
+echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
+echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
+source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
+
+# Install dependencies.
+python -m pip install --upgrade pip
+python -m pip install -r requirements.txt
+python -m pip install -e "."
+```
+
+### MacOS
+
+⚠️⚠️⚠️ MacOS binaries for the M1 architecture are not currently available from
+official sources. You can try an experimental development workflow leveraging
+the [tensorflow metal plugin](https://developer.apple.com/metal/tensorflow-plugin/)
+and a [community-maintained build](https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon)
+of `tensorflow-text`. These binaries are not provided by Google, so proceed at
+your own risk.
+
+#### Experimental instructions for Arm (M1)
+
+```shell
+# Create and activate conda environment.
+conda create -n keras-nlp python=3.9
+conda activate keras-nlp
+
+# Install dependencies.
+conda install -c apple tensorflow-deps=2.9
+python -m pip install --upgrade pip
+python -m pip install -r requirements-macos-m1.txt
+python -m pip install -e "."
+```
 
-Following these commands you should be able to run the tests using
-`pytest keras_nlp`. Please report any issues running tests following these
-steps.
+#### Instructions for x86 (Intel)
+
+```shell
+# Create and activate conda environment.
+conda create -n keras-nlp python=3.9
+conda activate keras-nlp
+
+# Install dependencies.
+python -m pip install --upgrade pip
+python -m pip install -r requirements.txt
+python -m pip install -e "."
+```
+
+### Windows
+
+For the best experience developing on Windows, please install
+[WSL](https://learn.microsoft.com/en-us/windows/wsl/install), and proceed with
+the Linux installation instructions above.
+
+To run the format and lint scripts, make sure you clone the repo with Linux
+style line endings and change any line separator settings in your editor.
+This is automatically done if you clone using git inside WSL.
+
+Note that we will not support Windows Shell/PowerShell for any scripts in this
+repository.
 
 ## Testing changes
 
@@ -150,18 +215,3 @@ the following commands manually every time you want to format your code:
 If after running these the CI flow is still failing, try updating `flake8`,
 `isort` and `black`. This can be done by running `pip install --upgrade black`,
 `pip install --upgrade flake8`, and `pip install --upgrade isort`.
-
-## Developing on Windows
-
-For Windows development, we recommend using WSL (Windows Subsystem for Linux),
-so you can run the shell scripts in this repository. We will not support
-Windows Shell/PowerShell. You can refer
-[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install)
-for WSL installation.
-
-Note that if you are using Windows Subsystem for Linux (WSL), make sure you 
-clone the repo with Linux style LF line endings and change the default setting
-for line separator in your Text Editor before running the format
-or lint scripts. This is automatically done if you clone using git inside WSL.
-If there is conflict due to the line endings you might see an error
-like - `: invalid option`.
diff --git a/examples/bert/README.md b/examples/bert/README.md
@@ -16,11 +16,6 @@ need to be trained for much longer on a much larger dataset.
 OUTPUT_DIR=~/bert_test_output
 DATA_URL=https://storage.googleapis.com/tensorflow/keras-nlp/examples/bert
 
-# Create a virtual env and install dependencies.
-mkdir $OUTPUT_DIR
-python3 -m venv $OUTPUT_DIR/env && source $OUTPUT_DIR/env/bin/activate
-pip install -e ".[tests,examples]"
-
 # Download example data.
 wget ${DATA_URL}/bert_vocab_uncased.txt -O $OUTPUT_DIR/bert_vocab_uncased.txt
 wget ${DATA_URL}/wiki_example_data.txt -O $OUTPUT_DIR/wiki_example_data.txt

diff --git a/keras_nlp/conftest.py b/keras_nlp/conftest.py
@@ -11,6 +11,8 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import sys
+
 import pytest
 
 
@@ -29,6 +31,12 @@ def pytest_collection_modifyitems(config, items):
         # --runslow given in cli: do not skip slow tests
         return
     skip_slow = pytest.mark.skip(reason="need --runslow option to run")
+    skip_xla = pytest.mark.skipif(
+        sys.platform == "darwin", reason="XLA unsupported on MacOS."
+    )
+
     for item in items:
         if "slow" in item.keywords:
             item.add_marker(skip_slow)
+        if "jit_compile_true" in item.name:
+            item.add_marker(skip_xla)
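
The name-based skip in the hunk above can be illustrated without pytest. The following is a minimal sketch, not the project's code: `should_skip_xla` and the test names are hypothetical, and it assumes that each parameterized case name (such as `jit_compile_true`) ends up somewhere in the collected test's name, which is what the `in item.name` check relies on.

```python
import sys


def should_skip_xla(test_name, platform=None):
    """Return True when a jit-compiled test case would be skipped.

    Hypothetical helper mirroring the two conditions in the hunk above:
    the platform must be MacOS ("darwin") and the test name must contain
    the "jit_compile_true" case name.
    """
    if platform is None:
        platform = sys.platform
    return platform == "darwin" and "jit_compile_true" in test_name


# Hypothetical collected test names, assuming the case name is part of
# the generated test name.
names = [
    "test_quick_start_jit_compile_false",
    "test_quick_start_jit_compile_true",
]
for name in names:
    print(name, should_skip_xla(name, platform="darwin"))
```

One thing worth checking when reading the hunk: the markers are added in the loop that follows the early `return` for `--runslow`, so, as the diff is shown here, the XLA skip would appear not to be applied when `--runslow` is passed.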
diff --git a/keras_nlp/layers/mlm_mask_generator.py b/keras_nlp/layers/mlm_mask_generator.py
@@ -13,9 +13,15 @@
 # limitations under the License.
 
 import tensorflow as tf
-import tensorflow_text as tf_text
 from tensorflow import keras
 
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
+
 
 @keras.utils.register_keras_serializable(package="keras_nlp")
 class MLMMaskGenerator(keras.layers.Layer):
@@ -97,6 +103,8 @@ def __init__(
         random_token_rate=0.1,
         **kwargs,
     ):
+        assert_tf_text_installed(self.__class__.__name__)
+
         super().__init__(**kwargs)
         self.vocabulary_size = vocabulary_size
         self.unselectable_token_ids = unselectable_token_ids
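
The guarded import plus the `assert_tf_text_installed` call in the hunk above defer the failure from import time to construction time. The helper itself lives in `keras_nlp/utils/tf_utils.py`, which is not shown in this part of the diff, so the version below is only a sketch of what such a helper might look like. The error message wording and the `ExampleLayer` class are assumptions for illustration.

```python
try:
    import tensorflow_text as tf_text
except ImportError:
    # Keep the module importable on platforms without tensorflow_text.
    tf_text = None


def assert_tf_text_installed(symbol_name):
    """Raise a descriptive ImportError if tensorflow_text is unavailable.

    Sketch only: the real helper is not shown in this diff, and the
    message text here is an assumption.
    """
    if tf_text is None:
        raise ImportError(
            f"{symbol_name} requires the `tensorflow-text` package. "
            "Please install it with `pip install tensorflow-text`."
        )


class ExampleLayer:
    """Hypothetical stand-in for a layer that depends on tensorflow_text."""

    def __init__(self):
        # Fail at construction time, naming the class that needs the package.
        assert_tf_text_installed(self.__class__.__name__)
```

With this pattern, importing the module always succeeds, and only constructing a class that actually needs `tensorflow_text` raises an error that names the class.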

diff --git a/keras_nlp/layers/multi_segment_packer.py b/keras_nlp/layers/multi_segment_packer.py
@@ -15,9 +15,15 @@
 """BERT token packing layer."""
 
 import tensorflow as tf
-import tensorflow_text as tf_text
 from tensorflow import keras
 
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
+
 
 @keras.utils.register_keras_serializable(package="keras_nlp")
 class MultiSegmentPacker(keras.layers.Layer):
@@ -107,6 +113,8 @@ def __init__(
         truncate="round_robin",
         **kwargs,
     ):
+        assert_tf_text_installed(self.__class__.__name__)
+
         super().__init__(**kwargs)
         self.sequence_length = sequence_length
         if truncate not in ("round_robin", "waterfall"):

diff --git a/keras_nlp/metrics/bleu.py b/keras_nlp/metrics/bleu.py
@@ -20,7 +20,7 @@
 import tensorflow as tf
 from tensorflow import keras
 
-from keras_nlp.utils.tensor_utils import tensor_to_list
+from keras_nlp.utils.tf_utils import tensor_to_list
 
 REPLACE_SUBSTRINGS = [
     ("<skipped>", ""),

diff --git a/keras_nlp/metrics/rouge_base.py b/keras_nlp/metrics/rouge_base.py
@@ -20,7 +20,7 @@
 import tensorflow as tf
 from tensorflow import keras
 
-from keras_nlp.utils.tensor_utils import tensor_to_string_list
+from keras_nlp.utils.tf_utils import tensor_to_string_list
 
 try:
     import rouge_score
@@ -65,8 +65,8 @@ def __init__(
 
         if rouge_score is None:
             raise ImportError(
-                "ROUGE metric requires the `rouge_score` package. "
-                "Please install it with `pip install rouge-score`."
+                f"{self.__class__.__name__} requires the `rouge_score` "
+                "package. Please install it with `pip install rouge-score`."
             )
 
         if not tf.as_dtype(self.dtype).is_floating:

diff --git a/keras_nlp/tests/integration_tests/basic_usage_test.py b/keras_nlp/tests/integration_tests/basic_usage_test.py
@@ -13,13 +13,17 @@
 # limitations under the License.
 
 import tensorflow as tf
+from absl.testing import parameterized
 from tensorflow import keras
 
 import keras_nlp
 
 
-class BasicUsageTest(tf.test.TestCase):
-    def test_quick_start(self):
+class BasicUsageTest(tf.test.TestCase, parameterized.TestCase):
+    @parameterized.named_parameters(
+        ("jit_compile_false", False), ("jit_compile_true", True)
+    )
+    def test_quick_start(self, jit_compile):
         """This matches the quick start example in our base README."""
 
         # Tokenize some inputs with a binary label.
@@ -47,7 +51,7 @@ def test_quick_start(self):
         model = keras.Model(inputs, outputs)
 
         # Run a single batch of gradient descent.
-        model.compile(loss="binary_crossentropy", jit_compile=True)
+        model.compile(loss="binary_crossentropy", jit_compile=jit_compile)
         loss = model.train_on_batch(x, y)
 
         # Make sure we have a valid loss.

diff --git a/keras_nlp/tokenizers/byte_tokenizer.py b/keras_nlp/tokenizers/byte_tokenizer.py
@@ -16,10 +16,15 @@
 
 import numpy as np
 import tensorflow as tf
-import tensorflow_text as tf_text
 from tensorflow import keras
 
 from keras_nlp.tokenizers import tokenizer
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
 
 
 @keras.utils.register_keras_serializable(package="keras_nlp")
@@ -157,6 +162,8 @@ def __init__(
         replacement_char: int = 65533,
         **kwargs,
     ):
+        assert_tf_text_installed(self.__class__.__name__)
+
         # Check dtype and provide a default.
         if "dtype" not in kwargs or kwargs["dtype"] is None:
             kwargs["dtype"] = tf.int32

diff --git a/keras_nlp/tokenizers/sentence_piece_tokenizer.py b/keras_nlp/tokenizers/sentence_piece_tokenizer.py
@@ -17,11 +17,16 @@
 from typing import List
 
 import tensorflow as tf
-import tensorflow_text as tf_text
 from tensorflow import keras
 
 from keras_nlp.tokenizers import tokenizer
-from keras_nlp.utils.tensor_utils import tensor_to_string_list
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+from keras_nlp.utils.tf_utils import tensor_to_string_list
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
 
 
 @keras.utils.register_keras_serializable(package="keras_nlp")
@@ -98,6 +103,8 @@ def __init__(
         sequence_length: int = None,
         **kwargs,
     ) -> None:
+        assert_tf_text_installed(self.__class__.__name__)
+
         # Check dtype and provide a default.
         if "dtype" not in kwargs or kwargs["dtype"] is None:
             kwargs["dtype"] = tf.int32

diff --git a/keras_nlp/tokenizers/sentence_piece_tokenizer_trainer.py b/keras_nlp/tokenizers/sentence_piece_tokenizer_trainer.py
@@ -93,7 +93,9 @@ def compute_sentence_piece_proto(
 
     if spm is None:
         raise ImportError(
-            "sentencepiece is not installed. Please install it via `pip install sentencepiece`."
+            f"{compute_sentence_piece_proto.__name__} requires the "
+            "`sentencepiece` package. Please install it with "
+            "`pip install sentencepiece`."
         )
 
     if not isinstance(data, (list, tuple, tf.data.Dataset)):