Improve MacOS support and pin tensorflow version during testing (#383)
* Improve MacOS support

* Conditionally import tensorflow_text everywhere

* Use requirements files for continuous testing

* Fix logs

* Bug fixes and improvements for Linux testing

* Typo fix

* Address review comments
mattdangerw authored Oct 12, 2022
1 parent f43039d commit b197f85
Showing 24 changed files with 228 additions and 75 deletions.
12 changes: 0 additions & 12 deletions .cloudbuild/requirements.txt

This file was deleted.

6 changes: 4 additions & 2 deletions .github/workflows/actions.yml
@@ -29,7 +29,8 @@ jobs:
${{ runner.os }}-pip-
- name: Install dependencies
run: |
pip install -e ".[tests]" --progress-bar off --upgrade
pip install -r requirements.txt --progress-bar off
pip install -e "." --progress-bar off
- name: Test with pytest
run: |
pytest --cov=keras_nlp --cov-report xml:coverage.xml
@@ -56,6 +57,7 @@ jobs:
${{ runner.os }}-pip-
- name: Install dependencies
run: |
pip install -e ".[tests]" --progress-bar off --upgrade
pip install -r requirements.txt --progress-bar off
pip install -e "." --progress-bar off
- name: Lint
run: bash shell/lint.sh
8 changes: 2 additions & 6 deletions .github/workflows/nightly.yml
@@ -30,12 +30,8 @@ jobs:
${{ runner.os }}-pip-
- name: Install dependencies
run: |
pip install -e ".[tests]" --progress-bar off --upgrade
pip uninstall keras -y
pip uninstall tensorflow -y
pip uninstall tensorflow_text -y
pip install tf-nightly --progress-bar off --upgrade
pip install tensorflow-text-nightly --progress-bar off --upgrade
pip install -r requirements-nightly.txt --progress-bar off
pip install -e "." --progress-bar off
- name: Test with pytest
run: |
pytest --cov=keras_nlp --cov-report xml:coverage.xml
104 changes: 77 additions & 27 deletions CONTRIBUTING.md
@@ -84,25 +84,90 @@ Once the pull request is approved, a team member will take care of merging.
Python 3.7 or later is required.

Setting up your KerasNLP development environment requires you to fork the
KerasNLP repository, clone the repository, create a virtual environment, and
install dependencies.

You can achieve this by running the following commands:
KerasNLP repository and clone it locally. With the
[GitHub CLI](https://github.com/cli/cli) installed, you can do this as follows:

```shell
gh repo fork keras-team/keras-nlp --clone --remote
cd keras-nlp
python -m venv ~/keras-nlp-venv
source ~/keras-nlp-venv/bin/activate
pip install -e ".[tests]"
```

The first line relies on having an installation of
[the GitHub CLI](https://github.com/cli/cli).
Next, set up a Python environment with the correct dependencies. We
recommend using `conda` to install TensorFlow dependencies (such as CUDA) and
`pip` to install Python packages from PyPI. The exact method depends on your
OS.

### Linux (recommended)

To set up a complete environment with TensorFlow, a local install of keras-nlp,
and all development tools, run the following or adapt it to suit your needs.

```shell
# Create and activate conda environment.
conda create -n keras-nlp python=3.9
conda activate keras-nlp

# The following can be omitted if GPU support is not required.
conda install -c conda-forge cudatoolkit-dev=11.2 cudnn=8.1.0
mkdir -p $CONDA_PREFIX/etc/conda/activate.d/
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

# Install dependencies.
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e "."
```

### MacOS

⚠️⚠️⚠️ MacOS binaries for the M1 architecture are not currently available from
official sources. You can try an experimental development workflow that leverages the
[tensorflow metal plugin](https://developer.apple.com/metal/tensorflow-plugin/)
and a [community-maintained build](https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon)
of `tensorflow-text`. These binaries are not provided by Google, so proceed at
your own risk.

#### Experimental instructions for Arm (M1)

```shell
# Create and activate conda environment.
conda create -n keras-nlp python=3.9
conda activate keras-nlp

# Install dependencies.
conda install -c apple tensorflow-deps=2.9
python -m pip install --upgrade pip
python -m pip install -r requirements-macos-m1.txt
python -m pip install -e "."
```

Following these commands you should be able to run the tests using
`pytest keras_nlp`. Please report any issues running tests following these
steps.
#### Instructions for x86 (Intel)

```shell
# Create and activate conda environment.
conda create -n keras-nlp python=3.9
conda activate keras-nlp

# Install dependencies.
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e "."
```

### Windows

For the best experience developing on Windows, please install
[WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and proceed with
the Linux installation instructions above.

To run the format and lint scripts, make sure you clone the repo with Linux
style line endings and change any line separator settings in your editor.
This is automatically done if you clone using git inside WSL.

Note that we will not support Windows Shell/PowerShell for any scripts in this
repository.

## Testing changes

@@ -150,18 +215,3 @@ the following commands manually every time you want to format your code:
If after running these the CI flow is still failing, try updating `flake8`,
`isort` and `black`. This can be done by running `pip install --upgrade black`,
`pip install --upgrade flake8`, and `pip install --upgrade isort`.

## Developing on Windows

For Windows development, we recommend using WSL (Windows Subsystem for Linux),
so you can run the shell scripts in this repository. We will not support
Windows Shell/PowerShell. You can refer
[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install)
for WSL installation.

Note that if you are using Windows Subsystem for Linux (WSL), make sure you
clone the repo with Linux style LF line endings and change the default setting
for line separator in your Text Editor before running the format
or lint scripts. This is automatically done if you clone using git inside WSL.
If there is conflict due to the line endings you might see an error
like - `: invalid option`.
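After completing one of the setups in the CONTRIBUTING.md diff above, a quick sanity check can confirm that the interpreter meets the Python 3.7 minimum and that the optional packages are importable. This is a minimal sketch rather than part of the repository: `check_environment` is a hypothetical helper that uses only the standard library, so it also runs on machines where `tensorflow_text` is missing (for example, M1 Macs without the community build).

```python
import importlib.util
import sys


# Hypothetical helper (not part of keras-nlp): report whether the interpreter
# and key packages match what CONTRIBUTING.md expects.
def check_environment(packages=("tensorflow", "tensorflow_text", "keras_nlp")):
    # find_spec() locates a top-level module without importing it, so this
    # stays fast and does not trigger TensorFlow's start-up logging.
    installed = {
        name: importlib.util.find_spec(name) is not None for name in packages
    }
    return {
        "platform": sys.platform,
        "python_ok": sys.version_info >= (3, 7),
        "installed": installed,
    }


if __name__ == "__main__":
    report = check_environment()
    print(f"platform: {report['platform']}")
    print(f"Python >= 3.7: {report['python_ok']}")
    for name, present in report["installed"].items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Run it inside the activated conda environment. A `tensorflow_text: missing` line is the situation the conditional imports in this commit are meant to tolerate.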
5 changes: 0 additions & 5 deletions examples/bert/README.md
@@ -16,11 +16,6 @@ need to be trained for much longer on a much larger dataset.
OUTPUT_DIR=~/bert_test_output
DATA_URL=https://storage.googleapis.com/tensorflow/keras-nlp/examples/bert

# Create a virtual env and install dependencies.
mkdir $OUTPUT_DIR
python3 -m venv $OUTPUT_DIR/env && source $OUTPUT_DIR/env/bin/activate
pip install -e ".[tests,examples]"

# Download example data.
wget ${DATA_URL}/bert_vocab_uncased.txt -O $OUTPUT_DIR/bert_vocab_uncased.txt
wget ${DATA_URL}/wiki_example_data.txt -O $OUTPUT_DIR/wiki_example_data.txt
8 changes: 8 additions & 0 deletions keras_nlp/conftest.py
@@ -11,6 +11,8 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys

import pytest


@@ -29,6 +31,12 @@ def pytest_collection_modifyitems(config, items):
# --runslow given in cli: do not skip slow tests
return
skip_slow = pytest.mark.skip(reason="need --runslow option to run")
skip_xla = pytest.mark.skipif(
sys.platform == "darwin", reason="XLA unsupported on MacOS."
)

for item in items:
if "slow" in item.keywords:
item.add_marker(skip_slow)
if "jit_compile_true" in item.name:
item.add_marker(skip_xla)
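The `skip_xla` marker keys off test names: absl's `parameterized.named_parameters` appends each case name (`jit_compile_false` or `jit_compile_true`, as used in `basic_usage_test.py` in this commit) to the generated test method name, which pytest exposes as `item.name`. The rule can be sketched as a pure function. `should_skip_xla` is a hypothetical name used only for illustration; the actual hook attaches a `skipif` marker instead of returning a boolean. Note also that, in the hunk as shown, the early `return` for `--runslow` comes before `skip_xla` is created, so passing `--runslow` bypasses the XLA skip as well.

```python
import sys


def should_skip_xla(test_name, platform=None):
    """Mirror the conftest rule: skip XLA-compiled test cases on MacOS.

    Hypothetical helper for illustration only; the real hook adds
    pytest.mark.skipif(sys.platform == "darwin", ...) to matching items.
    """
    if platform is None:
        platform = sys.platform
    # The conftest matches on a substring of the parameterized test name,
    # and skipif only takes effect when running on MacOS ("darwin").
    return platform == "darwin" and "jit_compile_true" in test_name
```

Matching on a substring keeps the hook independent of the exact separator absl uses when building the test name.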
10 changes: 9 additions & 1 deletion keras_nlp/layers/mlm_mask_generator.py
@@ -13,9 +13,15 @@
# limitations under the License.

import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow import keras

from keras_nlp.utils.tf_utils import assert_tf_text_installed

try:
import tensorflow_text as tf_text
except ImportError:
tf_text = None


@keras.utils.register_keras_serializable(package="keras_nlp")
class MLMMaskGenerator(keras.layers.Layer):
@@ -97,6 +103,8 @@ def __init__(
random_token_rate=0.1,
**kwargs,
):
assert_tf_text_installed(self.__class__.__name__)

super().__init__(**kwargs)
self.vocabulary_size = vocabulary_size
self.unselectable_token_ids = unselectable_token_ids
10 changes: 9 additions & 1 deletion keras_nlp/layers/multi_segment_packer.py
@@ -15,9 +15,15 @@
"""BERT token packing layer."""

import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow import keras

from keras_nlp.utils.tf_utils import assert_tf_text_installed

try:
import tensorflow_text as tf_text
except ImportError:
tf_text = None


@keras.utils.register_keras_serializable(package="keras_nlp")
class MultiSegmentPacker(keras.layers.Layer):
@@ -107,6 +113,8 @@ def __init__(
truncate="round_robin",
**kwargs,
):
assert_tf_text_installed(self.__class__.__name__)

super().__init__(**kwargs)
self.sequence_length = sequence_length
if truncate not in ("round_robin", "waterfall"):
2 changes: 1 addition & 1 deletion keras_nlp/metrics/bleu.py
@@ -20,7 +20,7 @@
import tensorflow as tf
from tensorflow import keras

from keras_nlp.utils.tensor_utils import tensor_to_list
from keras_nlp.utils.tf_utils import tensor_to_list

REPLACE_SUBSTRINGS = [
("<skipped>", ""),
6 changes: 3 additions & 3 deletions keras_nlp/metrics/rouge_base.py
@@ -20,7 +20,7 @@
import tensorflow as tf
from tensorflow import keras

from keras_nlp.utils.tensor_utils import tensor_to_string_list
from keras_nlp.utils.tf_utils import tensor_to_string_list

try:
import rouge_score
@@ -65,8 +65,8 @@ def __init__(

if rouge_score is None:
raise ImportError(
"ROUGE metric requires the `rouge_score` package. "
"Please install it with `pip install rouge-score`."
f"{self.__class__.__name__} requires the `rouge_score` "
"package. Please install it with `pip install rouge-score`."
)

if not tf.as_dtype(self.dtype).is_floating:
10 changes: 7 additions & 3 deletions keras_nlp/tests/integration_tests/basic_usage_test.py
@@ -13,13 +13,17 @@
# limitations under the License.

import tensorflow as tf
from absl.testing import parameterized
from tensorflow import keras

import keras_nlp


class BasicUsageTest(tf.test.TestCase):
def test_quick_start(self):
class BasicUsageTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.named_parameters(
("jit_compile_false", False), ("jit_compile_true", True)
)
def test_quick_start(self, jit_compile):
"""This matches the quick start example in our base README."""

# Tokenize some inputs with a binary label.
@@ -47,7 +51,7 @@ def test_quick_start(self):
model = keras.Model(inputs, outputs)

# Run a single batch of gradient descent.
model.compile(loss="binary_crossentropy", jit_compile=True)
model.compile(loss="binary_crossentropy", jit_compile=jit_compile)
loss = model.train_on_batch(x, y)

# Make sure we have a valid loss.
9 changes: 8 additions & 1 deletion keras_nlp/tokenizers/byte_tokenizer.py
@@ -16,10 +16,15 @@

import numpy as np
import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow import keras

from keras_nlp.tokenizers import tokenizer
from keras_nlp.utils.tf_utils import assert_tf_text_installed

try:
import tensorflow_text as tf_text
except ImportError:
tf_text = None


@keras.utils.register_keras_serializable(package="keras_nlp")
@@ -157,6 +162,8 @@ def __init__(
replacement_char: int = 65533,
**kwargs,
):
assert_tf_text_installed(self.__class__.__name__)

# Check dtype and provide a default.
if "dtype" not in kwargs or kwargs["dtype"] is None:
kwargs["dtype"] = tf.int32
11 changes: 9 additions & 2 deletions keras_nlp/tokenizers/sentence_piece_tokenizer.py
@@ -17,11 +17,16 @@
from typing import List

import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow import keras

from keras_nlp.tokenizers import tokenizer
from keras_nlp.utils.tensor_utils import tensor_to_string_list
from keras_nlp.utils.tf_utils import assert_tf_text_installed
from keras_nlp.utils.tf_utils import tensor_to_string_list

try:
import tensorflow_text as tf_text
except ImportError:
tf_text = None


@keras.utils.register_keras_serializable(package="keras_nlp")
@@ -98,6 +103,8 @@ def __init__(
sequence_length: int = None,
**kwargs,
) -> None:
assert_tf_text_installed(self.__class__.__name__)

# Check dtype and provide a default.
if "dtype" not in kwargs or kwargs["dtype"] is None:
kwargs["dtype"] = tf.int32
4 changes: 3 additions & 1 deletion keras_nlp/tokenizers/sentence_piece_tokenizer_trainer.py
@@ -93,7 +93,9 @@ def compute_sentence_piece_proto(

if spm is None:
raise ImportError(
"sentencepiece is not installed. Please install it via `pip install sentencepiece`."
f"{compute_sentence_piece_proto.__name__} requires the "
"`sentencepiece` package. Please install it with "
"`pip install sentencepiece`."
)

if not isinstance(data, (list, tuple, tf.data.Dataset)):