Cherry picks 0.3 #454

Merged
merged 7 commits into from
Nov 10, 2022
Changes from 6 commits
13 changes: 5 additions & 8 deletions .github/workflows/actions.yml
@@ -8,10 +8,7 @@ on:
jobs:
build:
name: Run tests
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [windows-latest, ubuntu-latest]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.7
@@ -32,8 +29,8 @@ jobs:
${{ runner.os }}-pip-
- name: Install dependencies
run: |
pip install tensorflow
pip install -e ".[tests]" --progress-bar off --upgrade
pip install -r requirements.txt --progress-bar off
pip install -e "." --progress-bar off
- name: Test with pytest
run: |
pytest --cov=keras_nlp --cov-report xml:coverage.xml
@@ -60,7 +57,7 @@ jobs:
${{ runner.os }}-pip-
- name: Install dependencies
run: |
pip install tensorflow
pip install -e ".[tests]" --progress-bar off --upgrade
pip install -r requirements.txt --progress-bar off
pip install -e "." --progress-bar off
- name: Lint
run: bash shell/lint.sh
13 changes: 3 additions & 10 deletions .github/workflows/nightly.yml
@@ -9,10 +9,7 @@ on:
jobs:
build:
name: Run tests
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [windows-latest, ubuntu-latest]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.7
@@ -33,12 +30,8 @@ jobs:
${{ runner.os }}-pip-
- name: Install dependencies
run: |
pip install -e ".[tests]" --progress-bar off --upgrade
pip uninstall keras -y
pip uninstall tensorflow -y
pip uninstall tensorflow_text -y
pip install tf-nightly --progress-bar off --upgrade
pip install tensorflow-text-nightly --progress-bar off --upgrade
pip install -r requirements-nightly.txt --progress-bar off
pip install -e "." --progress-bar off
- name: Test with pytest
run: |
pytest --cov=keras_nlp --cov-report xml:coverage.xml
104 changes: 77 additions & 27 deletions CONTRIBUTING.md
@@ -84,25 +84,90 @@ Once the pull request is approved, a team member will take care of merging.
Python 3.7 or later is required.

Setting up your KerasNLP development environment requires you to fork the
KerasNLP repository, clone the repository, create a virtual environment, and
install dependencies.

You can achieve this by running the following commands:
KerasNLP repository and clone it locally. With the
[GitHub CLI](https://github.com/cli/cli) installed, you can do this as follows:

```shell
gh repo fork keras-team/keras-nlp --clone --remote
cd keras-nlp
python -m venv ~/keras-nlp-venv
source ~/keras-nlp-venv/bin/activate
pip install -e ".[tests]"
```

The first line relies on having an installation of
[the GitHub CLI](https://github.com/cli/cli).
Next, we must set up a Python environment with the correct dependencies. We
recommend using `conda` to install TensorFlow dependencies (such as CUDA), and
`pip` to install Python packages from PyPI. The exact method will depend on your
OS.

### Linux (recommended)

To set up a complete environment with TensorFlow, a local install of keras-nlp,
and all development tools, run the following or adapt it to suit your needs.

```shell
# Create and activate conda environment.
conda create -n keras-nlp python=3.9
conda activate keras-nlp

# The following can be omitted if GPU support is not required.
conda install -c conda-forge cudatoolkit-dev=11.2 cudnn=8.1.0
mkdir -p $CONDA_PREFIX/etc/conda/activate.d/
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

# Install dependencies.
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e "."
```

### MacOS

⚠️⚠️⚠️ macOS binaries for the M1 architecture are not currently available from
official sources. You can try an experimental development workflow leveraging the
[tensorflow metal plugin](https://developer.apple.com/metal/tensorflow-plugin/)
and a [community maintained build](https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon)
of `tensorflow-text`. These binaries are not provided by Google, so proceed at
your own risk.

#### Experimental instructions for Arm (M1)

```shell
# Create and activate conda environment.
conda create -n keras-nlp python=3.9
conda activate keras-nlp

# Install dependencies.
conda install -c apple tensorflow-deps=2.9
python -m pip install --upgrade pip
python -m pip install -r requirements-macos-m1.txt
python -m pip install -e "."
```

Following these commands you should be able to run the tests using
`pytest keras_nlp`. Please report any issues running tests following these
steps.
#### Instructions for x86 (Intel)

```shell
# Create and activate conda environment.
conda create -n keras-nlp python=3.9
conda activate keras-nlp

# Install dependencies.
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e "."
```

### Windows

For the best experience developing on Windows, please install
[WSL](https://learn.microsoft.com/en-us/windows/wsl/install), and proceed with
the Linux installation instructions above.

To run the format and lint scripts, make sure you clone the repo with Linux
style line endings and change any line separator settings in your editor.
This is automatically done if you clone using git inside WSL.

Note that we will not support Windows Shell/PowerShell for any scripts in this
repository.

## Testing changes

@@ -143,18 +208,3 @@ the following commands manually every time you want to format your code:
If after running these the CI flow is still failing, try updating `flake8`,
`isort` and `black`. This can be done by running `pip install --upgrade black`,
`pip install --upgrade flake8`, and `pip install --upgrade isort`.

## Developing on Windows

For Windows development, we recommend using WSL (Windows Subsystem for Linux),
so you can run the shell scripts in this repository. We will not support
Windows Shell/PowerShell. You can refer
[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install)
for WSL installation.

Note that if you are using Windows Subsystem for Linux (WSL), make sure you
clone the repo with Linux style LF line endings and change the default setting
for line separator in your Text Editor before running the format
or lint scripts. This is automatically done if you clone using git inside WSL.
If there is conflict due to the line endings you might see an error
like - `: invalid option`.
5 changes: 0 additions & 5 deletions examples/bert/README.md
@@ -16,11 +16,6 @@ need to be trained for much longer on a much larger dataset.
OUTPUT_DIR=~/bert_test_output
DATA_URL=https://storage.googleapis.com/tensorflow/keras-nlp/examples/bert

# Create a virtual env and install dependencies.
mkdir $OUTPUT_DIR
python3 -m venv $OUTPUT_DIR/env && source $OUTPUT_DIR/env/bin/activate
pip install -e ".[tests,examples]"

# Download example data.
wget ${DATA_URL}/bert_vocab_uncased.txt -O $OUTPUT_DIR/bert_vocab_uncased.txt
wget ${DATA_URL}/wiki_example_data.txt -O $OUTPUT_DIR/wiki_example_data.txt
10 changes: 7 additions & 3 deletions keras_nlp/integration_tests/basic_usage_test.py
@@ -13,13 +13,17 @@
# limitations under the License.

import tensorflow as tf
from absl.testing import parameterized
from tensorflow import keras

import keras_nlp


class BasicUsageTest(tf.test.TestCase):
def test_quick_start(self):
class BasicUsageTest(tf.test.TestCase, parameterized.TestCase):
@parameterized.named_parameters(
("jit_compile_false", False), ("jit_compile_true", True)
)
def test_quick_start(self, jit_compile):
"""This matches the quick start example in our base README."""

# Tokenize some inputs with a binary label.
@@ -47,7 +51,7 @@ def test_quick_start(self):
model = keras.Model(inputs, outputs)

# Run a single batch of gradient descent.
model.compile(loss="binary_crossentropy", jit_compile=True)
model.compile(loss="binary_crossentropy", jit_compile=jit_compile)
loss = model.train_on_batch(x, y)

# Make sure we have a valid loss.
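
Reviewer note: the hunk above runs the same test body once per `jit_compile` value. The sketch below illustrates that idea using only the standard library, with `unittest.subTest` swapped in for absl's `parameterized.named_parameters` (which generates separate test methods rather than sub-tests). `train_step` is a hypothetical stand-in for the compile and `train_on_batch` calls and is not part of keras-nlp.

```python
import unittest


def train_step(jit_compile):
    """Hypothetical stand-in for model.compile(...) + model.train_on_batch(...).

    Returns a fixed fake loss; a real test would train an actual model.
    """
    return 0.693


class QuickStartSketchTest(unittest.TestCase):
    def test_quick_start(self):
        # Same names and values as the named parameters in the diff.
        cases = (("jit_compile_false", False), ("jit_compile_true", True))
        for name, jit_compile in cases:
            # Each case is reported separately if it fails.
            with self.subTest(name):
                loss = train_step(jit_compile)
                # Make sure we have a valid loss.
                self.assertGreater(loss, 0.0)
```

With absl, each named parameter becomes its own test method, so a failure under one `jit_compile` setting does not hide the result of the other.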
10 changes: 9 additions & 1 deletion keras_nlp/layers/mlm_mask_generator.py
@@ -13,9 +13,15 @@
# limitations under the License.

import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow import keras

from keras_nlp.utils.tf_utils import assert_tf_text_installed

try:
import tensorflow_text as tf_text
except ImportError:
tf_text = None


class MLMMaskGenerator(keras.layers.Layer):
"""Layer that applies language model masking.
@@ -96,6 +102,8 @@ def __init__(
random_token_rate=0.1,
**kwargs,
):
assert_tf_text_installed(self.__class__.__name__)

super().__init__(**kwargs)
self.vocabulary_size = vocabulary_size
self.unselectable_token_ids = unselectable_token_ids
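
Reviewer note: the guarded import above makes `tensorflow_text` optional at import time and defers the failure to layer construction. The body of `assert_tf_text_installed` is not shown in this diff, so the following is only a sketch of the general pattern, under assumptions. `optional_import`, `assert_installed`, `NeedsText`, and the package name are all hypothetical names chosen for illustration.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical package name, chosen so the "missing" path is exercised.
fake_text = optional_import("hypothetical_missing_text_package")


def assert_installed(module, package_name, symbol_name):
    """Raise a descriptive ImportError if the optional module is absent."""
    if module is None:
        raise ImportError(
            f"{symbol_name} requires the `{package_name}` package. "
            f"Please install it with `pip install {package_name}`."
        )


class NeedsText:
    """Hypothetical class that depends on the optional package."""

    def __init__(self):
        # Fail at construction time, not at module import time.
        assert_installed(
            fake_text,
            "hypothetical-missing-text-package",
            self.__class__.__name__,
        )
        self.ready = True
```

Importing the module always succeeds in this arrangement; only constructing `NeedsText()` raises, and the message names the class that needs the package, which is the same shape as the `rouge_score` message changed later in this PR.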
10 changes: 9 additions & 1 deletion keras_nlp/layers/multi_segment_packer.py
@@ -15,9 +15,15 @@
"""BERT token packing layer."""

import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow import keras

from keras_nlp.utils.tf_utils import assert_tf_text_installed

try:
import tensorflow_text as tf_text
except ImportError:
tf_text = None


class MultiSegmentPacker(keras.layers.Layer):
"""Packs multiple sequences into a single fixed width model input.
@@ -106,6 +112,8 @@ def __init__(
truncator="round_robin",
**kwargs,
):
assert_tf_text_installed(self.__class__.__name__)

super().__init__(**kwargs)
self.sequence_length = sequence_length
if truncator not in ("round_robin", "waterfall"):
6 changes: 3 additions & 3 deletions keras_nlp/metrics/rouge_base.py
@@ -20,7 +20,7 @@
import tensorflow as tf
from tensorflow import keras

from keras_nlp.utils.tensor_utils import tensor_to_string_list
from keras_nlp.utils.tf_utils import tensor_to_string_list

try:
import rouge_score
@@ -62,8 +62,8 @@ def __init__(

if rouge_score is None:
raise ImportError(
"ROUGE metric requires the `rouge_score` package. "
"Please install it with `pip install rouge-score`."
f"{self.__class__.__name__} requires the `rouge_score` "
"package. Please install it with `pip install rouge-score`."
)

if not tf.as_dtype(self.dtype).is_floating:
1 change: 1 addition & 0 deletions keras_nlp/tokenizers/__init__.py
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from keras_nlp.tokenizers.byte_pair_tokenizer import BytePairTokenizer
from keras_nlp.tokenizers.byte_tokenizer import ByteTokenizer
from keras_nlp.tokenizers.sentence_piece_tokenizer import SentencePieceTokenizer
from keras_nlp.tokenizers.tokenizer import Tokenizer