From a3619aa7515a79086cdf0bd1c3db19ec23eac569 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Mon, 2 May 2022 16:56:46 -0700 Subject: [PATCH 01/15] Add new docs cross-linking keras.io for 0.2 release --- API_DESIGN.md | 62 +++++++++++++++ CODE_OF_CONDUCT.md | 4 + keras_nlp/LICENSE => LICENSE | 0 README.md | 144 ++++++++++++++++++++++------------- ROADMAP.md | 72 ++++++++++++++++++ 5 files changed, 230 insertions(+), 52 deletions(-) create mode 100644 API_DESIGN.md create mode 100644 CODE_OF_CONDUCT.md rename keras_nlp/LICENSE => LICENSE (100%) create mode 100644 ROADMAP.md diff --git a/API_DESIGN.md b/API_DESIGN.md new file mode 100644 index 0000000000..674414df0d --- /dev/null +++ b/API_DESIGN.md @@ -0,0 +1,62 @@ +# KerasNLP Design Guidelines + +KerasNLP uses the same API design guidelines as the rest of the Keras +ecosystem, documented [here] +https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md). +Anyone hoping to contribute to KerasNLP API design is strongly encouraged to +read through that document in it's entirety. + +Below are some design considerations specific to KerasNLP. + +## Dependencies + +The core dependencies of KerasNLP are Numpy, TensorFlow, Keras, and +[Tensorflow Text](https://www.tensorflow.org/text). + +We strive to keep KerasNLP as self-contained as possible, and will not add +dependencies to projects like NLTK or spaCy for text preprocessing. + +In rare cases, particularly with tokenizers and metrics, we may need to add +an external dependency for compatibility with the "canonical" implementation +of a certain technique. In these cases, avoid adding a new package dependency, +and add installation instructions for the specific symbol: + +```python +try: + import rouge_score +except ImportError: + pass + +class RougeL(keras.metrics.Metric): + def __init__(self): + if rouge_score is None: + raise ImportError( + 'RougeL metrics requires the rouge_score package. ' + '`pip install rouge-score`.') +``` + +## TensorFlow graph support + +Our layers, metrics, and tokenizers should be fast and efficient, which means +running inside the +[TensorFlow graph](https://www.tensorflow.org/guide/intro_to_graphs) +whenever possible. + +[tf.strings](https://www.tensorflow.org/api_docs/python/tf/strings) and +[tf.text](https://www.tensorflow.org/text/api_docs/python/text) provides a large +surface on TensorFlow operations that manipulate strings. + +If an low-level (c++) operation we need is missing, we should add it in +collaboration with core TensorFlow or TensorFlow Text. KerasNLP is a python-only +library. + +## Multi-lingual support + +We strive to keep KerasNLP a friendly and useful library for speakers of all +languages. In general, prefer designing workflows that are language agnostic, +and do not involve details (e.g. stemming) that need to be rewritten +per-language. + +It is OK for new workflows to not come with of the box support for all +languages in a first release, but a design that does not include a plan for +multi-lingual support will be rejected. diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000000..1ac9aff3c4 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,4 @@ +# Code of Conduct + +This project follows +[Google's Open Source Community Guidelines](https://opensource.google/conduct/). 
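A note on the optional-dependency snippet in `API_DESIGN.md` above: as written, a failed import leaves the name `rouge_score` unbound (the `except ImportError` branch only does `pass`), so the later `if rouge_score is None` check raises a `NameError` rather than the intended `ImportError`. The sketch below is a minimal version of the same pattern with that gap closed; it assumes the `rouge-score` PyPI package named in the snippet and omits the actual metric computation, since only the guarded import is being illustrated.

```python
from tensorflow import keras

# Bind the optional dependency to None when it is unavailable so the
# check inside __init__ is well defined.
try:
    import rouge_score
except ImportError:
    rouge_score = None


class RougeL(keras.metrics.Metric):
    """Sketch of the guarded-import pattern; metric logic omitted."""

    def __init__(self, name="rouge_l", **kwargs):
        super().__init__(name=name, **kwargs)
        if rouge_score is None:
            raise ImportError(
                "RougeL requires the rouge_score package. "
                "Install it with `pip install rouge-score`."
            )
```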
diff --git a/keras_nlp/LICENSE b/LICENSE similarity index 100% rename from keras_nlp/LICENSE rename to LICENSE diff --git a/README.md b/README.md index 6d21abbfdf..2831acd743 100644 --- a/README.md +++ b/README.md @@ -4,68 +4,100 @@ ![Tensorflow](https://img.shields.io/badge/tensorflow-v2.5.0+-success.svg) [![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/keras-team/keras-nlp/issues) -KerasNLP is a repository of modular building blocks (e.g. layers, metrics, losses) -to support modern Natural Language Processing (NLP) workflows. -Engineers working with applied NLP can leverage it to -rapidly assemble training and inference pipelines that are both state-of-the-art -and production-grade. Common use cases for application include sentiment -analysis, named entity recognition, text generation, etc. +KerasNLP is a simple and powerful API for building Natural Language +Processing (NLP) models. KerasNLP provides modular building blocks following +standard Keras interfaces (layers, metrics) that allow you to quickly and +flexibly iterate on your task. Engineers working in applied NLP can leverage the +library to assemble training and inference pipelines that are both +state-of-the-art and production-grade. -KerasNLP can be understood as a horizontal extension of the Keras API: they're -new first-party Keras objects (layers, metrics, etc) that are too specialized to -be added to core Keras, but that receive the same level of polish and backwards -compatibility guarantees as the rest of the Keras API and that are maintained by -the Keras team itself (unlike TFAddons). +KerasNLP can be understood as a horizontal extension of the Keras API: +components are first-party Keras objects that are too specialized to be +added to core Keras, but that receive the same level of polish as the rest of +the Keras API. -Currently, KerasNLP is operating pre-release. Upon launch of KerasNLP 1.0, full -API docs and code examples will be available. +KerasNLP is a new and growing project, and open to +[contributions](#contributing). -## Contributors +## Quick Links -If you'd like to contribute, please see our [contributing guide](CONTRIBUTING.md). +- [Documentation](keras.io/keras_nlp) +- [Contributing Guide](CONTRIBUTING.md) +- [Call for Contributions](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) +- [Roadmap](ROADMAP.md) +- [API Design Guidelines](API_DESIGN.md) -The fastest way to find a place to contribute is to browse our -[open issues](https://github.com/keras-team/keras-nlp/issues) and find an -unclaimed issue to work on. Issues with a [contributions welcome]( -https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) -tag are places where we are actively looking for support, and a -[good first issue](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) -tag means we think this could be a accessible a first time contributor. +## Quick Start -If you would like to propose a new symbol or feature, please open an issue to -discuss. Be aware the design for new features may take longer than contributing -pre-planned features. If you have a design in mind, please include a colab -notebook showing the proposed design in a end-to-end example. Make sure to -follow the [Keras API design guidelines]( -https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md). 
+Install the latest release: -## Roadmap +``` +pip install keras-nlp --upgrade +``` -This is an early stage project, and we are actively working on a more detailed -roadmap to share soon. For now, most of our immediate planning is done through -GitHub issues. +Tokenize text, build a transformer, and train a single batch: + +```python +import keras_nlp +import tensorflow as tf +from tensorflow import keras + +# Tokenize some inputs with a binary label. +vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "jumped", "."] +inputs = ["The quick brown fox jumped.", "The fox slept."] +tokenizer = keras_nlp.tokenizers.WordPieceTokenizer( + vocabulary=vocab, sequence_length=10) +X, Y = tokenizer(inputs), tf.constant([1, 0]) + +# Create a tiny transformer. +inputs = keras.Input(shape=(None,), dtype="int32") +x = keras_nlp.layers.TokenAndPositionEmbedding( + vocabulary_size=len(vocab), + sequence_length=10, + embedding_dim=16, +)(inputs) +x = keras_nlp.layers.TransformerEncoder( + num_heads=4, + intermediate_dim=32, +)(x) +x = keras.layers.GlobalAveragePooling1D()(x) +outputs = keras.layers.Dense(1, activation="sigmoid")(x) +model = keras.Model(inputs, outputs) + +# Run a single batch of gradient descent. +model.compile(loss="binary_crossentropy") +model.train_on_batch(X, Y) +``` -At this stage, we are primarily building components for a short list of -"greatest hits" NLP models (e.g. BERT, GPT-2, word2vec). We will be focusing -on components that follow a established Keras interface (e.g. -`keras.layers.Layer`, `keras.metrics.Metric`, or -`keras_nlp.tokenizers.Tokenizer`). +For a complete model building tutorial, see our guide on +[pretraining a transformer](keras.io/guides/keras_nlp/transformer_pretraining). -As we progress further with the library, we will attempt to cover an ever -expanding list of widely cited model architectures. +## Contributing -## Releases +If you'd like to contribute, please see our [contributing guide](CONTRIBUTING.md). -KerasNLP release are documented on our -[github release page](https://github.com/keras-team/keras-nlp/releases) and -available to download from our [PyPI project]( -https://pypi.org/project/keras-nlp/). +The fastest way to contribute it to find an +[open issues](https://github.com/keras-team/keras-nlp/issues) that needs +an assignee. We maintain a +[good first issue]( +https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) +tag for newcomers to the project, and a longer list of +[contributions welcome]( +https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) +issues. -To install KerasNLP and all it's dependencies, simply run: +If you would like propose a new symbol or feature, please first read our +[Roadmap](ROADMAP.md) and [API Design Guidelines](API_DESIGN.md), and then open +an issue to discuss. If you have a design in mind, please include a colab +notebook showing the proposed design in a end-to-end example. Keep in +mind that design for a new feature or use case may take much longer than +contributing to an open issue with a vetted-design. -``` -pip install keras-nlp -``` +Thank you to all of our wonderful contributors! + + + + ## Compatibility @@ -74,8 +106,16 @@ provide backwards compatibility guarantees both for code and saved models built with our components. While we continue with pre-release `0.y.z` development, we may break compatibility at any time and APIs should not be consider stable. -Thank you to all of our wonderful contributors! 
+## Citing KerasNLP - - - +If KerasNLP helps your research, we appreciate your citations. +Here is the BibTeX entry: + +```bibtex +@misc{kerasnlp2022, + title={KerasNLP}, + author={Watson, Matthew, and Qian, Chen, and Zhu, Scott and Chollet, Fran\c{c}ois and others}, + year={2022}, + howpublished={\url{https://github.com/keras-team/keras-nlp}}, +} +``` diff --git a/ROADMAP.md b/ROADMAP.md new file mode 100644 index 0000000000..ff21613cc1 --- /dev/null +++ b/ROADMAP.md @@ -0,0 +1,72 @@ +# Roadmap + +## What KerasNLP is + +KerasNLP is focused on a few core offerings: + +- A high-quality library of modular building blocks for modern NLP workflows. +- A collection of guides and examples on [keras.io](keras.io/keras_nlp) showing + how to use these components to solve end-to-end NLP tasks. +- A collection of examples in this repository, showing how to use these + components at scale to train state-of-art models from scratch. This is not + part of the library itself, but rather a way to vet our components and model + best practices. + +Contributions on any of these fronts are welcome! + +## What KerasNLP is not + +- **KerasNLP is not a research library.** Researchers may use it, but we do not + consider researchers to be our target audience. Our target audience is + applied NLP engineers with experimentation and production needs. KerasNLP + should make it possible to quickly reimplement industry-strength versions of + the latest generation of architectures produced by researchers, but we don't + expect the research + effort itself to be built on top of KerasNLP. This enables us to focus on + usability and API standardization, and produce objects that have a longer + lifespan than the average research project. + +- **KerasNLP is not a repository of blackbox end-to-end solutions.** + + KerasNLP is focused on modular and reusable building blocks. In the process + of developing these building blocks, we will by necessity implement + end-to-end workflows, but they're intended purely for demonstration and + grounding purposes, they're not our main deliverable. + +- **KerasNLP is not a repository of low-level string ops, like tf.text.** + + KerasNLP is fundamentally an extension of the Keras API: it hosts Keras + objects, like layers, metrics, or callbacks. Low-level C++ ops should go + directly to [Tensorflow Text](https://www.tensorflow.org/text) or + core Tensorflow. + +## Philosophy + +- **Let user needs be our compass.** Any modular building block that NLP + practitioners need is in scope, whether it's data loading, augmentation, model + building, evaluation metrics, visualization utils... +- **Be resolutely high-level.** Even if something is easy to do by hand in 5 + lines, package it as a one liner. +- **Balance ease of use and flexibility** – simple things should be easy, and + arbitrarily advanced use cases should be possible. There should always be a + "we need to go deeper" path available to our most expert users. +- **Grow as a platform and as a community** – KerasNLP development should + +## Areas of interest + +At this point in our development cycle, we are primarily interested in providing +building blocks for a short list of "greatest hits" NLP models (e.g. BERT, +GPT-2, word2vec). + +We are focusing on components that follow an established Keras interface +(e.g. keras.layers.Layer, keras.metrics.Metric, or +keras_nlp.tokenizers.Tokenizer). 
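To make the "established Keras interface" point above concrete, here is a minimal sketch of a text metric written against `keras.metrics.Metric`. `ExactMatch` is a hypothetical example used only for illustration, not an existing KerasNLP symbol; the point is the `update_state`/`result` contract that proposed components are expected to follow.

```python
import tensorflow as tf
from tensorflow import keras


class ExactMatch(keras.metrics.Metric):
    """Fraction of predictions that exactly match their references."""

    def __init__(self, name="exact_match", **kwargs):
        super().__init__(name=name, **kwargs)
        self.matches = self.add_weight(name="matches", initializer="zeros")
        self.total = self.add_weight(name="total", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Count elementwise string matches in this batch.
        equal = tf.cast(tf.equal(y_true, y_pred), self.dtype)
        self.matches.assign_add(tf.reduce_sum(equal))
        self.total.assign_add(tf.cast(tf.size(equal), self.dtype))

    def result(self):
        return tf.math.divide_no_nan(self.matches, self.total)


# Usage: behaves like any other Keras metric.
metric = ExactMatch()
metric.update_state(tf.constant(["a", "b"]), tf.constant(["a", "c"]))
print(float(metric.result()))  # 0.5
```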
+ +Note that while we will be supporting large-scale, pre-trained Transformer as a +key offering from our library, but we are not a strictly Transformer-based +modeling library. We aim to support simple techniques such as n-gram models and +word2vec embeddings, and make it easy to hop between different approaches. + +## Pre-trained modeling workflows + + From 1d34ffdb2b51855534a024eb25228efc5a27adb5 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Mon, 2 May 2022 18:08:03 -0700 Subject: [PATCH 02/15] More updates --- API_DESIGN.md | 30 ++++++++----- README.md | 24 +++++------ ROADMAP.md | 116 +++++++++++++++++++++++++++++++++++--------------- 3 files changed, 114 insertions(+), 56 deletions(-) diff --git a/API_DESIGN.md b/API_DESIGN.md index 674414df0d..b72e97003e 100644 --- a/API_DESIGN.md +++ b/API_DESIGN.md @@ -2,19 +2,19 @@ KerasNLP uses the same API design guidelines as the rest of the Keras ecosystem, documented [here] -https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md). -Anyone hoping to contribute to KerasNLP API design is strongly encouraged to -read through that document in it's entirety. +(https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md). +Anyone contributing to KerasNLP API design is strongly encouraged to +read through the document in it's entirety. Below are some design considerations specific to KerasNLP. -## Dependencies +## Avoid new dependencies -The core dependencies of KerasNLP are Numpy, TensorFlow, Keras, and +The core dependencies of KerasNLP are Keras, NumPy, TensorFlow, and [Tensorflow Text](https://www.tensorflow.org/text). -We strive to keep KerasNLP as self-contained as possible, and will not add -dependencies to projects like NLTK or spaCy for text preprocessing. +We strive to keep KerasNLP as self-contained as possible, and avoid adding +dependencies to projects (for example NLTK or spaCy) for text preprocessing. In rare cases, particularly with tokenizers and metrics, we may need to add an external dependency for compatibility with the "canonical" implementation @@ -35,7 +35,7 @@ class RougeL(keras.metrics.Metric): '`pip install rouge-score`.') ``` -## TensorFlow graph support +## Keep computation inside TensorFlow graph Our layers, metrics, and tokenizers should be fast and efficient, which means running inside the @@ -50,11 +50,21 @@ If an low-level (c++) operation we need is missing, we should add it in collaboration with core TensorFlow or TensorFlow Text. KerasNLP is a python-only library. -## Multi-lingual support +## Use tf.data for text preprocessing and augmentation + +In general, our preprocessing tools should be runnable inside a +[tf.data](https://www.tensorflow.org/guide/data) pipeline, and any augmentation +to training data should be dynamic--runnable on the fly during training rather +than precomputed. + +We should design our preprocessing workflows with tf.data in mind, and support +both batched and unbatched data as input to preprocessing layers. + +## Prioritize multi-lingual support We strive to keep KerasNLP a friendly and useful library for speakers of all languages. In general, prefer designing workflows that are language agnostic, -and do not involve details (e.g. stemming) that need to be rewritten +and do not involve logic (e.g. stemming) that need to be rewritten per-language. 
It is OK for new workflows to not come with of the box support for all diff --git a/README.md b/README.md index 2831acd743..fb97f337ec 100644 --- a/README.md +++ b/README.md @@ -16,16 +16,16 @@ components are first-party Keras objects that are too specialized to be added to core Keras, but that receive the same level of polish as the rest of the Keras API. -KerasNLP is a new and growing project, and open to +KerasNLP is a new and growing project, and we welcome [contributions](#contributing). ## Quick Links -- [Documentation](keras.io/keras_nlp) -- [Contributing Guide](CONTRIBUTING.md) -- [Call for Contributions](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) +- [Documentation](https://keras.io/keras_nlp) +- [Contributing guide](CONTRIBUTING.md) - [Roadmap](ROADMAP.md) - [API Design Guidelines](API_DESIGN.md) +- [Help wanted issues](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) ## Quick Start @@ -76,22 +76,22 @@ For a complete model building tutorial, see our guide on If you'd like to contribute, please see our [contributing guide](CONTRIBUTING.md). -The fastest way to contribute it to find an -[open issues](https://github.com/keras-team/keras-nlp/issues) that needs +The fastest way to contribute it to find +[open issues](https://github.com/keras-team/keras-nlp/issues) that need an assignee. We maintain a [good first issue]( https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) tag for newcomers to the project, and a longer list of [contributions welcome]( https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) -issues. +issues that may range in complexity. If you would like propose a new symbol or feature, please first read our -[Roadmap](ROADMAP.md) and [API Design Guidelines](API_DESIGN.md), and then open -an issue to discuss. If you have a design in mind, please include a colab -notebook showing the proposed design in a end-to-end example. Keep in -mind that design for a new feature or use case may take much longer than -contributing to an open issue with a vetted-design. +[Roadmap](ROADMAP.md) and [API Design Guidelines](API_DESIGN.md), then open +an issue to discuss. If you have a specific design in mind, please include a +[Colab](https://colab.research.google.com/) notebook showing the proposed design +in a end-to-end example. Keep in mind that design for a new feature or use case +may take longer than contributing to an open issue with a vetted-design. Thank you to all of our wonderful contributors! diff --git a/ROADMAP.md b/ROADMAP.md index ff21613cc1..39e18cf1c5 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -2,71 +2,119 @@ ## What KerasNLP is -KerasNLP is focused on a few core offerings: - -- A high-quality library of modular building blocks for modern NLP workflows. -- A collection of guides and examples on [keras.io](keras.io/keras_nlp) showing - how to use these components to solve end-to-end NLP tasks. -- A collection of examples in this repository, showing how to use these - components at scale to train state-of-art models from scratch. This is not - part of the library itself, but rather a way to vet our components and model +- **A high-quality library of modular building blocks.** KerasNLP components + follow an established Keras interface (e.g. 
keras.layers.Layer, + keras.metrics.Metric, or keras_nlp.tokenizers.Tokenizer), and make it easy to + assemble state-of-the-art NLP workflows. + +- **A collection of guides and examples.** This effort is split between two + locations: + + On [keras.io](keras.io/keras_nlp), we host a collection of small-scale, + easily accessible guides showing end-to-end workflows using KerasNLP. + + In this repository, we host a collection of + [examples](https://github.com/keras-team/keras-nlp/tree/master/examples) on + how to train large-scale, state-of-the-art models from scratch. This is not + part of the library itself, but rather a way to vet our components and show best practices. -Contributions on any of these fronts are welcome! +- **A community of NLP practioners.** KerasNLP is a actively growing project, + and we welcome contributors on all fronts of our development. We hope that our + guides and examples can be both a valuable resource to experienced + practitioners, and a accessible entry point to newcomers to the field. ## What KerasNLP is not - **KerasNLP is not a research library.** Researchers may use it, but we do not consider researchers to be our target audience. Our target audience is applied NLP engineers with experimentation and production needs. KerasNLP - should make it possible to quickly reimplement industry-strength versions of + should make it possible to quickly re-implement industry-strength versions of the latest generation of architectures produced by researchers, but we don't - expect the research - effort itself to be built on top of KerasNLP. This enables us to focus on - usability and API standardization, and produce objects that have a longer - lifespan than the average research project. + expect the research effort itself to be built on top of KerasNLP. This enables + us to focus on usability and API standardization, and produce objects that + have a longer lifespan than the average research project. - **KerasNLP is not a repository of blackbox end-to-end solutions.** - - KerasNLP is focused on modular and reusable building blocks. In the process - of developing these building blocks, we will by necessity implement - end-to-end workflows, but they're intended purely for demonstration and - grounding purposes, they're not our main deliverable. + KerasNLP is focused on modular and reusable building blocks. In the process + of developing these building blocks, we will by necessity implement + end-to-end workflows, but they're intended purely for demonstration and + grounding purposes, they're not our main deliverable. - **KerasNLP is not a repository of low-level string ops, like tf.text.** - - KerasNLP is fundamentally an extension of the Keras API: it hosts Keras - objects, like layers, metrics, or callbacks. Low-level C++ ops should go - directly to [Tensorflow Text](https://www.tensorflow.org/text) or - core Tensorflow. + KerasNLP is fundamentally an extension of the Keras API: it hosts Keras + objects, like layers, metrics, or callbacks. Low-level C++ ops should go + directly to [Tensorflow Text](https://www.tensorflow.org/text) or + core Tensorflow. ## Philosophy - **Let user needs be our compass.** Any modular building block that NLP practitioners need is in scope, whether it's data loading, augmentation, model - building, evaluation metrics, visualization utils... + building, evaluation metrics, or visualization utils. + - **Be resolutely high-level.** Even if something is easy to do by hand in 5 lines, package it as a one liner. 
-- **Balance ease of use and flexibility** – simple things should be easy, and + +- **Balance ease of use and flexibility.** Simple things should be easy, and arbitrarily advanced use cases should be possible. There should always be a "we need to go deeper" path available to our most expert users. -- **Grow as a platform and as a community** – KerasNLP development should + +- **Grow as a platform and as a community.** KerasNLP development should be + driven by the community, with feature and release planning happening out in + the open on GitHub. ## Areas of interest At this point in our development cycle, we are primarily interested in providing -building blocks for a short list of "greatest hits" NLP models (e.g. BERT, -GPT-2, word2vec). - -We are focusing on components that follow an established Keras interface -(e.g. keras.layers.Layer, keras.metrics.Metric, or -keras_nlp.tokenizers.Tokenizer). +building blocks for a short list of "greatest hits" NLP models (such as BERT, +GPT-2, word2vec). Given a popular model architecture (e.g. a +sequence-to-sequence transformer like T5) and a end-to-end task (e.g. +summarization), we should have a clear code example in mind and a list of +components to use. -Note that while we will be supporting large-scale, pre-trained Transformer as a +Note that while we will be supporting large-scale Transformer models as a key offering from our library, but we are not a strictly Transformer-based modeling library. We aim to support simple techniques such as n-gram models and word2vec embeddings, and make it easy to hop between different approaches. +Current focus areas: + +- In-graph tokenization leveraging + [Tensorflow Text](https://www.tensorflow.org/text). We aim to have a fully + featured offering of character, word, and sub-word tokenizers that run + within the Tensorflow graph. +- Scalable and easily trainable modeling + [examples](https://github.com/keras-team/keras-nlp/tree/master/examples) + runnable on Google Cloud. We will continue to port our BERT example to run + entirely on keras_nlp components for both training and preprocessing, and + give easy recipes for running multi-worker training. Once this is done, we + would like to extend this effort to other popular architectures. +- Text generation workflows. We would like to support text generation from + trained models using greedy or beam search in a clear and easy to use + workflow. +- Data augmentation preprocessing layers for domains with limited data. These + layers will allow easily defining `tf.data` pipelines that augment input + example sentences on the fly. +- Metrics for model evaluation, such a ROUGE and BLEU for evaluating translation + quality. + +## Citations + +At this moment in time, we have no set citation bar for development, but due to +the newness of the library we want to focus our efforts on a small subset of the +best known and most effective NLP techniques. + +Proposed components should usually either be part of a very well architecture +(think 1000s of citations) or contribute in some meaningful way to the usability +of an end-to-end workflow. + ## Pre-trained modeling workflows +Pre-training many modern NLP models is prohibitively expensive and +time-consuming for an average user. A key goal with the KerasNLP project is to +support easy use of pre-trained models using KerasNLP components. +We are working with the rest of the Tensorflow ecosystem (e.g. TF Hub), to +provide a coherent plan for accessing pre-trained models. 
We will continue to +share updates as they are available. From 7590b28b45bea062dee5f1dfebdc240a81d484c4 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Mon, 2 May 2022 18:15:54 -0700 Subject: [PATCH 03/15] Formatting --- API_DESIGN.md | 30 +++++++++++++++++++++-------- README.md | 40 ++++++++++++++++++++------------------ ROADMAP.md | 53 +++++++++++++++++---------------------------------- 3 files changed, 60 insertions(+), 63 deletions(-) diff --git a/API_DESIGN.md b/API_DESIGN.md index b72e97003e..36ec290cd4 100644 --- a/API_DESIGN.md +++ b/API_DESIGN.md @@ -1,13 +1,27 @@ # KerasNLP Design Guidelines -KerasNLP uses the same API design guidelines as the rest of the Keras -ecosystem, documented [here] -(https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md). -Anyone contributing to KerasNLP API design is strongly encouraged to -read through the document in it's entirety. +Before reading this document, please read the +[Keras API design guidelines](https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md). Below are some design considerations specific to KerasNLP. +## Philosophy + +- **Let user needs be our compass.** Any modular building block that NLP + practitioners need is in scope, whether it's data loading, augmentation, model + building, evaluation metrics, or visualization utils. + +- **Be resolutely high-level.** Even if something is easy to do by hand in 5 + lines, package it as a one liner. + +- **Balance ease of use and flexibility.** Simple things should be easy, and + arbitrarily advanced use cases should be possible. There should always be a + "we need to go deeper" path available to our most expert users. + +- **Grow as a platform and as a community.** KerasNLP development should be + driven by the community, with feature and release planning happening in + the open on GitHub. + ## Avoid new dependencies The core dependencies of KerasNLP are Keras, NumPy, TensorFlow, and @@ -50,12 +64,12 @@ If an low-level (c++) operation we need is missing, we should add it in collaboration with core TensorFlow or TensorFlow Text. KerasNLP is a python-only library. -## Use tf.data for text preprocessing and augmentation +## Support tf.data for text preprocessing and augmentation In general, our preprocessing tools should be runnable inside a [tf.data](https://www.tensorflow.org/guide/data) pipeline, and any augmentation -to training data should be dynamic--runnable on the fly during training rather -than precomputed. +to training data should be dynamic (runnable on the fly during training rather +than precomputed). We should design our preprocessing workflows with tf.data in mind, and support both batched and unbatched data as input to preprocessing layers. diff --git a/README.md b/README.md index fb97f337ec..0f1b7a4f38 100644 --- a/README.md +++ b/README.md @@ -4,20 +4,21 @@ ![Tensorflow](https://img.shields.io/badge/tensorflow-v2.5.0+-success.svg) [![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/keras-team/keras-nlp/issues) -KerasNLP is a simple and powerful API for building Natural Language -Processing (NLP) models. KerasNLP provides modular building blocks following +KerasNLP is a simple and powerful API for building Natural Language Processing +(NLP) models within the Keras ecosystem. + +KerasNLP provides modular building blocks following standard Keras interfaces (layers, metrics) that allow you to quickly and flexibly iterate on your task. 
Engineers working in applied NLP can leverage the library to assemble training and inference pipelines that are both state-of-the-art and production-grade. -KerasNLP can be understood as a horizontal extension of the Keras API: +KerasNLP can be understood as a horizontal extension of the Keras API — components are first-party Keras objects that are too specialized to be added to core Keras, but that receive the same level of polish as the rest of the Keras API. -KerasNLP is a new and growing project, and we welcome -[contributions](#contributing). +We are a new and growing project, and welcome [contributions](#contributing). ## Quick Links @@ -25,7 +26,7 @@ KerasNLP is a new and growing project, and we welcome - [Contributing guide](CONTRIBUTING.md) - [Roadmap](ROADMAP.md) - [API Design Guidelines](API_DESIGN.md) -- [Help wanted issues](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) +- [Call for contribution issues](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) ## Quick Start @@ -35,7 +36,7 @@ Install the latest release: pip install keras-nlp --upgrade ``` -Tokenize text, build a transformer, and train a single batch: +Tokenize text, build a tiny transformer, and train a single batch: ```python import keras_nlp @@ -46,7 +47,9 @@ from tensorflow import keras vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "jumped", "."] inputs = ["The quick brown fox jumped.", "The fox slept."] tokenizer = keras_nlp.tokenizers.WordPieceTokenizer( - vocabulary=vocab, sequence_length=10) + vocabulary=vocab, + sequence_length=10, +) X, Y = tokenizer(inputs), tf.constant([1, 0]) # Create a tiny transformer. @@ -74,22 +77,21 @@ For a complete model building tutorial, see our guide on ## Contributing -If you'd like to contribute, please see our [contributing guide](CONTRIBUTING.md). +If you'd like to contribute, our [contributing guide](CONTRIBUTING.md) +contains instructions for setting up a development environment and contributing +PRs. -The fastest way to contribute it to find -[open issues](https://github.com/keras-team/keras-nlp/issues) that need -an assignee. We maintain a -[good first issue]( -https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) -tag for newcomers to the project, and a longer list of -[contributions welcome]( -https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) -issues that may range in complexity. +The fastest way to contribute it to find open issues that need an assignee. We +maintain two lists of github tags for contributors: + - [good first issue](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22): + a list of small, well defined issues for newcomers to the project. + - [contributions welcome](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22): + a larger list of issues that may range in complexity. If you would like propose a new symbol or feature, please first read our [Roadmap](ROADMAP.md) and [API Design Guidelines](API_DESIGN.md), then open an issue to discuss. If you have a specific design in mind, please include a -[Colab](https://colab.research.google.com/) notebook showing the proposed design +Colab notebook showing the proposed design in a end-to-end example. 
Keep in mind that design for a new feature or use case may take longer than contributing to an open issue with a vetted-design. diff --git a/ROADMAP.md b/ROADMAP.md index 39e18cf1c5..6b0e9fb518 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,12 +8,9 @@ assemble state-of-the-art NLP workflows. - **A collection of guides and examples.** This effort is split between two - locations: - - On [keras.io](keras.io/keras_nlp), we host a collection of small-scale, - easily accessible guides showing end-to-end workflows using KerasNLP. - - In this repository, we host a collection of + locations. On [keras.io](keras.io/keras_nlp), we host a collection of + small-scale, easily accessible guides showing end-to-end workflows using + KerasNLP. In this repository, we host a collection of [examples](https://github.com/keras-team/keras-nlp/tree/master/examples) on how to train large-scale, state-of-the-art models from scratch. This is not part of the library itself, but rather a way to vet our components and show @@ -47,23 +44,6 @@ directly to [Tensorflow Text](https://www.tensorflow.org/text) or core Tensorflow. -## Philosophy - -- **Let user needs be our compass.** Any modular building block that NLP - practitioners need is in scope, whether it's data loading, augmentation, model - building, evaluation metrics, or visualization utils. - -- **Be resolutely high-level.** Even if something is easy to do by hand in 5 - lines, package it as a one liner. - -- **Balance ease of use and flexibility.** Simple things should be easy, and - arbitrarily advanced use cases should be possible. There should always be a - "we need to go deeper" path available to our most expert users. - -- **Grow as a platform and as a community.** KerasNLP development should be - driven by the community, with feature and release planning happening out in - the open on GitHub. - ## Areas of interest At this point in our development cycle, we are primarily interested in providing @@ -73,10 +53,10 @@ sequence-to-sequence transformer like T5) and a end-to-end task (e.g. summarization), we should have a clear code example in mind and a list of components to use. -Note that while we will be supporting large-scale Transformer models as a -key offering from our library, but we are not a strictly Transformer-based -modeling library. We aim to support simple techniques such as n-gram models and -word2vec embeddings, and make it easy to hop between different approaches. +Note that while Transformers is a key offering from our library, +we are not a strictly Transformer-based modeling library. We aim to support +simple techniques such as n-gram models and word2vec embeddings, and make it +easy to hop between different approaches. Current focus areas: @@ -99,22 +79,23 @@ Current focus areas: - Metrics for model evaluation, such a ROUGE and BLEU for evaluating translation quality. -## Citations +## Citation bar At this moment in time, we have no set citation bar for development, but due to the newness of the library we want to focus our efforts on a small subset of the best known and most effective NLP techniques. -Proposed components should usually either be part of a very well architecture -(think 1000s of citations) or contribute in some meaningful way to the usability -of an end-to-end workflow. +Proposed components should usually either be part of a very well known +architecture or contribute in some meaningful way to the usability of an +end-to-end workflow. 
## Pre-trained modeling workflows Pre-training many modern NLP models is prohibitively expensive and -time-consuming for an average user. A key goal with the KerasNLP project is to -support easy use of pre-trained models using KerasNLP components. +time-consuming for an average user. A key goal with for the KerasNLP project is +to have KerasNLP components available in a pre-trained model offering of some +form. -We are working with the rest of the Tensorflow ecosystem (e.g. TF Hub), to -provide a coherent plan for accessing pre-trained models. We will continue to -share updates as they are available. +We are working with the rest of the Tensorflow ecosystem (e.g. TF Hub, +TF Models), to provide a coherent plan for accessing pre-trained models. We will +continue to share updates as they are available. From 9eba5006fe5718e223e4d9a4a335dbe0dd9664c4 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Tue, 3 May 2022 11:16:02 -0700 Subject: [PATCH 04/15] Update link names --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 0f1b7a4f38..d6e9007d10 100644 --- a/README.md +++ b/README.md @@ -22,11 +22,11 @@ We are a new and growing project, and welcome [contributions](#contributing). ## Quick Links -- [Documentation](https://keras.io/keras_nlp) -- [Contributing guide](CONTRIBUTING.md) +- [Documentation and Guides](https://keras.io/keras_nlp) +- [Contributing](CONTRIBUTING.md) - [Roadmap](ROADMAP.md) - [API Design Guidelines](API_DESIGN.md) -- [Call for contribution issues](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) +- [Call for Contributions](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) ## Quick Start From 6c4f49af1a877959dd2dd7c7b0d82d0edc311782 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Fri, 6 May 2022 10:57:26 -0700 Subject: [PATCH 05/15] Add a style guide and consolidate contributing --- API_DESIGN.md | 2 +- CODE_STYLE.md | 91 +++++++++++++++++++++++++++++++++++++++++++++++++ CONTRIBUTING.md | 74 +++++++++++++++++++++++++++------------- README.md | 37 +++++--------------- 4 files changed, 151 insertions(+), 53 deletions(-) create mode 100644 CODE_STYLE.md diff --git a/API_DESIGN.md b/API_DESIGN.md index 36ec290cd4..74ce91afc0 100644 --- a/API_DESIGN.md +++ b/API_DESIGN.md @@ -1,4 +1,4 @@ -# KerasNLP Design Guidelines +# KerasNLP Design Guide Before reading this document, please read the [Keras API design guidelines](https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md). diff --git a/CODE_STYLE.md b/CODE_STYLE.md new file mode 100644 index 0000000000..d0af05f849 --- /dev/null +++ b/CODE_STYLE.md @@ -0,0 +1,91 @@ +# Style Guide + +## Use black + +For the most part, following are style guide is very simple, we just use +[black](https://github.com/psf/black) to format code. See our +[Contributing Guide](CONTRIBUTING.md) for how to run our formatting scripts. + +## Import keras and keras_nlp as top-level objects + +Prefer importing `tf`, `keras` and `keras_nlp` as top-level objects. We want +it to be clear to a reader which symbols are from `keras_nlp` and which are +from core `keras`. 
+ +For guides and examples using KerasNLP, the import block should look as follows: + +```python +import keras_nlp +import tensorflow as tf +from tensorflow import keras +``` + +❌ `tf.keras.activations.X` +✅ `keras.activations.X` + +❌ `layers.X` +✅ `keras.layers.X` or `keras_nlp.layers.X` + +❌ `Dense(1, activation='softmax')` +✅ `keras.layers.Dense(1, activation='softmax')` + +For KerasNLP library code, `keras_nlp` will not be directly imported, but +`keras` should still be as a top-level object used to access library symbols. + +## Ideal layer style + +When writing a new KerasNLP layer (or tokenizer), please make sure to do the +following: + +- Accept `**kwargs` in `__init__` and forward this to the super class. +- Keep a python attribute on the layer for each `__init__` argument to the + layer. The name and value should match the passed value. +- Write a `get_config()` which chains to super. +- Document the layer behavior thouroughly including call behavior, on the + class +- Always include usage examples including the full symbol location. + +````python +class Linear(keras.layers.Layer): + """A simple WX + B linear layer. + + This layer contains two trainable parameters, a weight matrix and bias + vector. The layer will linearly transform input to an output of `units` + size. + + Args: + units: The dimensionality of the output space. + + Examples: + + Build a linear model. + ```python + inputs = keras.Input(shape=(2,)) + outputs = keras_nlp.layers.Linear(4)(inputs) + model = keras.Model(inputs, outputs) + ``` + + Call the layer on direct input. + >>> layer = keras_nlp.layers.Linear(4) + >>> layer(tf.zeros(8, 2)) == layer.b + True + """ + def __init__(self, units=32, **kwargs): + super().__init__(**kwargs) + self.units = units + + def build(self, input_shape): + super().build(input_shape) + self.w = self.add_weight(shape=(input_shape[-1], self.units)) + self.b = self.add_weight(shape=(self.units,)) + + def call(self, inputs): + return tf.matmul(inputs, self.w) + self.b + + def get_config(self): + config = super().get_config() + config.update({ + "units": self.units, + }) + return config +```` diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 460af4d4b5..a0ae040667 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,5 +1,36 @@ # Contribution guide +KerasNLP is an actively growing project and community! We would love for you +to get involved. Below are instructions for how to plug into KerasNLP +development. + +## Background reading + +Before contributing code, please review our [Style Guide](CODE_STYLE.md) and +[API Design Guide](API_DESIGN.md). + +Our [Roadmap](ROADMAP.md) contains an overview of the project goals and our +current focus areas. + +We follow +[Google's Open Source Community Guidelines](https://opensource.google/conduct/). + +## Finding an issue + +The fastest way to contribute it to find open issues that need an assignee. We +maintain two lists of github tags for contributors: + + - [good first issue](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22): + a list of small, well defined issues for newcomers to the project. + - [contributions welcome](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22): + a larger list of issues that may range in complexity. + +If you would like propose a new symbol or feature within our API design +guidelines, please open an issue to discuss. 
If you have a specific design in +mind, please include a Colab notebook showing the proposed design +in a end-to-end example. Keep in mind that design for a new feature or use case +may take longer than contributing to an open issue with a vetted-design. + ## How to contribute code Follow these steps to submit your code contribution. @@ -47,22 +78,7 @@ request gets approved by the reviewer. Once the pull request is approved, a team member will take care of merging. -## Developing on Windows - -For Windows development, we recommend using WSL (Windows Subsystem for Linux), -so you can run the shell scripts in this repository. We will not support -Windows Shell/PowerShell. You can refer -[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install) -for WSL installation. - -Note that if you are using Windows Subsystem for Linux (WSL), make sure you -clone the repo with Linux style LF line endings and change the default setting -for line separator in your Text Editor before running the format -or lint scripts. This is automatically done if you clone using git inside WSL. -If there is conflict due to the line endings you might see an error -like - `: invalid option`. - -## Setup environment +## Development Environment Python 3.7 or later is required. @@ -87,16 +103,16 @@ Following these commands you should be able to run the tests using `pytest keras_nlp`. Please report any issues running tests following these steps. -## Run tests +### Running tests KerasNLP is tested using [PyTest](https://docs.pytest.org/en/6.2.x/). -### Run a test file +#### Run a test file To run a test file, run `pytest path/to/file` from the root directory of the repository. -### Run a single test case +#### Run a single test case To run a single test, you can use `-k=` to use regular expression to match the test you want to run. For example, you @@ -107,7 +123,7 @@ whose names contain `import`: pytest keras_nlp/keras_nlp/integration_tests/import_test.py -k="import" ``` -### Run all tests +#### Run all tests You can run the unit tests for KerasNLP by running: @@ -115,7 +131,7 @@ You can run the unit tests for KerasNLP by running: pytest keras_nlp/ ``` -## Formatting the Code +### Formatting Code We use `flake8`, `isort` and `black` for code formatting. You can run the following commands manually every time you want to format your code: @@ -127,7 +143,17 @@ If after running these the CI flow is still failing, try updating `flake8`, `isort` and `black`. This can be done by running `pip install --upgrade black`, `pip install --upgrade flake8`, and `pip install --upgrade isort`. -## Community Guidelines +### Developing on Windows -This project follows -[Google's Open Source Community Guidelines](https://opensource.google/conduct/). +For Windows development, we recommend using WSL (Windows Subsystem for Linux), +so you can run the shell scripts in this repository. We will not support +Windows Shell/PowerShell. You can refer +[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install) +for WSL installation. + +Note that if you are using Windows Subsystem for Linux (WSL), make sure you +clone the repo with Linux style LF line endings and change the default setting +for line separator in your Text Editor before running the format +or lint scripts. This is automatically done if you clone using git inside WSL. +If there is conflict due to the line endings you might see an error +like - `: invalid option`. 
diff --git a/README.md b/README.md index d6e9007d10..c8a878d816 100644 --- a/README.md +++ b/README.md @@ -18,14 +18,15 @@ components are first-party Keras objects that are too specialized to be added to core Keras, but that receive the same level of polish as the rest of the Keras API. -We are a new and growing project, and welcome [contributions](#contributing). +We are a new and growing project, and welcome [contributions](CONTRIBUTING.md). ## Quick Links - [Documentation and Guides](https://keras.io/keras_nlp) - [Contributing](CONTRIBUTING.md) - [Roadmap](ROADMAP.md) -- [API Design Guidelines](API_DESIGN.md) +- [Style Guide](CODE_STYLE.md) +- [API Design Guide](API_DESIGN.md) - [Call for Contributions](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) ## Quick Start @@ -75,32 +76,6 @@ model.train_on_batch(X, Y) For a complete model building tutorial, see our guide on [pretraining a transformer](keras.io/guides/keras_nlp/transformer_pretraining). -## Contributing - -If you'd like to contribute, our [contributing guide](CONTRIBUTING.md) -contains instructions for setting up a development environment and contributing -PRs. - -The fastest way to contribute it to find open issues that need an assignee. We -maintain two lists of github tags for contributors: - - [good first issue](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22): - a list of small, well defined issues for newcomers to the project. - - [contributions welcome](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22): - a larger list of issues that may range in complexity. - -If you would like propose a new symbol or feature, please first read our -[Roadmap](ROADMAP.md) and [API Design Guidelines](API_DESIGN.md), then open -an issue to discuss. If you have a specific design in mind, please include a -Colab notebook showing the proposed design -in a end-to-end example. Keep in mind that design for a new feature or use case -may take longer than contributing to an open issue with a vetted-design. - -Thank you to all of our wonderful contributors! - - - - - ## Compatibility We follow [Semantic Versioning](https://semver.org/), and plan to @@ -121,3 +96,9 @@ Here is the BibTeX entry: howpublished={\url{https://github.com/keras-team/keras-nlp}}, } ``` + +Thank you to all of our wonderful contributors! + + + + \ No newline at end of file From 248a732bf4e39aee5fd8d5df6329392079785186 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Fri, 6 May 2022 11:04:05 -0700 Subject: [PATCH 06/15] More contributing guide updates --- CONTRIBUTING.md | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a0ae040667..3fd2c50d8f 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -25,13 +25,14 @@ maintain two lists of github tags for contributors: - [contributions welcome](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22): a larger list of issues that may range in complexity. -If you would like propose a new symbol or feature within our API design -guidelines, please open an issue to discuss. If you have a specific design in -mind, please include a Colab notebook showing the proposed design -in a end-to-end example. Keep in mind that design for a new feature or use case -may take longer than contributing to an open issue with a vetted-design. 
+If you would like propose a new symbol or feature, please first review our +design guide and roadmap linked above, and issue to discuss. If you have a +specific design in mind, please include a Colab notebook showing the proposed +design in a end-to-end example. Keep in mind that design for a new feature or +use case may take longer than contributing to an open issue with a +vetted-design. -## How to contribute code +## Contributing code Follow these steps to submit your code contribution. @@ -78,7 +79,7 @@ request gets approved by the reviewer. Once the pull request is approved, a team member will take care of merging. -## Development Environment +## Setting up an Environment Python 3.7 or later is required. @@ -103,16 +104,16 @@ Following these commands you should be able to run the tests using `pytest keras_nlp`. Please report any issues running tests following these steps. -### Running tests +## Testing changes KerasNLP is tested using [PyTest](https://docs.pytest.org/en/6.2.x/). -#### Run a test file +### Run a test file To run a test file, run `pytest path/to/file` from the root directory of the repository. -#### Run a single test case +### Run a single test case To run a single test, you can use `-k=` to use regular expression to match the test you want to run. For example, you @@ -123,7 +124,7 @@ whose names contain `import`: pytest keras_nlp/keras_nlp/integration_tests/import_test.py -k="import" ``` -#### Run all tests +### Run all tests You can run the unit tests for KerasNLP by running: @@ -131,7 +132,7 @@ You can run the unit tests for KerasNLP by running: pytest keras_nlp/ ``` -### Formatting Code +## Formatting Code We use `flake8`, `isort` and `black` for code formatting. You can run the following commands manually every time you want to format your code: @@ -143,7 +144,7 @@ If after running these the CI flow is still failing, try updating `flake8`, `isort` and `black`. This can be done by running `pip install --upgrade black`, `pip install --upgrade flake8`, and `pip install --upgrade isort`. -### Developing on Windows +## Developing on Windows For Windows development, we recommend using WSL (Windows Subsystem for Linux), so you can run the shell scripts in this repository. We will not support From d3a86c1578042b8479dc5e3e40cd8b47bbdabde9 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Fri, 6 May 2022 11:12:59 -0700 Subject: [PATCH 07/15] Fixups --- API_DESIGN.md | 2 +- CODE_STYLE.md | 25 +++++++++++++------------ 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/API_DESIGN.md b/API_DESIGN.md index 74ce91afc0..0002b0d857 100644 --- a/API_DESIGN.md +++ b/API_DESIGN.md @@ -1,4 +1,4 @@ -# KerasNLP Design Guide +# API Design Guide Before reading this document, please read the [Keras API design guidelines](https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md). diff --git a/CODE_STYLE.md b/CODE_STYLE.md index d0af05f849..16dc763464 100644 --- a/CODE_STYLE.md +++ b/CODE_STYLE.md @@ -2,7 +2,7 @@ ## Use black -For the most part, following are style guide is very simple, we just use +For the most part, following our code style is very simple, we just use [black](https://github.com/psf/black) to format code. See our [Contributing Guide](CONTRIBUTING.md) for how to run our formatting scripts. 
@@ -20,30 +20,31 @@ import tensorflow as tf from tensorflow import keras ``` -❌ `tf.keras.activations.X` -✅ `keras.activations.X` +- ❌ `tf.keras.activations.X` +- ✅ `keras.activations.X` -❌ `layers.X` -✅ `keras.layers.X` or `keras_nlp.layers.X` +- ❌ `layers.X` +- ✅ `keras.layers.X` or `keras_nlp.layers.X` -❌ `Dense(1, activation='softmax')` -✅ `keras.layers.Dense(1, activation='softmax')` +- ❌ `Dense(1, activation='softmax')` +- ✅ `keras.layers.Dense(1, activation='softmax')` For KerasNLP library code, `keras_nlp` will not be directly imported, but `keras` should still be as a top-level object used to access library symbols. ## Ideal layer style -When writing a new KerasNLP layer (or tokenizer), please make sure to do the -following: +When writing a new KerasNLP layer (or tokenizer or metric), please make sure to +do the following: - Accept `**kwargs` in `__init__` and forward this to the super class. - Keep a python attribute on the layer for each `__init__` argument to the layer. The name and value should match the passed value. - Write a `get_config()` which chains to super. -- Document the layer behavior thouroughly including call behavior, on the - class -- Always include usage examples including the full symbol location. +- Document the layer behavior thouroughly including call behavior though a + class level docstring. Generally methods like `build()` and `call()` should + not have their own docstring. +- Always include usage examples using the full symbol location in `keras_nlp`. ````python class Linear(keras.layers.Layer): From e09acca5ff61fcbe1552bb4f61d666702fbb2980 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Fri, 6 May 2022 11:15:12 -0700 Subject: [PATCH 08/15] Formatting futzing --- CODE_STYLE.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/CODE_STYLE.md b/CODE_STYLE.md index 16dc763464..390a1e6269 100644 --- a/CODE_STYLE.md +++ b/CODE_STYLE.md @@ -20,14 +20,14 @@ import tensorflow as tf from tensorflow import keras ``` -- ❌ `tf.keras.activations.X` -- ✅ `keras.activations.X` +❌ `tf.keras.activations.X`
+✅ `keras.activations.X` -- ❌ `layers.X` -- ✅ `keras.layers.X` or `keras_nlp.layers.X` +❌ `layers.X`
+✅ `keras.layers.X` or `keras_nlp.layers.X` -- ❌ `Dense(1, activation='softmax')` -- ✅ `keras.layers.Dense(1, activation='softmax')` +❌ `Dense(1, activation='softmax')`
+✅ `keras.layers.Dense(1, activation='softmax')` For KerasNLP library code, `keras_nlp` will not be directly imported, but `keras` should still be as a top-level object used to access library symbols. From d2e46a42efea74cd83c4c5d719a0f6c505cc41f3 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Fri, 6 May 2022 11:30:28 -0700 Subject: [PATCH 09/15] extra callout to docs --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c8a878d816..96f726d5d0 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,7 @@ components are first-party Keras objects that are too specialized to be added to core Keras, but that receive the same level of polish as the rest of the Keras API. +You can browse our official documentation [here](https://keras.io/keras_nlp). We are a new and growing project, and welcome [contributions](CONTRIBUTING.md). ## Quick Links From 95abf5478909069f896cbf26a4469ccaf17f8ae3 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Sun, 8 May 2022 19:40:23 -0700 Subject: [PATCH 10/15] Fill out roadmap focus areas in more detail --- ROADMAP.md | 105 +++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 82 insertions(+), 23 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index 6b0e9fb518..a2d9d99711 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -44,7 +44,14 @@ directly to [Tensorflow Text](https://www.tensorflow.org/text) or core Tensorflow. -## Areas of interest +- **KerasNLP is not a Transformer only library.** + Transformer based models are a key offering for KerasNLP, and these + architectures should be easy to train and use within the library. However, we + want to support other types of models, such as n-gram or word2vec approaches + that might be more suited to some real-world tasks (e.g. low-resource + deployments). + +## Focus areas for 2022 At this point in our development cycle, we are primarily interested in providing building blocks for a short list of "greatest hits" NLP models (such as BERT, @@ -53,31 +60,83 @@ sequence-to-sequence transformer like T5) and a end-to-end task (e.g. summarization), we should have a clear code example in mind and a list of components to use. -Note that while Transformers is a key offering from our library, -we are not a strictly Transformer-based modeling library. We aim to support -simple techniques such as n-gram models and word2vec embeddings, and make it -easy to hop between different approaches. - -Current focus areas: - -- In-graph tokenization leveraging - [Tensorflow Text](https://www.tensorflow.org/text). We aim to have a fully - featured offering of character, word, and sub-word tokenizers that run - within the Tensorflow graph. -- Scalable and easily trainable modeling - [examples](https://github.com/keras-team/keras-nlp/tree/master/examples) - runnable on Google Cloud. We will continue to port our BERT example to run - entirely on keras_nlp components for both training and preprocessing, and - give easy recipes for running multi-worker training. Once this is done, we - would like to extend this effort to other popular architectures. -- Text generation workflows. We would like to support text generation from - trained models using greedy or beam search in a clear and easy to use - workflow. +Below, we describe our areas of focus for the year in more detail. + +### Easy-to-use and feature-complete tokenization + +KerasNLP should be the "go-to" tokenization solution for Keras model training +and deployment by the end of 2022. 
+ +The major tasks within this effort: + +- Work with Tensorflow Text to continue to support a growing range of + tokenization options and popular vocabulary formats. For example, we would + like to add support for byte-level BPE tokenization within the Tensorflow + graph. +- Pre-trained sub-word tokenizers for any language. Training a tokenizer can + add a lot of friction to a project, particularly when working working in a + language where examples are less readily available. We would like to support + a pre-trained tokenization offering that makes it easy to start training + models on input text right away. +- A standardized way to training tokenizer vocabularies. Training vocabularies + for various tokenization algorithms can be a fractured and painful experience. + We should offer a standardized experience for training new vocabularies. + +### Scalable examples of popular model architectures using KerasNLP + +We would like our +[examples](https://github.com/keras-team/keras-nlp/tree/master/examples) +directory to contain scalable implementations of popular model +architectures easily runnable on Google Cloud. Note that these will not be +shipped with the library itself. + +These examples will serve two purposes—a demonstration to the community of how +models can be built using KerasNLP, and a way to vet our the performance and +accuracy of our library components on both TPUs and GPUs. + +At this moment in time, our focus is on polishing our BERT example. We would +like it to run entirely on KerasNLP components for both training and +preprocessing, and come easy recipes for running multi-worker training jobs. +Once this is done, we would like to extend our examples directory to other +popular architectures (e.g. RoBERTa and ELECTRA). + +As we move forward with KerasNLP as a whole, we expect development for new +components (say, a new attention mechanism) to happen in tandem with an +example demonstrating the component in an end-to-end architecture. + +### Tools for data preprocessing and postprocessing for end-to-end workflows + +It should be easy to use KerasNLP to take a trained model and use it for a wide +range of real world NLP tasks. We should support classification, text +generation, summarization, translation, name-entity recognition, and question +answering. We should have a guide for each of these tasks using KerasNLP by +the end of 2022. + +We are looking for simple, modular components that make it easy to build +end-to-end workflows for any of these tasks. + +Currently projects in this area include: + +- Utilties for generating sequences of text using greedy or beam search. +- Metrics for evaluating the quality of generated sequences, such a ROUGE and + BLEU. - Data augmentation preprocessing layers for domains with limited data. These layers will allow easily defining `tf.data` pipelines that augment input example sentences on the fly. -- Metrics for model evaluation, such a ROUGE and BLEU for evaluating translation - quality. + +### Accessible guides and examples on keras.io + +For all of the above focus areas, we would like to make ensure we have an +industry leading collection of easy to use guides and examples. + +These examples should be easy to follow, run within a colab notebook, and +provide a practical starting place for solving most real-world NLP problems. + +This will continue to be a key investment area for the library. If you have an +idea for a guide or example, please open an issue to discuss. 
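+As a starting point for one such guide, the `tf.data` augmentation project
+listed above could be sketched with nothing but built-in TensorFlow ops. The
+`randomly_drop_words` helper below is illustrative only, not a released
+KerasNLP API:
+
+```python
+import tensorflow as tf
+
+def randomly_drop_words(sentence, rate=0.1):
+    # Split a sentence into words, keep each word with probability (1 - rate),
+    # and join the survivors back into a single string.
+    words = tf.strings.split(sentence)
+    keep = tf.random.uniform(tf.shape(words)) >= rate
+    return tf.strings.reduce_join(tf.boolean_mask(words, keep), separator=" ")
+
+sentences = tf.constant(["The quick brown fox jumped.", "The fox slept."])
+augmented = (
+    tf.data.Dataset.from_tensor_slices(sentences)
+    .map(randomly_drop_words, num_parallel_calls=tf.data.AUTOTUNE)
+    .batch(2)
+)
+```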
+
+By the end of 2022, most new keras.io NLP examples should be using the KerasNLP
+library.
 
 ## Citation bar
 
From 430cd4f74778df511fb1c4a56c2b72f2c6781091 Mon Sep 17 00:00:00 2001
From: Matt Watson
Date: Mon, 9 May 2022 11:07:07 -0700
Subject: [PATCH 11/15] More roadmap info

---
 ROADMAP.md | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index a2d9d99711..4e7425766a 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -71,16 +71,16 @@ The major tasks within this effort:
 
 - Work with Tensorflow Text to continue to support a growing range of
   tokenization options and popular vocabulary formats. For example, we would
-  like to add support for byte-level BPE tokenization within the Tensorflow
-  graph.
+  like to add support for byte-level BPE tokenization (the RoBERTa and GPT
+  tokenizer) within the Tensorflow graph.
 - Pre-trained sub-word tokenizers for any language. Training a tokenizer can
   add a lot of friction to a project, particularly when working in a
   language where examples are less readily available. We would like to support
-  a pre-trained tokenization offering that makes it easy to start training
-  models on input text right away.
-A standardized way to training tokenizer vocabularies. Training vocabularies
-  for various tokenization algorithms can be a fractured and painful experience.
-  We should offer a standardized experience for training new vocabularies.
+  a pre-trained tokenization offering that allows a user to choose a tokenizer,
+  language, and vocabulary size and then download an off-the-shelf vocabulary.
+- A standardized way to train tokenizer vocabularies. As another way to
+  reduce the friction of training a tokenizer, we should offer a standardized
+  experience for training new vocabularies.
 
 ### Scalable examples of popular model architectures using KerasNLP
 
@@ -92,28 +92,32 @@ shipped with the library itself.
 
 These examples will serve two purposes—a demonstration to the community of how
 models can be built using KerasNLP, and a way to vet the performance and
-accuracy of our library components on both TPUs and GPUs.
+accuracy of our library components on both TPUs and GPUs at scale.
 
 At this moment in time, our focus is on polishing our BERT example. We would
 like it to run entirely on KerasNLP components for both training and
-preprocessing, and come easy recipes for running multi-worker training jobs.
-Once this is done, we would like to extend our examples directory to other
+preprocessing, and come with easy recipes for running multi-worker training
+jobs. Once this is done, we would like to extend our examples directory to other
 popular architectures (e.g. RoBERTa and ELECTRA).
 
 As we move forward with KerasNLP as a whole, we expect development for new
 components (say, a new attention mechanism) to happen in tandem with an
 example demonstrating the component in an end-to-end architecture.
 
+By the end of 2022, we should have an actively growing collection of example
+models, with a standardized set of training scripts, that match expected
+performance as reported in publications.
+
 ### Tools for data preprocessing and postprocessing for end-to-end workflows
 
-It should be easy to use KerasNLP to take a trained model and use it for a wide
+It should be easy to take trained Keras language model and use it for a wide
 range of real world NLP tasks. We should support classification, text
 generation, summarization, translation, named-entity recognition, and question
 answering.
We should have a guide for each of these tasks using KerasNLP by the end of 2022. -We are looking for simple, modular components that make it easy to build -end-to-end workflows for any of these tasks. +We are looking to develop simple, modular components that make it easy to build +end-to-end workflows for each of these tasks. Currently projects in this area include: @@ -131,12 +135,15 @@ industry leading collection of easy to use guides and examples. These examples should be easy to follow, run within a colab notebook, and provide a practical starting place for solving most real-world NLP problems. +Given the scale of modern NLP models, this will often involve scaling down the +model or data size for a particular task while preserving the core of what we +are trying to explain to the reader. This will continue to be a key investment area for the library. If you have an idea for a guide or example, please open an issue to discuss. -By the end of 2022, most new keras.io NLP examples should be using the KerasNLP -library. +By the end of 2022, most new NLP examples on keras.io should be use +KerasNLP library. ## Citation bar From 3c90e3231ff64e8ee9cb1bc73ad479acfb30fd11 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Mon, 9 May 2022 11:33:32 -0700 Subject: [PATCH 12/15] Move files around; edits --- API_DESIGN.md => API_DESIGN_GUIDE.md | 0 CONTRIBUTING.md | 4 ++-- README.md | 4 ++-- ROADMAP.md | 14 +++++++------- CODE_STYLE.md => STYLE_GUIDE.md | 8 ++++---- 5 files changed, 15 insertions(+), 15 deletions(-) rename API_DESIGN.md => API_DESIGN_GUIDE.md (100%) rename CODE_STYLE.md => STYLE_GUIDE.md (94%) diff --git a/API_DESIGN.md b/API_DESIGN_GUIDE.md similarity index 100% rename from API_DESIGN.md rename to API_DESIGN_GUIDE.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 3fd2c50d8f..5e1e4b919d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -6,8 +6,8 @@ development. ## Background reading -Before contributing code, please review our [Style Guide](CODE_STYLE.md) and -[API Design Guide](API_DESIGN.md). +Before contributing code, please review our [Style Guide](STYLE_GUIDE.md) and +[API Design Guide](API_DESIGN_GUIDE.md). Our [Roadmap](ROADMAP.md) contains an overview of the project goals and our current focus areas. diff --git a/README.md b/README.md index 96f726d5d0..b89407be8b 100644 --- a/README.md +++ b/README.md @@ -26,8 +26,8 @@ We are a new and growing project, and welcome [contributions](CONTRIBUTING.md). - [Documentation and Guides](https://keras.io/keras_nlp) - [Contributing](CONTRIBUTING.md) - [Roadmap](ROADMAP.md) -- [Style Guide](CODE_STYLE.md) -- [API Design Guide](API_DESIGN.md) +- [Style Guide](STYLE_GUIDE.md) +- [API Design Guide](API_DESIGN_GUIDE.md) - [Call for Contributions](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22) ## Quick Start diff --git a/ROADMAP.md b/ROADMAP.md index 4e7425766a..35cc8dfdf0 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -45,11 +45,11 @@ core Tensorflow. - **KerasNLP is not a Transformer only library.** - Transformer based models are a key offering for KerasNLP, and these - architectures should be easy to train and use within the library. However, we - want to support other types of models, such as n-gram or word2vec approaches - that might be more suited to some real-world tasks (e.g. low-resource - deployments). + Transformer based models are a key offering for KerasNLP, and they should be + easy to train and use within the library. 
However, we need to support other + types of models, such as n-gram or word2vec approaches that might run more + easily on limited hardware. We will always want the most practical tool for + the task, regardless of architecture. ## Focus areas for 2022 @@ -110,7 +110,7 @@ performance as reported in publications. ### Tools for data preprocessing and postprocessing for end-to-end workflows -It should be easy to take trained Keras language model and use it for a wide +It should be easy to take a trained Keras language model and use it for a wide range of real world NLP tasks. We should support classification, text generation, summarization, translation, name-entity recognition, and question answering. We should have a guide for each of these tasks using KerasNLP by @@ -121,7 +121,7 @@ end-to-end workflows for each of these tasks. Currently projects in this area include: -- Utilties for generating sequences of text using greedy or beam search. +- Utilities for generating sequences of text using greedy or beam search. - Metrics for evaluating the quality of generated sequences, such a ROUGE and BLEU. - Data augmentation preprocessing layers for domains with limited data. These diff --git a/CODE_STYLE.md b/STYLE_GUIDE.md similarity index 94% rename from CODE_STYLE.md rename to STYLE_GUIDE.md index 390a1e6269..24dcf59002 100644 --- a/CODE_STYLE.md +++ b/STYLE_GUIDE.md @@ -21,13 +21,13 @@ from tensorflow import keras ``` ❌ `tf.keras.activations.X`
-✅ `keras.activations.X` +✔️ `keras.activations.X` ❌ `layers.X`
-✅ `keras.layers.X` or `keras_nlp.layers.X` +✔️ `keras.layers.X` or `keras_nlp.layers.X` ❌ `Dense(1, activation='softmax')`
-✅ `keras.layers.Dense(1, activation='softmax')`
+✔️ `keras.layers.Dense(1, activation='softmax')`
 
 For KerasNLP library code, `keras_nlp` will not be directly imported, but
 `keras` should still be as a top-level object used to access library symbols.
@@ -66,7 +66,7 @@ class Linear(keras.layers.Layer):
     model = keras.Model(inputs, outputs)
 ```
 
-    Call the layer on direct input.
+    Call the layer directly on input.
     >>> layer = keras_nlp.layers.Linear(4)
     >>> layer(tf.zeros(8, 2)) == layer.b
     True
 
From 618f5771117fb14adca6f591153215243000c815 Mon Sep 17 00:00:00 2001
From: Matt Watson
Date: Wed, 11 May 2022 18:13:01 -0700
Subject: [PATCH 13/15] Address review comments

---
 API_DESIGN_GUIDE.md                           | 16 ++--
 README.md                                     | 20 ++--
 ROADMAP.md                                    | 25 ++---
 STYLE_GUIDE.md                                | 91 +++++++++++++------
 .../integration_tests/basic_usage_test.py     | 18 ++--
 5 files changed, 106 insertions(+), 64 deletions(-)

diff --git a/API_DESIGN_GUIDE.md b/API_DESIGN_GUIDE.md
index 0002b0d857..151463e4da 100644
--- a/API_DESIGN_GUIDE.md
+++ b/API_DESIGN_GUIDE.md
@@ -54,15 +54,19 @@ class RougeL(keras.metrics.Metric):
 Our layers, metrics, and tokenizers should be fast and efficient, which means
 running inside the
 [TensorFlow graph](https://www.tensorflow.org/guide/intro_to_graphs)
-whenever possible.
+whenever possible. This means you should be able to annotate a function calling
+a layer, metric, or loss with `@tf.function` without running into issues.
 
 [tf.strings](https://www.tensorflow.org/api_docs/python/tf/strings) and
 [tf.text](https://www.tensorflow.org/text/api_docs/python/text) provide a large
-surface on TensorFlow operations that manipulate strings.
-
-If an low-level (c++) operation we need is missing, we should add it in
-collaboration with core TensorFlow or TensorFlow Text. KerasNLP is a python-only
-library.
+surface of TensorFlow operations that manipulate strings. If a low-level (C++)
+operation we need is missing, we should add it in collaboration with core
+TensorFlow or TensorFlow Text. KerasNLP is a python-only library.
+
+We should also strive to keep computation XLA compilable wherever possible (e.g.
+`tf.function(jit_compile=True)`). For trainable modeling components this is
+particularly important due to the performance gains offered by XLA. For
+preprocessing and postprocessing, XLA compilation is not a requirement.
 
 ## Support tf.data for text preprocessing and augmentation
 
diff --git a/README.md b/README.md
index b89407be8b..bb693055f3 100644
--- a/README.md
+++ b/README.md
@@ -46,32 +46,32 @@ import tensorflow as tf
 from tensorflow import keras
 
 # Tokenize some inputs with a binary label.
-vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "jumped", "."]
-inputs = ["The quick brown fox jumped.", "The fox slept."]
+vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "."]
+sentences = ["The quick brown fox jumped.", "The fox slept."]
 tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(
     vocabulary=vocab,
     sequence_length=10,
 )
-X, Y = tokenizer(inputs), tf.constant([1, 0])
+x, y = tokenizer(sentences), tf.constant([1, 0])
 
 # Create a tiny transformer.
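 # (Token and position embeddings feed a single transformer encoder block,
 # which is average-pooled and passed to one sigmoid unit for the binary label.)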
inputs = keras.Input(shape=(None,), dtype="int32") -x = keras_nlp.layers.TokenAndPositionEmbedding( +outputs = keras_nlp.layers.TokenAndPositionEmbedding( vocabulary_size=len(vocab), sequence_length=10, embedding_dim=16, )(inputs) -x = keras_nlp.layers.TransformerEncoder( +outputs = keras_nlp.layers.TransformerEncoder( num_heads=4, intermediate_dim=32, -)(x) -x = keras.layers.GlobalAveragePooling1D()(x) -outputs = keras.layers.Dense(1, activation="sigmoid")(x) +)(outputs) +outputs = keras.layers.GlobalAveragePooling1D()(outputs) +outputs = keras.layers.Dense(1, activation="sigmoid")(outputs) model = keras.Model(inputs, outputs) # Run a single batch of gradient descent. -model.compile(loss="binary_crossentropy") -model.train_on_batch(X, Y) +model.compile(loss="binary_crossentropy", jit_compile=True) +model.train_on_batch(x, y) ``` For a complete model building tutorial, see our guide on diff --git a/ROADMAP.md b/ROADMAP.md index 35cc8dfdf0..d9381bbfd8 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -3,9 +3,9 @@ ## What KerasNLP is - **A high-quality library of modular building blocks.** KerasNLP components - follow an established Keras interface (e.g. keras.layers.Layer, - keras.metrics.Metric, or keras_nlp.tokenizers.Tokenizer), and make it easy to - assemble state-of-the-art NLP workflows. + follow an established Keras interface (e.g. `keras.layers.Layer`, + `keras.metrics.Metric`, or `keras_nlp.tokenizers.Tokenizer`), and make it easy + to assemble state-of-the-art NLP workflows. - **A collection of guides and examples.** This effort is split between two locations. On [keras.io](keras.io/keras_nlp), we host a collection of @@ -38,7 +38,7 @@ end-to-end workflows, but they're intended purely for demonstration and grounding purposes, they're not our main deliverable. -- **KerasNLP is not a repository of low-level string ops, like tf.text.** +- **KerasNLP is not a repository of low-level string ops, like `tf.text`.** KerasNLP is fundamentally an extension of the Keras API: it hosts Keras objects, like layers, metrics, or callbacks. Low-level C++ ops should go directly to [Tensorflow Text](https://www.tensorflow.org/text) or @@ -106,7 +106,10 @@ example demonstrating the component in an end-to-end architecture. By the end of 2022, we should have a actively growing collection of examples models, with a standardized set of training scripts, that match expected -performance as reported in publications. +performance as reported in publications. On the scalability front, we should +be running our training scripts on multi-worker GPU and TPU settings, using +[DTensor](https://www.tensorflow.org/guide/dtensor_overview) for data parallel +training. ### Tools for data preprocessing and postprocessing for end-to-end workflows @@ -155,13 +158,13 @@ Proposed components should usually either be part of a very well known architecture or contribute in some meaningful way to the usability of an end-to-end workflow. -## Pre-trained modeling workflows +## Pretrained modeling workflows -Pre-training many modern NLP models is prohibitively expensive and +Pretraining many modern NLP models is prohibitively expensive and time-consuming for an average user. A key goal with for the KerasNLP project is -to have KerasNLP components available in a pre-trained model offering of some +to have KerasNLP components available in a pretrained model offering of some form. -We are working with the rest of the Tensorflow ecosystem (e.g. TF Hub, -TF Models), to provide a coherent plan for accessing pre-trained models. 
We will -continue to share updates as they are available. +We are working with the rest of the Tensorflow ecosystem, to provide a coherent +plan for accessing pretrained models. We will continue to share updates as they +are available. diff --git a/STYLE_GUIDE.md b/STYLE_GUIDE.md index 24dcf59002..df528079e7 100644 --- a/STYLE_GUIDE.md +++ b/STYLE_GUIDE.md @@ -1,6 +1,6 @@ # Style Guide -## Use black +## Use `black` For the most part, following our code style is very simple, we just use [black](https://github.com/psf/black) to format code. See our @@ -21,16 +21,17 @@ from tensorflow import keras ``` ❌ `tf.keras.activations.X`
-✔️ `keras.activations.X` +✅ `keras.activations.X` ❌ `layers.X`
-✔️ `keras.layers.X` or `keras_nlp.layers.X` +✅ `keras.layers.X` or `keras_nlp.layers.X` ❌ `Dense(1, activation='softmax')`
-✔️ `keras.layers.Dense(1, activation='softmax')`
+✅ `keras.layers.Dense(1, activation='softmax')`
 
 For KerasNLP library code, `keras_nlp` will not be directly imported, but
-`keras` should still be as a top-level object used to access library symbols.
+`keras` should still be used as a top-level object to access library
+symbols.
 
 ## Ideal layer style
 
@@ -41,52 +42,86 @@ do the following:
 - Accept `**kwargs` in `__init__` and forward this to the super class.
 - Keep a python attribute on the layer for each `__init__` argument to the
   layer. The name and value should match the passed value.
 - Write a `get_config()` which chains to super.
-- Document the layer behavior thouroughly including call behavior though a
+- Document the layer behavior thoroughly including call behavior through a
   class level docstring. Generally methods like `build()` and `call()` should
   not have their own docstring.
+- Document the
+  [masking](https://keras.io/guides/understanding_masking_and_padding/) behavior
+  of the layer in the class level docstring as well.
 - Always include usage examples using the full symbol location in `keras_nlp`.
+- Include a reference citation if applicable.
 
````python
-class Linear(keras.layers.Layer):
-    """A simple WX + B linear layer.
+class PositionEmbedding(keras.layers.Layer):
+    """A layer which learns a position embedding for input sequences.
 
-    This layer contains two trainable parameters, a weight matrix and bias
-    vector. The layer will linearly transform input to an output of `units`
-    size.
+    This class accepts a single dense tensor as input, and will output a
+    learned position embedding of the same shape.
+
+    This class assumes that in the input tensor, the last dimension corresponds
+    to the features, and the dimension before the last corresponds to the
+    sequence.
+
+    This layer does not support masking, but can be combined with a
+    `keras.layers.Embedding` for padding mask support.
 
     Args:
-        units: The dimensionality of the output space.
+        sequence_length: The maximum length of the dynamic sequence.
 
     Examples:
 
-    Build a linear model.
+    Direct call.
+    >>> layer = keras_nlp.layers.PositionEmbedding(sequence_length=10)
+    >>> layer(tf.zeros((8, 10, 16))).shape
+    TensorShape([8, 10, 16])
+
+    Combining with a token embedding.
     ```python
-    inputs = keras.Input(shape=(2,))
-    outputs = keras_nlp.layers.Linear(4)(inputs)
-    model = keras.Model(inputs, outputs)
+    seq_length = 50
+    vocab_size = 5000
+    embed_dim = 128
+    inputs = keras.Input(shape=(seq_length,))
+    token_embeddings = keras.layers.Embedding(
+        input_dim=vocab_size, output_dim=embed_dim
+    )(inputs)
+    position_embeddings = keras_nlp.layers.PositionEmbedding(
+        sequence_length=seq_length
+    )(token_embeddings)
+    outputs = token_embeddings + position_embeddings
     ```
 
-    Call the layer directly on input.
- >>> layer = keras_nlp.layers.Linear(4) - >>> layer(tf.zeros(8, 2)) == layer.b - True + Reference: + - [Devlin et al., 2019](https://arxiv.org/abs/1810.04805) """ - def __init__(self, units=32, **kwargs): + + def __init__( + self, + sequence_length, + **kwargs, + ): super().__init__(**kwargs) - self.units = units + self.sequence_length = int(sequence_length) def build(self, input_shape): super().build(input_shape) - self.w = self.add_weight(shape=(input_shape[-1], self.units)) - self.b = self.add_weight(shape=(self.units,)) + feature_size = input_shape[-1] + self.position_embeddings = self.add_weight( + "embeddings", + shape=[self.sequence_length, feature_size], + ) def call(self, inputs): - return tf.matmul(inputs, self.w) + self.b + shape = tf.shape(inputs) + input_length = shape[-2] + position_embeddings = self.position_embeddings[:input_length, :] + return tf.broadcast_to(position_embeddings, shape) def get_config(self): config = super().get_config() - config.update({ - "units": self.units, - }) + config.update( + { + "sequence_length": self.sequence_length, + } + ) return config ```` diff --git a/keras_nlp/integration_tests/basic_usage_test.py b/keras_nlp/integration_tests/basic_usage_test.py index d3c19342f0..e9a2f85717 100644 --- a/keras_nlp/integration_tests/basic_usage_test.py +++ b/keras_nlp/integration_tests/basic_usage_test.py @@ -24,31 +24,31 @@ def test_quick_start(self): # Tokenize some inputs with a binary label. vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "."] - inputs = ["The quick brown fox jumped.", "The fox slept."] + sentences = ["The quick brown fox jumped.", "The fox slept."] tokenizer = keras_nlp.tokenizers.WordPieceTokenizer( vocabulary=vocab, sequence_length=10, ) - X, Y = tokenizer(inputs), tf.constant([1, 0]) + x, y = tokenizer(sentences), tf.constant([1, 0]) # Create a tiny transformer. inputs = keras.Input(shape=(None,), dtype="int32") - x = keras_nlp.layers.TokenAndPositionEmbedding( + outputs = keras_nlp.layers.TokenAndPositionEmbedding( vocabulary_size=len(vocab), sequence_length=10, embedding_dim=16, )(inputs) - x = keras_nlp.layers.TransformerEncoder( + outputs = keras_nlp.layers.TransformerEncoder( num_heads=4, intermediate_dim=32, - )(x) - x = keras.layers.GlobalAveragePooling1D()(x) - outputs = keras.layers.Dense(1, activation="sigmoid")(x) + )(outputs) + outputs = keras.layers.GlobalAveragePooling1D()(outputs) + outputs = keras.layers.Dense(1, activation="sigmoid")(outputs) model = keras.Model(inputs, outputs) # Run a single batch of gradient descent. - model.compile(loss="binary_crossentropy") - loss = model.train_on_batch(X, Y) + model.compile(loss="binary_crossentropy", jit_compile=True) + loss = model.train_on_batch(x, y) # Make sure we have a valid loss. self.assertGreater(loss, 0) From fb263a91b9601b9f941f3b109ace6836e8306889 Mon Sep 17 00:00:00 2001 From: Matt Watson Date: Wed, 11 May 2022 18:26:30 -0700 Subject: [PATCH 14/15] Clarify note about dtensor --- ROADMAP.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index d9381bbfd8..49ed455dae 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -106,10 +106,13 @@ example demonstrating the component in an end-to-end architecture. By the end of 2022, we should have a actively growing collection of examples models, with a standardized set of training scripts, that match expected -performance as reported in publications. 
On the scalability front, we should
-be running our training scripts on multi-worker GPU and TPU settings, using
-[DTensor](https://www.tensorflow.org/guide/dtensor_overview) for data parallel
-training.
+performance as reported in publications.
+
+On the scalability front, we should have at least one example demonstrating both
+data parallel and model parallel training in a multi-worker GPU and TPU
+setting, leveraging
+[DTensor](https://www.tensorflow.org/guide/dtensor_overview) for distributed
+support.
 
 ### Tools for data preprocessing and postprocessing for end-to-end workflows
 
From c6f00524da3efa7b9241d5ac7df665c503c468a4 Mon Sep 17 00:00:00 2001
From: Matt Watson
Date: Mon, 16 May 2022 16:15:37 -0700
Subject: [PATCH 15/15] Strip keras.io keras-nlp section links

---
 README.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/README.md b/README.md
index bb693055f3..1b2b7693a1 100644
--- a/README.md
+++ b/README.md
@@ -18,12 +18,10 @@ components are first-party Keras objects that are too specialized to be
 added to core Keras, but that receive the same level of polish as the rest of
 the Keras API.
 
-You can browse our official documentation [here](https://keras.io/keras_nlp).
 We are a new and growing project, and welcome [contributions](CONTRIBUTING.md).
 
 ## Quick Links
 
-- [Documentation and Guides](https://keras.io/keras_nlp)
 - [Contributing](CONTRIBUTING.md)
 - [Roadmap](ROADMAP.md)
 - [Style Guide](STYLE_GUIDE.md)
@@ -74,14 +72,10 @@ model.compile(loss="binary_crossentropy", jit_compile=True)
 model.train_on_batch(x, y)
 ```
 
-For a complete model building tutorial, see our guide on
-[pretraining a transformer](keras.io/guides/keras_nlp/transformer_pretraining).
-
 ## Compatibility
 
 We follow [Semantic Versioning](https://semver.org/), and plan to provide
 backwards compatibility guarantees both for code and saved models built
-with our components.
 While we continue with pre-release `0.y.z` development, we
 may break compatibility at any time and APIs should not be considered stable.
 
 ## Citing KerasNLP