Mlm mask generator docstring adding example (#916)
* Add example to MLMMaskGenerator to show best practice

* Add example to MLMMaskGenerator to show best practice: format the code

* Add example to masked language model generator: edit

* Add Bert masked language model to the example

* Add simple example: masking a batch that contains special tokens

* Format the code
abuelnasr0 authored Mar 31, 2023
1 parent 794c4b1 commit 1795d10
Showing 1 changed file with 37 additions and 11 deletions.
48 changes: 37 additions & 11 deletions keras_nlp/layers/masked_lm_mask_generator.py
@@ -28,7 +28,7 @@
class MaskedLMMaskGenerator(keras.layers.Layer):
"""Layer that applies language model masking.
-This layer is useful for preparing inputs for masked languaged modeling
+This layer is useful for preparing inputs for masked language modeling
(MaskedLM) tasks. It follows the masking strategy described in the [original BERT
paper](https://arxiv.org/abs/1810.04805). Given tokenized text,
it randomly selects certain number of tokens for masking. Then for each
@@ -81,16 +81,42 @@ class MaskedLMMaskGenerator(keras.layers.Layer):
Examples:
Basic usage.
->>> masker = keras_nlp.layers.MaskedLMMaskGenerator(
-...     vocabulary_size=10, mask_selection_rate=0.2, mask_token_id=0,
-...     mask_selection_length=5)
->>> masker(tf.constant([1, 2, 3, 4, 5]))
-Ragged Input:
->>> masker = keras_nlp.layers.MaskedLMMaskGenerator(
-...     vocabulary_size=10, mask_selection_rate=0.5, mask_token_id=0,
-...     mask_selection_length=5)
->>> masker(tf.ragged.constant([[1, 2], [1, 2, 3, 4]]))
+```python
+masker = keras_nlp.layers.MaskedLMMaskGenerator(
+    vocabulary_size=10,
+    mask_selection_rate=0.2,
+    mask_token_id=0,
+    mask_selection_length=5,
+)
+# Dense input.
+masker(tf.constant([1, 2, 3, 4, 5]))
+# Ragged input.
+masker(tf.ragged.constant([[1, 2], [1, 2, 3, 4]]))
+```
+
+Masking a batch that contains special tokens.
+```python
+pad_id, cls_id, sep_id, mask_id = 0, 1, 2, 3
+batch = tf.constant([
+    [cls_id, 4, 5, 6, sep_id, 7, 8, sep_id, pad_id, pad_id],
+    [cls_id, 4, 5, sep_id, 6, 7, 8, 9, sep_id, pad_id],
+])
+masker = keras_nlp.layers.MaskedLMMaskGenerator(
+    vocabulary_size=10,
+    mask_selection_rate=0.2,
+    mask_selection_length=5,
+    mask_token_id=mask_id,
+    unselectable_token_ids=[
+        cls_id,
+        sep_id,
+        pad_id,
+    ],
+)
+masker(batch)
+```
"""

def __init__(

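The masking strategy the new docstring references comes from the BERT paper: each selectable token is chosen with some probability, and a chosen token is replaced by the mask token 80% of the time, by a random vocabulary token 10% of the time, and left unchanged 10% of the time. A minimal plain-Python sketch of that strategy follows; this is an illustrative toy, not the keras_nlp implementation, and the function name and output-dict keys here are assumptions that merely mirror the layer's interface.

```python
import random


def mask_tokens(
    token_ids,
    mask_selection_rate,
    mask_token_id,
    vocabulary_size,
    unselectable_token_ids=(),
    seed=None,
):
    """Toy BERT-style masking sketch (NOT the keras_nlp implementation).

    Each selectable token is chosen with probability
    `mask_selection_rate`. A chosen token is replaced by the mask token
    80% of the time, by a random token 10% of the time, and left
    unchanged 10% of the time.
    """
    rng = random.Random(seed)
    output = list(token_ids)
    mask_positions, mask_ids = [], []
    for i, token in enumerate(token_ids):
        if token in unselectable_token_ids:
            continue  # Never mask special tokens like [CLS]/[SEP]/[PAD].
        if rng.random() >= mask_selection_rate:
            continue  # Token not selected for masking.
        mask_positions.append(i)
        mask_ids.append(token)  # The original token becomes the label.
        roll = rng.random()
        if roll < 0.8:
            output[i] = mask_token_id  # 80%: replace with mask token.
        elif roll < 0.9:
            output[i] = rng.randrange(vocabulary_size)  # 10%: random token.
        # Otherwise (10%): keep the original token unchanged.
    return {
        "token_ids": output,
        "mask_positions": mask_positions,
        "mask_ids": mask_ids,
    }
```

With `mask_selection_rate=1.0`, every selectable token is chosen, so `mask_positions` covers all positions except those holding unselectable token ids, which is a handy way to see why the docstring's second example passes `cls_id`, `sep_id`, and `pad_id` as `unselectable_token_ids`.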