Implement TopP, TopK and Beam samplers #652
Conversation
Thanks @chenmoneygithub, looking great! The biggest thing is unifying our docstring style.
keras_nlp/samplers/beam_sampler.py (outdated):

    Examples:
    ```python
    BATCH_SIZE = 8
@chenmoneygithub I really don't think these arg blocks match the rest of the library's style. I don't think there's a perfect answer but I'd prefer it not to be obvious who wrote what 🥗
It's a totally valid opinion @chenmoneygithub, but it's also important for the code to have a unified style rather than each contributor producing different-looking code. If there's an example that's unclear without named params I'm open to trying something different, but otherwise I'm hoping we can compromise!
Taking steps towards a more unified style (and then reflecting that in our style guide) sgtm. What are the main places this differs, besides the constants at the top?
Basically, what I asked for in #658 reflects the current thinking. A bit less script-like and a little more "drop this line in colab and see what we're talking about".
I could see us replacing the "model" with something like `tf.random.uniform(shape, minval=-1, maxval=1)`. It is kind of weird to me that we show a whole model that is trainable but randomly initialized (so results will be random anyway), and not even sequence-aware, so it would never really perform even if you trained it. For a new user this seems a bit of a red herring.
Would be more concise to do something like:
```python
def token_probability_fn(inputs, mask):
    return tf.random.uniform(...)  # Replace with a real model!
```
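For a fully runnable variant of that suggestion, something like the sketch below could work (`VOCAB_SIZE` and the output shape here are assumptions for illustration, not the library's actual contract):
```python
import tensorflow as tf

VOCAB_SIZE = 10  # assumed vocabulary size, for illustration only

def token_probability_fn(inputs, mask):
    # Random values stand in for a real model's next-token probabilities.
    batch_size = tf.shape(inputs)[0]
    return tf.random.uniform([batch_size, VOCAB_SIZE], minval=0, maxval=1)
```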
I would like to keep the model part so that the example is closer to real use cases.
I will unify the docstrings and move those hyperparameters inline.
To me the model falls into an "uncanny valley" of code examples. It's not something that will actually work, yet it's also not clearly random dummy data. As a newbie I worry I would not understand, first, that results will be random, and second, that this is a "bad model" for the task.
Fine to merge as is, but I hope we can play around with some improvements down the road.
    ):
        if run_eagerly and jit_compile:
Do we need two flags or just one then? What happens if they are both False? And what is a "non-XLA" graph? I'm confused!
This is following the style of `model.compile()`: code link.
A non-XLA graph is quite common: anything annotated with `tf.function` without `jit_compile=True` is a non-XLA graph.
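As a rough illustration of the three execution modes being discussed (a sketch with a made-up `dense_step` function, not the sampler's actual dispatch logic):
```python
import tensorflow as tf

def dense_step(x):
    # Stand-in computation; any tensor-in, tensor-out function works here.
    return tf.nn.relu(tf.matmul(x, tf.ones((4, 4))))

eager_fn = dense_step                               # run_eagerly=True: plain Python
graph_fn = tf.function(dense_step)                  # non-XLA graph: traced, not compiled
xla_fn = tf.function(dense_step, jit_compile=True)  # XLA-compiled graph

x = tf.random.uniform((2, 4))
print(eager_fn(x).shape, graph_fn(x).shape, xla_fn(x).shape)
```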
Yes, agree, was just trying to understand the difference
Thanks Chen! Left some comments and a few questions.
keras_nlp/samplers/beam_sampler.py (outdated):

        beams, max_indexes[:, tf.newaxis], axis=1, batch_dims=1
    )

    prompt = tf.squeeze(max_beams, axis=1)
`return` immediately here instead of assigning to `prompt` first.
keras_nlp/samplers/top_p_sampler.py (outdated):

    )
    if from_logits:
        pred = keras.activations.softmax(pred, axis=-1)
    # Sort preds in descending order.
The main change I could see us making, if we want to, is splitting the overload points into `sample` and `sample_step` (I think there was some discussion of this on the other PR?).
Overload `sample` if you want to control the whole process (e.g. beam search). Overload `sample_step` if you want to control simply going from one probability distribution to one sample. E.g. the body of this class could look like:
```python
def sample_step(self, preds):
    sorted_preds, sorted_indices = tf.math.top_k(
        preds, k=tf.shape(preds)[1], sorted=True
    )
    # Calculate the cumulative probability distribution.
    cumulative_probs = tf.math.cumsum(sorted_preds, axis=-1)
    # Create a mask for the tokens to keep.
    keep_mask = cumulative_probs <= self.p
    # Shift to include the last token that exceeds p.
    shifted_keep_mask = tf.concat(
        [tf.ones_like(keep_mask[:, :1]), keep_mask[:, :-1]], axis=-1
    )
    # Zero out tokens outside the keep mask and sample from the
    # filtered distribution.
    probs = tf.where(
        shifted_keep_mask,
        sorted_preds,
        tf.zeros(tf.shape(preds), dtype=sorted_preds.dtype),
    )
    sorted_next_token = tf.random.categorical(
        tf.math.log(probs), 1, seed=self.seed
    )
    return tf.gather_nd(
        sorted_indices, sorted_next_token, batch_dims=1
    )
```
This might improve the readability of our simple samplers, while still keeping full extensibility.
As long as we can make `BeamSampler` an outlier, I am down for this refactoring!
Actually, we can move one step further and only leave `get_next_token` open, which takes in a probability distribution and returns the next token; the rest of the updating logic can be shared across those samplers.
Reflected this change in the PR.
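As an illustration of what that `get_next_token` overload point might look like, here is a hedged sketch using top-k sampling (the class body and attribute names are assumptions, not the PR's actual code):
```python
import tensorflow as tf

class TopKSampler:
    """Sketch of a sampler that only overrides `get_next_token`."""

    def __init__(self, k=5, seed=None):
        self.k = k
        self.seed = seed

    def get_next_token(self, next_token_probs):
        # Keep only the k most probable tokens.
        top_k_probs, top_k_indices = tf.math.top_k(
            next_token_probs, k=self.k, sorted=False
        )
        # Sample from the renormalized top-k distribution.
        sampled = tf.random.categorical(
            tf.math.log(top_k_probs), 1, seed=self.seed
        )
        # Map the sampled positions back to vocabulary token ids.
        return tf.gather_nd(top_k_indices, sampled, batch_dims=1)
```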
Yes, this is what I was hoping for! 🚀
Force-pushed from ce07a4d to 64ff158.
Just some nits. Looking good!
    Examples:
    ```python
    VOCAB_SIZE = 10
Let's inline this arg as well. Or am I missing something?
This one is used twice (embedding and dense), and the arg name does not suggest "vocab_size", so I am keeping this one for clarity.
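For context, the two uses being referred to presumably look something like this (a sketch; the actual layer configuration in the example may differ):
```python
from tensorflow import keras

VOCAB_SIZE = 10
model = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, 16),  # first use: input vocabulary
    keras.layers.Dense(VOCAB_SIZE),          # second use: output logits
])
```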
    @@ -252,20 +247,103 @@ def __call__(

            return tf.squeeze(prompt, axis=0) if input_is_1d else prompt

        @format_docstring(sample_args=sample_args_docstring)
        def get_next_token(self, next_token_probs):
Do you think sliding window can fit into this paradigm?
yea, it should work
keras_nlp/samplers/greedy_sampler.py (outdated):

    sampler = keras_nlp.samplers.GreedySampler()
    # Print the generated sequence (token ids).
    print(sampler(prompt, token_probability_fn, 10))
For clarity you could use named args:
```python
print(sampler(prompt, token_probability_fn, max_length=10))
```
This matches, or perhaps improves on, the readability of using globals.
good call, done
Force-pushed from 64ff158 to 950ad43.
Left a few more comments re: `get_config`. +Approval, as I am fine merging after changes to get something landed. But I hope we will stay open to changes here as we dig more into the generative case!
Wait for @fchollet, he wants to take a look and can make the final call on class names.
Short class names seem to fit in the Keras ecosystem:
```python
sampler=keras_nlp.samplers.Beam(num_beams=3),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=keras.metrics.SparseCategoricalAccuracy(),
kernel_initializer=initializers.RandomNormal(stddev=0.01),
activation=activations.relu,
```
Let's merge this one to unblock the TFLite sprint; I will sync with Francois offline regarding the string identifier.
    def get_next_token(self, next_token_probs):
        # Beam search overrides the whole `sample` method.
        pass
This should raise an error, I suppose?
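One way to address that (a sketch, not necessarily the PR's final code):
```python
def get_next_token(self, next_token_probs):
    # Beam search overrides the whole `sample` method, so this hook
    # should never be reached.
    raise NotImplementedError(
        "`BeamSampler` overrides `sample()` directly and does not use "
        "`get_next_token()`."
    )
```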
A few things covered:
- Added a `run_eagerly` option to our sampler to align with `model.compile()`.

One special thing to note: in the `batch_size=1` case, after the first iteration the shape of `beam_probs` changes from `[1, None]` to `[None, None]`, so we add `shape_invariants` specifically for the beam sampler.
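As a minimal illustration of the shape issue (a sketch under assumed shapes and names, not the PR's actual loop):
```python
import tensorflow as tf

num_beams = 3
beam_probs = tf.zeros([1, num_beams])  # batch_size=1 on the first iteration

def cond(i, beam_probs):
    return i < 5

def body(i, beam_probs):
    # After the first step the leading dimension grows, so the loop
    # variable's static shape cannot stay pinned to [1, None].
    return i + 1, tf.random.uniform([num_beams, num_beams])

i, beam_probs = tf.while_loop(
    cond,
    body,
    [0, beam_probs],
    # Relax the invariant so both dimensions may vary across iterations.
    shape_invariants=[tf.TensorShape([]), tf.TensorShape([None, None])],
)
```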