Greedy text generation util #154
Conversation
Thanks! Left some initial comments, mainly on the design front.
keras_nlp/utils/text_generation.py
Outdated
        next_token, end_token_received, end_token_id
    )
    # Append the next token to current sequence.
    input_ids = tf.concat([input_ids, next_token[:, tf.newaxis]], axis=-1)
How about testing with XLA on GPU at least, if you cannot test on TPU? :)
That's a good point.
- I am not sure how to add GitHub GPU/TPU tests. It also reminds me that we might want to test distributed training (not for this utility, but for other modules) in the future. We will check more on it.
- IIUC, XLA on GPU should be turned on manually? It requires @tf.function(jit_compile=True). For this specific utility, we probably don't need XLA testing?
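For reference, a minimal sketch of what opting in to XLA looks like (the next_token_fn below is a hypothetical stand-in, not part of this PR):

import tensorflow as tf

# Hypothetical next-token function; XLA on GPU is opt-in via jit_compile=True.
@tf.function(jit_compile=True)
def next_token_fn(prompt):
    logits = tf.one_hot(prompt[:, -1], depth=100)  # dummy "model"
    return tf.argmax(logits, axis=-1)

print(next_token_fn(tf.constant([[1, 2, 3], [4, 5, 6]])))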
I am not sure if you realize that the tf.concat will yield a dynamic shape, or whether you know the XLA requirements.
Interesting - so tf.concat is something that cannot work on TPU, or on GPU/CPU under XLA?
Our current plan is not to wrap this utility with tf.function(), but users can choose to wrap next_token_fn with tf.function(), as next_token_fn takes most of the computation time.
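Roughly, the split being described looks like this (a sketch only; names are placeholders and the real utility also handles end tokens, etc.):

import tensorflow as tf

# Users can opt in to compiling the per-step model call.
@tf.function(input_signature=[tf.TensorSpec([None, None], tf.int32)])
def next_token_fn(prompt):
    # Placeholder "model": always predict token id 7.
    return tf.fill([tf.shape(prompt)[0]], tf.constant(7, tf.int32))

def greedy_decode(prompt, max_length):
    # The decoding loop itself stays a plain Python while loop; tf.concat
    # grows the sequence, so its shape is dynamic across iterations.
    while prompt.shape[1] < max_length:
        next_token = next_token_fn(prompt)
        prompt = tf.concat([prompt, next_token[:, tf.newaxis]], axis=-1)
    return prompt

print(greedy_decode(tf.constant([[1, 2]], dtype=tf.int32), max_length=5))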
Won't the performance be bad, or do you only target this as a demo util function? Is that the reason you use a Python while loop rather than tf.while_loop for the decoding loop?
Got it, yes, this util is mainly useful for a demo such as a colab guide, so performance is not the focus now.
I am actually curious, how does model garden handle the token concatenation? Are you using a fixed-size tensor and changing the value at each iteration? I am not sure how much performance difference it would make, since the bottleneck was mostly the model call when I benchmarked on colab.
We need to allocate a buffer of the max decode sequence length and use in-place updates.
The padded_decode path is dedicated to XLA: https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/beam_search.py#L109
BTW, I have read the fairseq code before, and I believe they use something similar for GPU performance optimization as well.
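For illustration only, a simplified version of that buffer-based approach might look like the following (this is not the model garden implementation, just a sketch of the idea):

import tensorflow as tf

def padded_greedy_decode(prompt, next_token_fn, max_length):
    # Pre-allocate a [batch, max_length] buffer so every shape stays static,
    # then overwrite one column per step instead of concatenating.
    batch_size, prompt_length = prompt.shape
    outputs = tf.concat(
        [prompt, tf.zeros([batch_size, max_length - prompt_length], prompt.dtype)],
        axis=-1,
    )
    for index in range(prompt_length, max_length):
        next_token = next_token_fn(outputs[:, :index])
        # In-place style update of column `index` via a scatter on the transpose.
        outputs = tf.transpose(
            tf.tensor_scatter_nd_update(
                tf.transpose(outputs), [[index]], next_token[tf.newaxis, :]
            )
        )
    return outputs

# Dummy next-token function that always predicts token id 9.
dummy_fn = lambda seq: tf.fill([tf.shape(seq)[0]], tf.constant(9, tf.int32))
print(padded_greedy_decode(tf.constant([[1, 2]], tf.int32), dummy_fn, max_length=5))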
Thanks Hongkun! I will open an issue to track the refactoring.
Yeah, this isn't just not XLA-compilable; it is not tf.function-compilable at all right now.
We definitely should look at making it so, both for usability (using this inside a Keras model) and for performance in any sort of bulk-inference job.
For now, this has been more about getting the API signature how we want it. I wouldn't say this should always be only a demo util function; that's just where we are at today.
from keras_nlp.utils.text_generation import generate_text_greedy


class TextGenerationTest(tf.test.TestCase):
interesting, do you need tf.test.main()?
Seems not - our tests are based on pytest, which automatically collects the test cases.
Left some more comments. But the big one: I couldn't figure out a way to actually do text generation with a plain-text seed and our tokenizers. Take a look:
I think if we want to support batches of input, we need to support batches of raggeds; otherwise I'm at a loss as to how this could be used for batched text generation where the seed input is plain text.
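To make that concrete, a batch of plain-text seeds of different lengths naturally tokenizes into a ragged batch (whitespace splitting and hashing below are only stand-ins for a real tokenizer):

import tensorflow as tf

seeds = tf.constant(["the quick brown fox", "hello"])
tokens = tf.strings.split(seeds)  # RaggedTensor: rows have different lengths.
prompt = tf.ragged.map_flat_values(
    tf.strings.to_hash_bucket_fast, tokens, num_buckets=1000
)
print(prompt)               # ragged batch of token ids, row lengths 4 and 1
print(prompt.to_tensor(0))  # dense, padded view, if padding is acceptable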
keras_nlp/utils/text_generation.py
Outdated
    ```

    """
    if 0 in input_ids.shape:
Looked at this more, and I really think we should not keep this check. It seems fully possible to support an empty input tensor input_ids=[] or a batch shape with input_ids=tf.zeros([bs, 0]).
The former would be really useful in guides when doing something really simple. There will not always be start tokens as a convention; see the main TF text generation guide as an example: https://www.tensorflow.org/text/tutorials/text_generation.
We should also call tf.convert_to_tensor on non-tensor input, so we can do things like input_ids=[] or input_ids=[start_id].
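A minimal sketch of that input handling, assuming the argument name from this discussion rather than the final API:

import tensorflow as tf

def normalize_prompt(input_ids):
    # Accept plain Python lists such as [] or [start_id] as well as tensors.
    input_ids = tf.convert_to_tensor(input_ids, dtype=tf.int32)
    # An empty prompt is allowed: generation simply starts from nothing, as in
    # the TF text generation tutorial, which has no start-token convention.
    return input_ids

print(normalize_prompt([]))         # shape (0,)
print(normalize_prompt([1, 2, 3]))  # shape (3,)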
One thing I am confused about: if there is no prompt at all, how shall we generate the next token? Maybe add an extra argument start_token?
keras_nlp/utils/text_generation.py
Outdated
Args:
    token_probability_fn: a callable, which takes in input_sequence
        and output the probability distribution of the next token.
    input_ids: a list, the initial tokens to append generated tokens.
prompt?
sg!
        sequence. If None, every sequence is generated up to `max_length`.

Returns:
    A 1D int Tensor, or 2D int RaggedTensor representing the generated
Most likely it should operate on a single sequence, never a batch. The user could map it to a batch.
Processing at the single-sequence level would make the code simpler, but execution would slow down: there are lots of model calls inside this utility, so without parallelism it could take much longer.
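For comparison, mapping a single-sequence routine over a batch would look roughly like this; each row then drives its own chain of model calls, which is the serialization concern above (generate_one is a hypothetical single-sequence decoder):

import tensorflow as tf

def generate_one(prompt):
    # Hypothetical single-sequence decoder: append two fixed tokens.
    return tf.concat([prompt, tf.constant([7, 8], prompt.dtype)], axis=0)

batch = tf.constant([[1, 2, 3], [4, 5, 6]])
# tf.map_fn runs generate_one row by row; with a real model this means one
# forward pass per sequence per step instead of a single batched pass.
outputs = tf.map_fn(generate_one, batch, fn_output_signature=tf.int32)
print(outputs)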
keras_nlp/utils/text_generation.py
Outdated
    return filtered_next_token, end_token_received


def generate_text_greedy(
This operates in integer token space, not text space. It should not be named "generate text" (also "generate text greedy" is not a sentence, unlike "greedy text generation" or "generate text greedily").
Maybe this should just be called greedy_search (vs beam_search).
sg!
        and output the probability distribution of the next token.
    input_ids: a list, the initial tokens to append generated tokens.
    max_length: int. The max length of generated text.
    end_token_id: int, defaults to None. The token marking the end of the
end_token?
We haven't really formalized this (I should add it to our API design guide). But I think if this operates strictly on ints, end_token_id is the correct name, so people don't pass "<eos>" by mistake.
If we switched this to strings, end_token would be the correct name. But it sounds like we will keep this in int space for now, and just rename to beam_search and greedy_search, which I think is a good call.
Colab example for text generation: https://colab.research.google.com/gist/chenmoneygithub/002633be87e440248870b43089f47530/kerasnlp-text-generation-model-util.ipynb (dataset: mini Shakespeare)
Colab example for machine translation (dataset: http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip). The accuracy is bad, but the main goal of this colab is to show how to use the text generation util.
Thanks @chenmoneygithub!
I'll play around with compiling this into functions and models. Curious where we currently stand there.
Left a few more comments. This is looking closer. We can simplify the end-of-sequence token logic, and we need to do something user-friendly in the compiled-function case.
keras_nlp/utils/text_generation.py
Outdated
        prompt = tf.concat([prompt, next_token[:, tf.newaxis]], axis=-1)
        return get_subsequent_tokens(prompt, end_token_id_received)

    generated_sequence = get_subsequent_tokens(prompt, end_token_id_received)
I think the end token logic could be much more readable (and probably more efficient) as a post-process. tf.sequence_mask could help. Something like:
if end_token_id is not None:
    # Find index of first end_token_id.
    end_indices = tf.math.argmax(outputs == end_token_id, -1)
    # Use max_length if none found.
    end_indices = tf.where(end_indices == 0, max_length, end_indices)
    # Build a mask including end_token and replace overflow with pad_token_id.
    valid_indices = tf.sequence_mask(end_indices + 1, maxlen=max_length)
    outputs = tf.where(valid_indices, outputs, pad_token_id)
good point! done
Thanks! Just minor edits now.
keras_nlp/utils/text_generation.py
Outdated
prompt = tf.random.uniform(shape=[5, 5], maxval=VOCAB_SIZE, dtype=tf.int64)

# Print the generated sequence (token ids).
keras_nlp.greedy_search(
keras_nlp.utils.greedy_search right?
good catch!
Can prompt be a prefix list of tokens? Context: prefix-LM.
Approving - last few comments.
    end_token_id=2,
    pad_token_id=0,
)
self.assertAllEqual(outputs[0, 2:], tf.repeat(3, max_length - 2))
Just test the whole output here; that will be much more readable.
sg!
#108