Standalone functions for generate pre/post processing for GPT-2 #998
Conversation
/gcbrun
Force-pushed from 786ad50 to 86fc49d
/gcbrun
Force-pushed from 86fc49d to ef2464a
/gcbrun
Thanks for the PR! Overall looks good! Took a pass over the code, and I have a few high-level comments:
Generally I think we need to make it easy for developers/users to understand our breakdown, and why we expose 3 public methods on the preprocessor.
I can work on the documentation. In my mind this is an advanced workflow. We do not advertise this at all in the first go around.
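For reference, a minimal sketch of the three-method breakdown under discussion (an editorial illustration, not code from this PR; the preset name and exact call signatures are assumptions):

```python
import keras_nlp

# Illustrative only: preset name and exact signatures are assumptions.
preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset("gpt2_base_en")

# 1. __call__: training-style preprocessing, returning (x, y, sample_weight),
#    where the labels y are the input token ids shifted by one position.
x, y, sample_weight = preprocessor(["a quick brown fox"])

# 2. generate_preprocess(): strings -> dense {"token_ids", "padding_mask"} tensors
#    ready to feed the compiled generate function.
features = preprocessor.generate_preprocess(["a quick brown fox"])

# 3. generate_postprocess(): {"token_ids", "padding_mask"} -> strings, masking out
#    padding and end tokens before detokenization.
outputs = preprocessor.generate_postprocess(features)
```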
You can only do this if you pass the input padding mask to the output postprocess function, right? Otherwise you lose track of which end tokens came from the original sequence. Let's think in terms of the "export flow". We cannot export

```python
def export_fn(x):
    x = gpt2_preprocessor.generate_preprocess(x)
    x = gpt2_lm.make_generate_function()(x)
    return gpt2_preprocessor.generate_postprocess(x)
```

With the change you are proposing, it needs to look like this...

```python
def export_fn(x):
    preprocessed = gpt2_preprocessor.generate_preprocess(x)
    input_padding_mask = preprocessed["padding_mask"]
    output_token_ids = gpt2_lm.make_generate_function()(preprocessed)
    return gpt2_preprocessor.generate_postprocess(
        output_token_ids,
        padding_mask=input_padding_mask,
    )
```

The state gets a little more complex, right? Postprocessing depends on both the partial output of preprocessing and the output of the generation function.

I also think there is a UX argument for returning a padding mask in the "no preprocess" workflow. When you do, you pass in two tensors in a dict with shape `(batch_size, sequence_length)`.
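To make the UX argument above concrete, here is a hedged sketch of the "no preprocess" workflow, assuming the dict format shown in the PR description and that the preprocessor has been detached from the model:

```python
# Sketch only: assumes generate() accepts dense inputs once preprocessing is detached.
gpt2_lm.preprocessor = None
outputs = gpt2_lm.generate(
    {
        "token_ids": [[1, 2, 3, 4, 0, 0, 0, 0]],
        "padding_mask": [[1, 1, 1, 1, 0, 0, 0, 0]],
    }
)
# With the symmetric design argued for here, outputs would have the same structure:
# {"token_ids": [[1, 2, 3, 4, 5, 6, 7, 8]], "padding_mask": [[1, 1, 1, 1, 1, 1, 1, 1]]}
```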
New abstractions look very clean, thank you!
Some of the submethods could benefit from more documentation or (ideally) be more self-documenting, so that users can understand what these abstractions do and when they would use them.
`keras_nlp.models.GPT2CausalLM`. By default, it will take in batches of
strings, and return outputs in a `(x, y, sample_weight)` format, where the
`y` label is the next token id in the `x` sequence. For use with generation,
pass `return_labels=False`, in which case the output will simply be the
encoded string features. The layer also exposes two methods,
`generate_preprocess()` and `generate_postprocess()`.
Should we tell users when/why to use them?
Added a bit more color here.
Generally LGTM! Per our offline discussion, some more comments on `normalize_input` and so on could help users understand.
Thanks for the review! Added some more docs.
Force-pushed from ef2464a to 4c0690c
/gcbrun
Force-pushed from 4c0690c to 418d1db
/gcbrun
Very impressive work @mattdangerw!
/gcbrun
This decomposes generate in the way we discussed last week, with the goal of leaving the top-level functionality untouched, but allowing a more granular way to access the preprocessing, postprocessing, and inner dense generation function. Colab [HERE](https://colab.research.google.com/gist/mattdangerw/bb1ef01c1b67255def4a6ad9429de2df/split-preprocessing-demo.ipynb)

Other than moving things around in the refactor, there is one major change we need to make here, which is that the inner, compiled generate function must also return a padding mask of the token ids that were updated. Without this padding mask, the postprocessor would not know where to truncate output before detokenization.

To accommodate this I made the `generate_function` inputs and outputs a dict with keys "token_ids" and "padding_mask". I actually find this fairly intuitive; with this change, `generate_function` has the same inputs and outputs as directly calling the model!

```python
generate_function = causal_lm.make_generate_function()

generate_function({
    "token_ids": [[1, 2, 3, 4, 0, 0, 0, 0]],
    "padding_mask": [[1, 1, 1, 1, 0, 0, 0, 0]],
})
>>> {
    "token_ids": [[1, 2, 3, 4, 5, 6, 7, 8]],
    "padding_mask": [[1, 1, 1, 1, 1, 1, 1, 1]],
}

generate_function({
    "token_ids": [[1, 2, 3, 4, 0, 0, 0, 0]],
    "padding_mask": [[1, 1, 1, 1, 0, 0, 0, 0]],
}, end_token_id=6)
>>> {
    "token_ids": [[1, 2, 3, 4, 5, 6, 0, 0]],
    "padding_mask": [[1, 1, 1, 1, 1, 1, 0, 0]],
}
```
Force-pushed from d6e5d97 to 79d5bab
/gcbrun
/gcbrun
* Standalone functions for generate pre/post processing
* More docstring updates
* Fix merge conflict
Hi @mattdangerw, the TF Lite conversion issue I ran into in #1090 seems related to this change. Do you think they are related? If so, do you have any suggestions on what tensor spec to use when converting it to TFLite?
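One possible direction for the TF Lite question (a hedged sketch, not an answer from this thread): wrap the compiled generate function in a `tf.function` whose input signature mirrors the `token_ids`/`padding_mask` dict, then convert the concrete function. The preset name, sequence length, and dtypes below are assumptions.

```python
import tensorflow as tf
import keras_nlp

# Assumptions: preset name, fixed sequence length of 128, int32 dtypes.
gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en", preprocessor=None)
generate_function = gpt2_lm.make_generate_function()

@tf.function(
    input_signature=[
        {
            "token_ids": tf.TensorSpec(shape=[1, 128], dtype=tf.int32),
            "padding_mask": tf.TensorSpec(shape=[1, 128], dtype=tf.int32),
        }
    ]
)
def serving_fn(inputs):
    # The generate function takes and returns the same dict structure.
    return generate_function(inputs)

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [serving_fn.get_concrete_function()], gpt2_lm
)
tflite_model = converter.convert()
```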