Consistent preprocessing output on all backends #1777

Merged: mattdangerw merged 3 commits into keras-team:master from mattdangerw:consistent-preprocessing-outputs on Aug 19, 2024
Conversation
mattdangerw force-pushed the consistent-preprocessing-outputs branch 7 times, most recently from 2008b84 to 5de16cd on August 16, 2024 at 03:04
mattdangerw changed the title from "[DRAFT] Consistent preprocessing output on all backends" to "Consistent preprocessing output on all backends" on Aug 16, 2024
Old behavior:
- On the TF backend, raggeds and strings are returned as tf tensors.
- On the JAX/Torch backends, raggeds and strings are returned as lists.
- Preprocessing functions outside of `call`, like `tokenize()`, `detokenize()`, and `generate_preprocess()`, always return tf tensors on all backends.

This made it hard to write backend-agnostic code. TF shows up in random places, and if you are flipping from tf -> jax or vice versa you have to switch between handling tensors and lists.

New behavior:
- On all backends, for all functions, raggeds and strings are returned as lists.
- Inside a `tf.data` call or a tf-compiled function, preprocessing layers always output tf tensors.

This requires a little complexity to avoid over-converting back and forth from tf -> python in nested calls, but thankfully we can hide most of that complexity in a decorator.
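The decorator idea described above can be sketched in plain Python. This is an illustrative stand-in, not the actual keras-nlp implementation: `IN_TF_GRAPH`, `to_python_list`, and `preprocessing_function` are hypothetical names, and the graph check would really be something like TensorFlow's tracing detection rather than a module flag.

```python
# Sketch of the decorator pattern the PR describes: outside a tf.data
# pipeline or compiled function, ragged/string outputs become plain
# Python lists; inside a traced context they are passed through as-is.
# All names here are illustrative, not the real keras-nlp API.

import functools

IN_TF_GRAPH = False  # stand-in for a real "are we tracing?" check


def to_python_list(x):
    """Recursively convert nested sequences into plain Python lists."""
    if isinstance(x, (list, tuple)):
        return [to_python_list(v) for v in x]
    return x


def preprocessing_function(fn):
    """Wrap a preprocessing method so its eager outputs are lists."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        outputs = fn(*args, **kwargs)
        if IN_TF_GRAPH:
            # Inside tf.data / compiled code, keep tf tensors and skip
            # the tf -> python round trip for nested calls.
            return outputs
        return to_python_list(outputs)

    return wrapper


@preprocessing_function
def tokenize(batch):
    # Pretend this produced a ragged tf result; here it is a nested
    # sequence standing in for one.
    return [("hello", "world"), ("foo",)]


print(tokenize(None))  # [['hello', 'world'], ['foo']]
```

Centralizing the conversion in one wrapper is what keeps the per-layer code backend-agnostic: each `tokenize()`-style method only has to produce its output once, and the decorator decides the output container.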
mattdangerw force-pushed the consistent-preprocessing-outputs branch from 5de16cd to 5fdaa4a on August 16, 2024 at 03:13
mattdangerw force-pushed the consistent-preprocessing-outputs branch from c582212 to 331f6a1 on August 16, 2024 at 20:56
SamanehSaadat approved these changes on Aug 16, 2024
LGTM! Thanks, Matt! Just left a couple of nit comments!
pkgoogle pushed a commit to pkgoogle/keras-hub that referenced this pull request on Aug 22, 2024:
* Consistent preprocessing output on all backends (the commit message repeats the PR description above)
* Rename preprocessing_function -> tf_preprocessing_function
* Address comments