Random Deletion Layer - Data Augmentation #152
Comments
Thanks, this looks good to me! A few things we should consider...
|
@mattdangerw Thanks for the review! For the first point, yeah, we could do that, but I feel it might be better to have a separate layer that focuses on character-level deletions. We could also have it here as a parameter if that matches the design better. For the second point, yup, I was thinking the same; I'll add it as an option. Just to confirm: this should work similarly to how tokenizers work, right? The input could be anything from a scalar to a batch of tensors, right? |
I think so, yeah: the input could be a scalar dense tensor or a batched dense tensor. We could also consider supporting pre-split ragged inputs (in WordPiece we do this, for example), in case even a configurable split regex is not enough for your splitting needs. Probably not something for a first version though. |
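To make the input contract discussed above concrete, here is a minimal plain-Python sketch of the intended behavior: one function that accepts either a single string or a batch of strings and splits on a configurable regex. It is only a reference for the semantics, not the Keras layer itself; the names `random_deletion`, `rate`, `split_pattern`, and `seed` are hypothetical and do not come from this thread.

```python
import random
import re


def random_deletion(inputs, rate=0.25, split_pattern=r"\s+", seed=None):
    """Drop each word independently with probability `rate`.

    Hypothetical reference helper: `inputs` may be a single string
    (the "scalar" case) or a list of strings (the "batched" case).
    """
    rng = random.Random(seed)

    def delete_words(text):
        # Split on the configurable regex and discard empty pieces.
        words = [w for w in re.split(split_pattern, text) if w]
        # Keep a word when its random draw is at or above the deletion rate.
        return " ".join(w for w in words if rng.random() >= rate)

    if isinstance(inputs, str):
        return delete_words(inputs)
    return [delete_words(text) for text in inputs]
```

With `rate=0.0` every word is kept and with `rate=1.0` every word is dropped, which gives two easy sanity checks for any real implementation.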
Thanks, I'll keep this at the back of my mind and maybe take this up as a next step once an initial layer is ready! |
Hey @mattdangerw |
@aflah02 yeah, definitely we need to support the batched 2D case at a minimum. Potentially this could be done with RaggedTensor and no map function? Something like...
import tensorflow as tf
inputs = tf.constant(["this is a test", "this is another test"])
ragged_words = tf.strings.split(inputs)
mask = tf.random.uniform(ragged_words.flat_values.shape) > 0.25
mask = ragged_words.with_flat_values(mask)
deleted = tf.ragged.boolean_mask(ragged_words, mask)
deleted = tf.strings.reduce_join(deleted, axis=-1, separator=" ")
Would that work? |
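The ragged-tensor snippet above can be wrapped into a reusable function roughly as follows. This is a sketch under stated assumptions, not the layer that was eventually merged: the name `random_word_deletion` and the `rate` and `seed` parameters are hypothetical, and the input is assumed to be a rank-1 (batched) string tensor. The one deliberate change is using `tf.shape` instead of `.shape`, so that the mask can still be built when the static shape is unknown inside a `tf.function`.

```python
import tensorflow as tf


def random_word_deletion(inputs, rate=0.25, seed=None):
    """Drop each whitespace-separated word independently with probability `rate`.

    Hypothetical helper; assumes `inputs` is a rank-1 string tensor.
    """
    # Splitting a rank-1 string tensor yields a RaggedTensor of words.
    ragged_words = tf.strings.split(inputs)
    flat_words = ragged_words.flat_values
    # Use the dynamic shape so this also works when traced in a tf.function.
    keep = tf.random.uniform(tf.shape(flat_words), seed=seed) >= rate
    # Give the flat boolean mask the same row structure as the words.
    keep = ragged_words.with_flat_values(keep)
    kept_words = tf.ragged.boolean_mask(ragged_words, keep)
    # Join the surviving words of each row back into a single string.
    return tf.strings.reduce_join(kept_words, axis=-1, separator=" ")
```

Because `tf.random.uniform` samples from `[0, 1)`, `rate=0.0` keeps every word and `rate=1.0` removes every word, leaving an empty string per row.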
@mattdangerw |
@aflah02 that makes sense to me. In that case, let's call this |
@aflah02 here's a traceable set of ragged tensor ops based on the TF Text |
@mattdangerw Thanks a ton! This looks great and will help me a lot. I'll make sure to go through these ragged tensor ops for future use as well!! |
I've created this issue specifically to discuss the Random Deletion Layer while we figure out how to incorporate WordNet for Synonym Replacement.
I've adapted the design mentioned by @mattdangerw here for the same.