-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an add_prefix_space Arg in BytePairTokenizer #715
Conversation
@@ -87,7 +87,7 @@ def remove_strings_from_inputs(tensor, string_to_remove): | |||
return result | |||
|
|||
|
|||
def split_strings_for_bpe(inputs): | |||
def split_strings_for_bpe(inputs, add_prefix_space): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work 🤔? We are passing this argument in this function, but not actually doing anything with it inside the function.
I was thinking of doing something along these lines in the tokenize()
function:
def tokenize(self, inputs):
if not isinstance(inputs, (tf.Tensor, tf.RaggedTensor)):
inputs = tf.convert_to_tensor(inputs)
if self.add_prefix_space:
inputs = tf.strings.join([" ", inputs])
...
Trial:
>>> import tensorflow as tf
>>> inputs = tf.constant(["add space at the beginning of every string", "we can use tf.strings.join(...)"])
>>> inputs
<tf.Tensor: shape=(2,), dtype=string, numpy=
array([b' add space at the beginning of every string',
b' we can use tf.strings.join(...)'], dtype=object)>
CC: @mattdangerw
@shivance, please add a unit test as well |
Thanks @abheesht17 , just found #708 |
cc: @jbischof |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Also thanks Abi for the careful review!
Yes , definitely Thanks @abheesht17 !! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! One minor comment on the docstring.
Thank you!! |
Closes #436
cc : @abheesht17