
Adding a FNetMaskedLM task model and preprocessor #740

Merged: 11 commits into keras-team:master on Feb 24, 2023

Conversation

apupneja
Contributor

Solves #722

@google-cla

google-cla bot commented Feb 10, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@apupneja
Contributor Author

Can you please approve the CI for this PR @mattdangerw?

@mattdangerw mattdangerw self-requested a review February 16, 2023 17:16
Member

@mattdangerw left a comment


Thanks for the PR! Left some initial comments.

keras_nlp/models/f_net/f_net_tokenizer.py (outdated, resolved)
keras_nlp/models/f_net/f_net_masked_lm.py (resolved)
token_ids, segment_ids, padding_mask = (
    x["token_ids"],
    x["segment_ids"],
    x["padding_mask"],
)
Member


I'm pretty sure f_net has no padding mask, so I think we will need to remove this.
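
A minimal sketch of how the unpacking might look with the padding mask dropped (hypothetical; not necessarily the PR's final code):

token_ids, segment_ids = x["token_ids"], x["segment_ids"]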

@@ -174,6 +174,7 @@ def call(self, x, y=None, sample_weight=None):
x = {
"token_ids": token_ids,
"segment_ids": segment_ids,
"padding_mask": token_ids != self.tokenizer.pad_token_id,
Member


We don't actually want to do this, I believe. f_net, because of the transformations it applies to the sequence in its transformer blocks, does not have a padding mask (though it does have a padding token).

cc @abheesht17 to double check this.

Contributor Author

@apupneja commented Feb 16, 2023


Yes, I believe you're correct. I looked at the official implementation (the function at line 57). I'll fix that.

Edit: Found a conversation on the same topic. I'll make the required changes.

Collaborator


Sorry, missed this. That is correct. I've confirmed this with the authors of the paper: they simply mix the entire sequence using the Fourier Transform. The model might be sensitive to examples with very different amounts of padding, but no major issues were observed in practice.
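
For reference, the mixing described in the FNet paper is a 2D discrete Fourier Transform over the sequence and hidden dimensions, keeping only the real part, so every position influences every other and no padding mask is applied. A minimal TensorFlow sketch (illustrative, not the code in this PR):

import tensorflow as tf

def fourier_mix(x):
    # x: float tensor of shape (batch, seq_len, hidden_dim).
    # FNet's token mixing: FFT over the inner two axes, keep the real part.
    return tf.math.real(tf.signal.fft2d(tf.cast(x, tf.complex64)))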

@mattdangerw
Member

> Can you please approve the CI for this PR @mattdangerw?

Done! And left some initial comments.

@apupneja
Contributor Author

I've tested this on the gist you shared in another PR.

Here's the link for it: https://colab.research.google.com/gist/apupneja/8258f796a76940394d4a510ed61d08d7/deberta-masked-lm.ipynb

@apupneja apupneja requested a review from mattdangerw February 18, 2023 04:41
Member

@mattdangerw left a comment


Thank you! This looks great! Just a few small comments to clean up, and this needs a merge/rebase with the master branch.

Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind. The underlying model is provided by a
third party and subject to a separate license, available
[here](https://github.com/facebookresearch/fairseq).
Member


This is not correct for this model! We can use the same disclaimer as BERT.


# Creating sentencepiece tokenizer for FNet LM preprocessor
bytes_io = io.BytesIO()

Member


We can remove the empty newlines past this point in the code block.
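
For context, a rough sketch of how the full docstring snippet might read once cleaned up; the toy corpus, vocab size, and special-token settings below are illustrative assumptions in the style of this repo's tests, not the PR's exact values:

import io
import sentencepiece

# Creating sentencepiece tokenizer for FNet LM preprocessor
bytes_io = io.BytesIO()
sentencepiece.SentencePieceTrainer.train(
    sentence_iterator=iter(["the quick brown fox", "the earth is round"]),
    model_writer=bytes_io,
    vocab_size=12,  # illustrative: 7 words + 4 control pieces + [MASK]
    model_type="WORD",
    pad_id=0,
    unk_id=1,
    bos_id=2,
    eos_id=3,
    pad_piece="<pad>",
    unk_piece="<unk>",
    bos_piece="[CLS]",
    eos_piece="[SEP]",
    user_defined_symbols="[MASK]",
)
proto = bytes_io.getvalue()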


tokenizer = FNetTokenizer(proto=proto)

preprocessor = FNetMaskedLMPreprocessor(
Member


Format this as one line.

Contributor Author

@apupneja commented Feb 23, 2023


By "one line", do you mean that I should create the tokenizer object directly inside the FNetMaskedLMPreprocessor constructor call?

preprocessor = FNetMaskedLMPreprocessor(
tokenizer=FNetTokenizer(proto=proto)
)

Something like this?

Member


Oh, I just mean what's inside the parentheses here:
preprocessor = FNetMaskedLMPreprocessor(tokenizer=tokenizer)

Contributor Author


Right. Fixed it.
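
Putting the resolved pieces together, the docstring example presumably ends up along these lines (sketch; the input sentence is illustrative):

tokenizer = FNetTokenizer(proto=proto)
preprocessor = FNetMaskedLMPreprocessor(tokenizer=tokenizer)
# Maps a raw string to masked token inputs, mask labels, and sample weights.
x, y, sample_weight = preprocessor("the quick brown fox")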

@mattdangerw
Member

Also, thanks so much for testing this out via the gist! Super helpful.

@mattdangerw
Member

Thanks!

@mattdangerw mattdangerw merged commit 696040f into keras-team:master Feb 24, 2023
@mattdangerw mattdangerw mentioned this pull request Apr 18, 2023