Adding an FNetMaskedLM task model and preprocessor #740
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Can you please approve the CI for this PR @mattdangerw?
Thanks for the PR! Left some initial comments.
token_ids, segment_ids, padding_mask = (
    x["token_ids"],
    x["segment_ids"],
    x["padding_mask"],
)
I'm pretty sure f_net has no padding mask, so I think we will need to remove this.
@@ -174,6 +174,7 @@ def call(self, x, y=None, sample_weight=None):
        x = {
            "token_ids": token_ids,
            "segment_ids": segment_ids,
            "padding_mask": token_ids != self.tokenizer.pad_token_id,
We don't actually want to do this, I believe. f_net, because of the transformations it applies to the sequence in its transformer blocks, does not have a padding mask (though it does have a padding token).
cc @abheesht17 to double check this.
Yes, I believe you're correct. I looked at the official implementation (here: function at line 57). I'll fix that.
Edit: Found a conversation on the same topic. I'll make the required changes.
Sorry, missed this. That is correct. I've confirmed this with the authors of the paper and they stated that they simply mix the entire sequence using the Fourier Transform. The model might be sensitive to examples with very different amounts of padding, but no major issues were observed in practice.
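To illustrate the change being discussed, here is a minimal sketch of the feature dict once the padding mask is dropped. The helper name is hypothetical and the variable names come from the diff above; this is not the merged code.

def pack_fnet_features(token_ids, segment_ids):
    # FNet mixes the whole sequence with a Fourier transform in its encoder
    # blocks, so the backbone takes no "padding_mask" input; only token and
    # segment ids are passed through.
    return {
        "token_ids": token_ids,
        "segment_ids": segment_ids,
    }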
Done! Addressed the initial comments.
I've tested this on the gist you shared in another PR. Here's the link for it: https://colab.research.google.com/gist/apupneja/8258f796a76940394d4a510ed61d08d7/deberta-masked-lm.ipynb
Thank you! This looks great! Just a few small comments to clean up, and this needs a merge/rebase with the master branch.
Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind. The underlying model is provided by a
third party and subject to a separate license, available
[here](https://github.com/facebookresearch/fairseq).
This is not correct for this model! We can use the same disclaimer as BERT.
# Creating sentencepiece tokenizer for FNet LM preprocessor
bytes_io = io.BytesIO()
we can remove the empty newlines past this point in the code block
tokenizer = FNetTokenizer(proto=proto)

preprocessor = FNetMaskedLMPreprocessor(
format this as one line
By one line, do you mean that I should create the object inside the FNetMaskedLMPreprocessor constructor parameters?
preprocessor = FNetMaskedLMPreprocessor(
    tokenizer=FNetTokenizer(proto=proto)
)
Something like this?
Oh, I just mean what's inside the parentheses here:
preprocessor = FNetMaskedLMPreprocessor(tokenizer=tokenizer)
Right. Fixed it.
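Putting the cleanup comments together, a rough self-contained sketch of how the docstring example might read after the fixes, with the blank lines removed and the preprocessor built on one line. The training sentences and special-token ids below are illustrative assumptions, not the merged code.

import io

import sentencepiece

from keras_nlp.models import FNetMaskedLMPreprocessor, FNetTokenizer

# Train a tiny SentencePiece vocabulary in memory for the example
# (assumed special-token ids; the real docstring may use different values).
bytes_io = io.BytesIO()
sentencepiece.SentencePieceTrainer.train(
    sentence_iterator=iter(["the quick brown fox", "the earth is round"]),
    model_writer=bytes_io,
    vocab_size=12,
    model_type="WORD",
    pad_id=3,
    unk_id=0,
    bos_id=4,
    eos_id=5,
    pad_piece="<pad>",
    unk_piece="<unk>",
    bos_piece="[CLS]",
    eos_piece="[SEP]",
    user_defined_symbols="[MASK]",
)
proto = bytes_io.getvalue()
tokenizer = FNetTokenizer(proto=proto)
preprocessor = FNetMaskedLMPreprocessor(tokenizer=tokenizer)
x, y, sample_weight = preprocessor("the quick brown fox")
# Note: no "padding_mask" key in the features, per the discussion above.
print(x.keys())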
Also thanks so much for testing this out via the gist! Super helpful.
Thanks!
Solves #722