
Add head_mask and decoder_head_mask to FSMT #9819

Merged

Conversation

@stancld (Contributor) commented Jan 26, 2021

This PR implements head_mask and decoder_head_mask for FSMT; it is a follow-up to the open issue #9814.

Motivation: This PR is part of an effort to enable the use of head_mask and decoder_head_mask for all encoder-decoder transformers, following the recent work on BART-like models (#9569).
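(For context, a minimal sketch of how the new arguments are meant to be used once this is merged. It follows the head-mask convention of the BART-like models referenced above; the checkpoint name, the dummy decoder inputs, and the exact shapes are illustrative assumptions, not part of this PR.)

```python
import torch
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

# Illustrative checkpoint; any FSMT checkpoint should behave the same way.
model = FSMTForConditionalGeneration.from_pretrained("facebook/wmt19-en-de")
tokenizer = FSMTTokenizer.from_pretrained("facebook/wmt19-en-de")

inputs = tokenizer("Machine learning is great, isn't it?", return_tensors="pt")

# One row per layer, one column per attention head; 1.0 keeps a head, 0.0 masks it
# (shape convention assumed to match the BART implementation this PR follows).
head_mask = torch.ones(model.config.encoder_layers, model.config.encoder_attention_heads)
decoder_head_mask = torch.ones(model.config.decoder_layers, model.config.decoder_attention_heads)
head_mask[0, 0] = 0.0  # e.g. disable the first head of the first encoder layer

# Dummy target-side tokens, just enough to run a forward pass in this sketch.
decoder_input_ids = torch.full((1, 4), model.config.pad_token_id, dtype=torch.long)

outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    decoder_input_ids=decoder_input_ids,
    head_mask=head_mask,
    decoder_head_mask=decoder_head_mask,
)
```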


Fixes: #9814

Reviewer: @stas00

@stas00 (Contributor) left a comment:

Thank you for syncing fsmt with others, @stancld - including the docs and better markdown!!!

Is there a way to ensure in the future that if things are added to Bart they are also added to fsmt? On the modeling level, they are very similar.

@stancld (Contributor, Author) commented Jan 27, 2021

I know that one can add, for example, a line like this:

# Copied from transformers.models.bart.modeling_bart.BartAttention with Bart->FSMT

before the Attention module in FSMT. However, this does not copy only the additions but the whole module from BART, which is undesirable in this case, I think, as these modules differ a bit. But maybe there is another way I am not aware of.
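(For illustration, a rough sketch of how such a marker sits above a definition; the class name FSMTAttention and the empty body are placeholders here, precisely because the real FSMT attention module currently differs from BART's and so cannot carry this marker as-is.)

```python
from torch import nn


# Copied from transformers.models.bart.modeling_bart.BartAttention with Bart->FSMT
class FSMTAttention(nn.Module):
    # Everything below the marker must stay identical to BartAttention, apart from
    # the Bart->FSMT renames; the repo's utils/check_copies.py script enforces this.
    ...
```

With such a marker in place, the consistency check fails whenever the copied body drifts from its BART source, which is what would keep the two files in sync automatically.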

@LysandreJik (Member) left a comment:

Great, thanks for taking care of it @stancld!

Comment on lines 129 to 137:
- test_head_masking = False
+ test_head_masking = True

A Member commented:

I think this can be removed (as from all the other places where it is set to True), since this is the default in the superclass.

IMO it is more readable to list only the items that are not tested (and which ultimately should be); that way, a quick glance at the test file shows what is left to do to reach full common-test coverage. Let me know what you think.

@stancld (Contributor, Author) replied:

Thanks, that's a good point. I'll create a PR removing all redundant test_head_masking = True from other source code files later today.

(Also, I rebased since jaxlib was updated in #9831; hopefully everything passes without errors now :))
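(For illustration, a rough sketch of the resulting convention in the test file; the class name and the surrounding attributes are abbreviated and partly assumed here, and only the treatment of test_head_masking reflects the discussion above.)

```python
import unittest

from transformers import is_torch_available
from .test_modeling_common import ModelTesterMixin

if is_torch_available():
    from transformers import FSMTForConditionalGeneration, FSMTModel


class FSMTModelTest(ModelTesterMixin, unittest.TestCase):
    all_model_classes = (FSMTModel, FSMTForConditionalGeneration) if is_torch_available() else ()

    # test_head_masking is inherited as True from ModelTesterMixin, so it is not repeated
    # here; only common tests that the model does not (yet) support are flagged explicitly,
    # which makes the remaining gaps easy to spot at a glance.
    test_pruning = False  # illustrative example of an explicitly disabled common test
```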

@@ -170,6 +170,7 @@
GPT2TokenizerFast = None
HerbertTokenizerFast = None
LayoutLMTokenizerFast = None
LEDTokenizerFast = None
A Contributor commented:

This was actually corrected recently in another commit this morning, so there might be merge conflicts, but they should be easy to resolve.

stancld and others added 4 commits on January 27, 2021:

* Remove test_head_masking flag from test_modeling_fsmt.py, since test_head_masking is set to True by default (thus redundant to store)
* Rebase necessary due to an update of jaxlib
* Remove test_head_masking=True in tests/test_modeling_fsmt.py, as it is redundant
@stas00 (Contributor) commented Jan 27, 2021

@LysandreJik, @patrickvonplaten - how can we make sure fsmt gets tracked and synced with all the Bart-family changes? While the tokenizer is different, the model is ~95% identical.

@LysandreJik (Member) replied:

As @stancld said, we can do that with statements of the following kind:

# Copied from transformers.models.bart.modeling_bart.BartAttention with Bart->FSMT

The only difference between the BART and FSMT implementations of the targeted object must be the "Bart" occurrences that are renamed to "FSMT". @sgugger can tell you more about it.

@stas00 (Contributor) commented Jan 29, 2021

Thank you, @LysandreJik

I think this is really a question for @patrickvonplaten - who I remember was planning to refactor FSMT to match what he did for Bart. So if this is still planned, Patrick, perhaps you could add this item to the agenda: keeping FSMT in sync with the Bart family (modeling only - the tokenizer is similar to XLM's).

So the currently proposed solution can't be used, since Bart has diverged since FSMT forked from it.

It might help to treat FSMT as Bart whose main differences are a dual vocab, no tied weights, and a few layers that differ, but which is otherwise identical (again, for the model only).

@patrickvonplaten (Contributor) replied:

> I think this is really a question for @patrickvonplaten - who I remember was planning to refactor FSMT to match what he did for Bart. So if this is still planned, Patrick, perhaps you could add this item to the agenda: keeping FSMT in sync with the Bart family (modeling only - the tokenizer is similar to XLM's).

Yes, the FSMT / ProphetNet refactor is still on my to-do list (I think next week is reasonable). After the refactor I'll try to add as many # Copied from statements as possible to keep the models in sync. Nevertheless, this PR can be merged as it is now!

Great work @stancld

@patrickvonplaten patrickvonplaten merged commit 0c6c0af into huggingface:master Feb 1, 2021
@stancld stancld deleted the fsmt_encoder_decoder_head_masks branch February 2, 2021 20:39
Successfully merging this pull request closes: Missing head_mask and decoder_head_mask arguments in encoder-decoder models (#9814)