
LED #9278
Merged
merged 35 commits into huggingface:master on Jan 5, 2021

Conversation

@patrickvonplaten (Contributor) commented on Dec 23, 2020:

What does this PR do?

Adds LongformerEncoderDecoder (LED) from @ibeltagy - see: https://github.com/allenai/longformer#longformer

Todo:

  • Important: the position embeddings have to be cut to correctly convert original Bart-like checkpoints to LED. The reason is that Bart uses a position-embedding hack in which the embedding indices 0 and 1 are never used, resulting in an embedding matrix of length 1026 instead of 1024; see the comment in Bart's code:
    # Bart is set up so that if padding_idx is specified then offset the embedding ids by 2
    All LED checkpoints are hence cut to remove this hack, as shown in the snippet below (a short sketch of the offset itself follows the snippet):
import torch
from transformers import LEDForConditionalGeneration

model = LEDForConditionalGeneration.from_pretrained("./led-base-16384")
# drop the two unused rows (indices 0 and 1) inherited from Bart's offset hack
model.model.encoder.embed_positions.weight = torch.nn.Parameter(model.model.encoder.embed_positions.weight[2:, :])
model.model.decoder.embed_positions.weight = torch.nn.Parameter(model.model.decoder.embed_positions.weight[2:, :])
model.save_pretrained("./led-base-16384")
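
For context, here is a minimal sketch of the offset that produces the 1026-row matrix. This is an illustration, not Bart's actual implementation in transformers; the class name OffsetPositionalEmbedding is made up for this example.

import torch
import torch.nn as nn

# Illustration only: a Bart-style learned positional embedding that reserves
# rows 0 and 1 of the weight matrix, so a 1024-position model ends up with
# 1026 rows.
class OffsetPositionalEmbedding(nn.Embedding):
    def __init__(self, num_positions, embedding_dim, offset=2):
        super().__init__(num_positions + offset, embedding_dim)
        self.offset = offset

    def forward(self, position_ids):
        # shift every position index past the two reserved rows
        return super().forward(position_ids + self.offset)

LED does not use this offset, which is why the conversion above slices off weight[2:, :].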

TODO after PR is merged:

  • Correctly add # Copied from ... statements from Bart and Longformer (this probably requires the Bart refactor to be merged first)
  • Open an issue regarding the problems with the TF save_model test
  • Correct the templates: delete the unnecessary test for TF Bart; add gradient checkpointing by default in PT

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@patrickvonplaten changed the title from LED to [WIP]LED on Dec 23, 2020
@patrickvonplaten changed the title from [WIP]LED to [WIP] LED on Dec 23, 2020
return outputs


# Copied from transformers.models.longformer.modeling_longformer.LongformerSelfAttention with Longformer->LEDEncoder
@patrickvonplaten (Contributor, Author) commented on this line:

@ibeltagy this line ensures that the whole class has to be exactly the same as the corresponding class in LongformerSelfAttention or else the tests would throw an error. This is our safety check to make sure that the original class (which in other libs would just be imported and re-used here) cannot change without this class being changed accordingly as well.
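
For illustration, here is a minimal sketch of how such a check could be enforced. This is not the actual transformers consistency script; check_copy and its signature are hypothetical. The idea is to take the source of the original class, apply the Longformer->LEDEncoder substitution from the marker, and require the copy to match it character for character.

import inspect

# Hypothetical helper, for illustration only: verify that copied_cls is an
# exact copy of original_cls after applying the name replacements declared in
# a "# Copied from ... with Old->New" marker.
def check_copy(original_cls, copied_cls, replacements):
    expected = inspect.getsource(original_cls)
    for old, new in replacements.items():
        expected = expected.replace(old, new)
    if inspect.getsource(copied_cls) != expected:
        raise ValueError(
            f"{copied_cls.__name__} has diverged from {original_cls.__name__}"
        )

# e.g. check_copy(LongformerSelfAttention, LEDEncoderSelfAttention,
#                 {"Longformer": "LEDEncoder"})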

@patrickvonplaten changed the title from [WIP] LED to LED on Dec 27, 2020
@jplu (Contributor) left a comment:

Awesome work!!! Just left a few tiny comments.

@LysandreJik (Member) left a comment:

Great job implementing this! Happy to hear the model templates made your life easier, too.

Left a few nits, but LGTM!

PS: That 16384 embedding size is incredible!

@sgugger (Collaborator) left a comment:

Looks great to me! Thanks for adding this new model!

@jplu (Contributor) commented on Jan 5, 2021:

@patrickvonplaten when you have time, can you fix the conflicts and apply to LED the same updates that were merged into Longformer? Thanks!

@patrickvonplaten merged commit 189387e into huggingface:master on Jan 5, 2021
@patrickvonplaten deleted the Led branch on January 5, 2021 at 12:14