Add BORT #9112
Conversation
input_ids = tf.convert_to_tensor(
    [[0, 18077, 4082, 7804, 8606, 6195, 2457, 3321, 11, 10489, 16, 269, 2579, 328, 2]],
    dtype=tf.int32,
)  # Schloß Nymphenburg in Munich is really nice!
Das stimmt! ("That's right!")
@@ -0,0 +1,143 @@
# coding=utf-8
(nit) we could add some examples here as well, similar to how it's done for MT5:

Examples::

I think for BortModel we can just show how to get the last_hidden_state, and for all other models we could show how to get the loss for fine-tuning.
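A rough sketch of what such docstring examples could look like (the checkpoint name "amazon/bort" and the BortForSequenceClassification alias are assumptions for illustration, not the final examples):

# Sketch only: last_hidden_state for the base model, a loss for a task head.
import torch
from transformers import BortForSequenceClassification, BortModel, BortTokenizer

tokenizer = BortTokenizer.from_pretrained("amazon/bort")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# Base model: expose the last hidden state
model = BortModel.from_pretrained("amazon/bort")
last_hidden_state = model(**inputs).last_hidden_state

# Task-specific model: passing labels yields a loss to fine-tune on
clf = BortForSequenceClassification.from_pretrained("amazon/bort")
loss = clf(**inputs, labels=torch.tensor([1])).loss
loss.backward()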
PR looks great! Think we only have to wait now for the name and then we're good to go :-)
🔥 Looking forward to taking a look at the conversion script from GluonNLP/mxnet!
@patrickvonplaten I added some examples for both.

@julien-c The conversion script is also added - you just need to install GluonNLP and mxnet in the required versions. These versions are defined in the BORT requirements file. The conversion script also performs a version check.
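For reference, a minimal sketch of the kind of version guard such a conversion script can perform (the pinned versions below are assumptions for illustration; the authoritative pins live in the BORT requirements file):

# Sketch of a GluonNLP/mxnet version guard; the pinned versions are assumptions.
import gluonnlp as nlp
import mxnet as mx

REQUIRED = {"gluonnlp": "0.8.3", "mxnet": "1.5.0"}

if nlp.__version__ != REQUIRED["gluonnlp"] or mx.__version__ != REQUIRED["mxnet"]:
    raise RuntimeError(
        f"Conversion requires gluonnlp=={REQUIRED['gluonnlp']} and mxnet=={REQUIRED['mxnet']}, "
        f"found gluonnlp {nlp.__version__} and mxnet {mx.__version__}."
    )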
(Resolved review thread on src/transformers/models/bort/convert_bort_original_gluonnlp_checkpoint_to_pytorch.py, marked outdated.)
>>> hidden_states = outputs.last_hidden_state
"""

config_class = BortConfig
we should add model_type = 'bort' for each class here -> see MT5 for comparison:

model_type = "mt5"
""" | ||
|
||
config_class = BortConfig | ||
|
also add the model type here for all models:

model_type = "mt5"
Really cool! Have one (nit), and I think we should add the model_type to each aliased class; a sketch follows below.
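A minimal sketch of what that would look like, assuming the Bort classes simply alias their BERT counterparts (mirroring how MT5 aliases T5):

# Sketch: each aliased class sets model_type (and the models their config_class),
# following the MT5 pattern referenced above.
from transformers import BertConfig, BertModel


class BortConfig(BertConfig):
    model_type = "bort"


class BortModel(BertModel):
    model_type = "bort"
    config_class = BortConfig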
…low). Currently disabled, because we wait for model uploads... but they are working
…eckpoint_to_pytorch.py Co-authored-by: Patrick von Platen <[email protected]>
…model_type to bort)
Looks great!
We'll have to think a bit about how to advertise this. Let me draft up a "Contribution Proposal" for the fine-tuning algorithm.
Thanks for adding this model! There are a few things to adapt to have the same API as the current master, and I would very much like us to be consistent with the paper and use Bort (not BORT) everywhere in the docs.
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations under the License. | ||
|
||
BORT |
Suggested change: BORT → Bort

The authors don't use all caps.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The BORT model was proposed in `Optimal Subarchitecture Extraction for BERT <https://arxiv.org/abs/2010.10499>`__ by
Suggested change (BORT → Bort):

The Bort model was proposed in `Optimal Subarchitecture Extraction for BERT <https://arxiv.org/abs/2010.10499>`__ by
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BortTokenizerFast
    :members:
Suggested change: :members: → :members: forward
I think @sgugger made a typo here; you can leave it as :members: given that it's a fast tokenizer.
Sorry! I meant it for the PyTorch models :-)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BortModel
    :members:
Suggested change: :members: → :members: forward

Here and for the rest of the PyTorch models.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBortModel
    :members:
Suggested change: :members: → :members: call

Here and for the rest of the TF models.
We extract an optimal subset of architectural parameters for the BERT architecture from Devlin et al. (2018) by
applying recent breakthroughs in algorithms for neural architecture search. This optimal subset, which we refer to as
"Bort", is demonstrably smaller, having an effective (that is, not counting the embedding layer) size of 5.5% the
original BERT-large architecture, and 16% of the net size.
The model summary doesn't use first-person pronouns. This should be changed to fit the style of the rest of the document: "Same as BERT but with xxx..."
Also, it doesn't seem to be placed in the right section. If it's like BERT, it should be in the autoencoding models part.
@@ -233,6 +239,7 @@
        (MPNetConfig, (MPNetTokenizer, MPNetTokenizerFast)),
        (TapasConfig, (TapasTokenizer, None)),
        (LEDConfig, (LEDTokenizer, LEDTokenizerFast)),
        (BortConfig, (BortTokenizer, BortTokenizerFast)),
This line should be removed, I believe.
# See the License for the specific language governing permissions and
# limitations under the License.

from ...file_utils import is_sentencepiece_available, is_tf_available, is_tokenizers_available, is_torch_available
This init should be adapted to the new style (see any model init in current master) to avoid importing TF/PyTorch when not required.
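For reference, a sketch of that new-style init, following the pattern of other model inits on master at the time (the _BaseLazyModule helper name is taken from file_utils of that era; treat the details as assumptions):

# Sketch of the lazy-import init: TF/PyTorch submodules are only imported
# when one of their objects is actually requested.
from typing import TYPE_CHECKING

from ...file_utils import _BaseLazyModule, is_tf_available, is_tokenizers_available, is_torch_available

_import_structure = {
    "configuration_bort": ["BortConfig"],
    "tokenization_bort": ["BortTokenizer"],
}

if is_tokenizers_available():
    _import_structure["tokenization_bort_fast"] = ["BortTokenizerFast"]

if is_torch_available():
    _import_structure["modeling_bort"] = ["BortModel"]

if is_tf_available():
    _import_structure["modeling_tf_bort"] = ["TFBortModel"]

if TYPE_CHECKING:
    from .configuration_bort import BortConfig
    from .tokenization_bort import BortTokenizer

    if is_tokenizers_available():
        from .tokenization_bort_fast import BortTokenizerFast

    if is_torch_available():
        from .modeling_bort import BortModel

    if is_tf_available():
        from .modeling_tf_bort import TFBortModel
else:
    import importlib
    import os
    import sys

    class _LazyModule(_BaseLazyModule):
        """Surfaces all objects but only imports a submodule on first access."""

        __file__ = globals()["__file__"]
        __path__ = [os.path.dirname(__file__)]

        def _get_module(self, module_name: str):
            return importlib.import_module("." + module_name, self.__name__)

    sys.modules[__name__] = _LazyModule(__name__, _import_structure)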
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" BORT model configuration """
""" BORT model configuration """ | |
""" Bort model configuration """ |
class BortConfig(PretrainedConfig):
    r"""
    This is the configuration class to store the configuration of a :class:`~transformers.BortModel` or a
    :class:`~transformers.TFBortModel`. It is used to instantiate a BORT model according to the specified arguments,
Everywhere, BORT -> Bort (we should use the same name as the authors, written in the same way).
Hi @stefan-it, thanks a lot for your contribution!

If Bort can be loaded seamlessly in the BERT architecture, is there really a need to redefine all models in PyTorch and TensorFlow? We would need to do this for all models on the hub if that were the case. If there were a change in one of the models I would understand, but given that it's an exact copy of BERT, I don't think that's necessary at all.

I understand the conversion script, however. I would just replace the model name "bort" with "bert" in that script so that the models are loadable directly in the BERT architecture.

I see that Bort requires RoBERTa tokenizers, which isn't a problem either; tokenizers can be decoupled from their models by specifying a tokenizer_class in the model config, similarly to what BERTweet does. A sketch follows below.
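Concretely, that setup lets users load the checkpoint with zero Bort-specific code; a sketch, where the hub model name and config fields are assumptions for illustration:

# Sketch: a Bort checkpoint whose config.json sets
#   "model_type": "bert" and "tokenizer_class": "RobertaTokenizer"
# loads straight into the BERT architecture with a RoBERTa tokenizer.
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("amazon/bort")          # resolves to BertModel
tokenizer = AutoTokenizer.from_pretrained("amazon/bort")  # resolves to RobertaTokenizer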
Hey @stefan-it, I've discussed a bit with @LysandreJik and @sgugger offline, and I do agree with @LysandreJik after having thought about it again. I think it's better if we actually don't add any new code besides the conversion script.
Are we planning to implement the architectural optimization (FPTAS) or just the pre-trained models?
Great question! For now, we'll just add the model weights - see #9813. A community contribution showing how to do FPTAS in a notebook would be extremely valuable though.
Closing in favor of #9813
Hi,

this PR adds the recently introduced BORT model from @adewynter and Daniel J. Perry from the Alexa team into Transformers.

BORT was introduced in the paper Optimal Subarchitecture Extraction for BERT.

Details about BORT:

This should fix #8135 🤗

ToDo tasks: