
Add BORT #9112

Closed

wants to merge 23 commits into from

Conversation

@stefan-it (Collaborator) commented Dec 15, 2020

Hi,

this PR adds the recently introduced BORT model from @adewynter and Daniel J. Perry of the Alexa team to Transformers.


BORT was introduced in the paper Optimal Subarchitecture Extraction for BERT.

Details about BORT:

We extract an optimal subset of architectural parameters for the BERT architecture from Devlin et al. (2018) by applying recent breakthroughs in algorithms for neural architecture search. This optimal subset, which we refer to as "Bort", is demonstrably smaller, having an effective (that is, not counting the embedding layer) size of 5.5% the original BERT-large architecture, and 16% of the net size. Bort is also able to be pretrained in 288 GPU hours, which is 1.2% of the time required to pretrain the highest-performing BERT parametric architectural variant, RoBERTa-large (Liu et al., 2019), and about 33% of that of the world-record, in GPU hours, required to train BERT-large on the same hardware. It is also 7.9x faster on a CPU, as well as being better performing than other compressed variants of the architecture, and some of the non-compressed variants: it obtains performance improvements of between 0.3% and 31%, absolute, with respect to BERT-large, on multiple public natural language understanding (NLU) benchmarks.

This should fix #8135 🤗


ToDo tasks:

  • Upload models (both PyTorch and TensorFlow) to the model hub
  • Add a conversion script from GluonNLP to Transformers
  • Enable unit tests (they are working and just waiting for the model upload)

@stefan-it stefan-it marked this pull request as draft December 15, 2020 00:29
@patrickvonplaten patrickvonplaten self-requested a review December 15, 2020 09:12
input_ids = tf.convert_to_tensor(
    [[0, 18077, 4082, 7804, 8606, 6195, 2457, 3321, 11, 10489, 16, 269, 2579, 328, 2]],
    dtype=tf.int32,
)  # Schloß Nymphenburg in Munich is really nice!
Contributor

Das stimmt! (That's right!)

@@ -0,0 +1,143 @@
# coding=utf-8
Contributor

(nit) we could add some examples here as well similar to how it's done for MT5:

I think for BortModel we can just show how to get the last_hidden_state and for all other models we could show how to get the loss for fine-tuning.
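
For instance, the docstring examples could look roughly like this (a sketch only; the Bort* class names come from this PR, while the "amazon/bort" checkpoint name and the sequence-classification head are assumptions):

# Hypothetical docstring examples mirroring MT5; the "amazon/bort" checkpoint
# name and the sequence-classification head are assumptions, not PR code.
import torch
from transformers import BortTokenizer, BortModel, BortForSequenceClassification

tokenizer = BortTokenizer.from_pretrained("amazon/bort")
inputs = tokenizer("Schloß Nymphenburg in Munich is really nice!", return_tensors="pt")

# BortModel: retrieve the last hidden state
model = BortModel.from_pretrained("amazon/bort")
last_hidden_state = model(**inputs).last_hidden_state  # (batch, seq_len, hidden)

# Models with a head: pass labels to obtain a loss for fine-tuning
clf = BortForSequenceClassification.from_pretrained("amazon/bort")
loss = clf(**inputs, labels=torch.tensor([0])).loss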

@patrickvonplaten (Contributor) left a comment

PR looks great! Think we only have to wait now for the name and then we're good to go :-)

@julien-c (Member)

🔥 Looking forward to taking a look at the conversion script from GluonNLP/mxnet!

@stefan-it (Collaborator, Author)

@patrickvonplaten I added some examples for both modeling_bort.py and modeling_tf_bort.py 🤗

@julien-c The conversion script is also added - you just need to install gluonnlp==0.8.3 and mxnet==1.5.0.

These versions are defined in the BORT requirements file. The conversion script also performs a version check.
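
For reference, the kind of check meant here might look like this (a minimal sketch; the exact messages and structure in the conversion script may differ):

# Minimal sketch of a version check for the pinned GluonNLP/MXNet versions;
# the actual wording in the conversion script may differ.
import gluonnlp as nlp
import mxnet as mx

if nlp.__version__ != "0.8.3":
    raise RuntimeError(f"Expected gluonnlp==0.8.3, found {nlp.__version__}")
if mx.__version__ != "1.5.0":
    raise RuntimeError(f"Expected mxnet==1.5.0, found {mx.__version__}")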

>>> hidden_states = outputs.last_hidden_state
"""

config_class = BortConfig
Contributor

we should add model_type = 'bort' for each class here -> see MT5 for comparison:
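
Something along these lines, for example (a sketch mirroring how MT5 subclasses the T5 classes; the relative imports assume this lives in modeling_bort.py):

# Sketch of adding model_type to an aliased class, mirroring MT5Model;
# an illustration of the pattern, not the exact PR code.
from ..bert.modeling_bert import BertModel
from .configuration_bort import BortConfig


class BortModel(BertModel):
    model_type = "bort"
    config_class = BortConfig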

"""

config_class = BortConfig

Contributor

also add model type here for all models:

@patrickvonplaten (Contributor) left a comment

Really cool! I have one (nit), and I think we should add the model_type to each aliased class

@patrickvonplaten patrickvonplaten marked this pull request as ready for review December 29, 2020 23:17
@patrickvonplaten patrickvonplaten changed the title from "Add support for BORT" to "Add BORT" Dec 29, 2020
@patrickvonplaten (Contributor) left a comment

Looks great!

@patrickvonplaten (Contributor)

We'll have to think a bit about how to advertise this. Let me draft up a "Contribution Proposal" for the fine-tuning algorithm.

@sgugger (Collaborator) left a comment

Thanks for adding this model! There are a few things to adapt to have the same API as the current master, and I would very much like to be consistent with the paper and use Bort (not BORT) everywhere in the docs.

an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

BORT
Collaborator

Suggested change
BORT
Bort

The authors don't use all caps.

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The BORT model was proposed in `Optimal Subarchitecture Extraction for BERT <https://arxiv.org/abs/2010.10499>`__ by
Collaborator

Suggested change
The BORT model was proposed in `Optimal Subarchitecture Extraction for BERT <https://arxiv.org/abs/2010.10499>`__ by
The Bort model was proposed in `Optimal Subarchitecture Extraction for BERT <https://arxiv.org/abs/2010.10499>`__ by

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BortTokenizerFast
:members:
Collaborator

Suggested change
:members:
:members: forward

Member

I think @sgugger made a typo here; you can leave it as :members: given that it's a fast tokenizer.

Collaborator

Sorry! I meant it for the PyTorch models :-)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BortModel
:members:
Collaborator

Suggested change
:members:
:members: forward

Here and for the rest of the PyTorch models.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFBortModel
:members:
Collaborator

Suggested change
:members:
:members: call

Here and for the rest of the TF models.

Comment on lines +805 to +808
We extract an optimal subset of architectural parameters for the BERT architecture from Devlin et al. (2018) by
applying recent breakthroughs in algorithms for neural architecture search. This optimal subset, which we refer to as
"Bort", is demonstrably smaller, having an effective (that is, not counting the embedding layer) size of 5.5% the
original BERT-large architecture, and 16% of the net size.
Collaborator

The model summary doesn't use first-person pronouns. This should be changed to fit the style of the rest of the document: "Same as BERT but with xxx..."

Also, it doesn't seem to be placed in the right section. If it's like BERT, it should be in the autoencoding models part.

@@ -233,6 +239,7 @@
(MPNetConfig, (MPNetTokenizer, MPNetTokenizerFast)),
(TapasConfig, (TapasTokenizer, None)),
(LEDConfig, (LEDTokenizer, LEDTokenizerFast)),
(BortConfig, (BortTokenizer, BortTokenizerFast)),
Collaborator

This line should be removed I believe.

# See the License for the specific language governing permissions and
# limitations under the License.

from ...file_utils import is_sentencepiece_available, is_tf_available, is_tokenizers_available, is_torch_available
Collaborator

This init should be adapted to the new style (see any model init in current master) to avoid importing TF/PyTorch when not required.
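
For illustration, a condensed sketch of the framework-guarded init pattern on current master (the TYPE_CHECKING/lazy-module boilerplate is omitted, and the Bort module names are taken from this PR):

# Condensed sketch of the guarded-import style used in model __init__.py files
# on master; only modules whose backend is available get registered.
from ...file_utils import is_tf_available, is_tokenizers_available, is_torch_available

_import_structure = {
    "configuration_bort": ["BortConfig"],
    "tokenization_bort": ["BortTokenizer"],
}

if is_tokenizers_available():
    _import_structure["tokenization_bort_fast"] = ["BortTokenizerFast"]

if is_torch_available():
    _import_structure["modeling_bort"] = ["BortModel"]

if is_tf_available():
    _import_structure["modeling_tf_bort"] = ["TFBortModel"]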

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" BORT model configuration """
Collaborator

Suggested change
""" BORT model configuration """
""" Bort model configuration """

class BortConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.BortModel` or a
:class:`~transformers.TFBortModel`. It is used to instantiate a BORT model according to the specified arguments,
Collaborator

Everywhere, BORT -> Bort (we should use the same name as the authors, written in the same way).

@LysandreJik (Member) left a comment

Hi @stefan-it, thanks a lot for your contribution!

If Bort can be loaded seamlessly in the BERT architecture, is there really a need to redefine all models in PyTorch and TensorFlow? We would need to do this for all models on the hub if that were the case. If there were a change in one of the models I would understand, but given that it's an exact copy of BERT, I don't think that's necessary at all.

I understand the conversion script, however. I would just replace the model name "bort" with "bert" in that script so that the models are loadable directly in the BERT architecture.

I see that Bort requires RoBERTa tokenizers, which isn't a problem either; tokenizers can be decoupled from their models by specifying a tokenizer_class in the model config, similarly to what BERTweet does.
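
For example, something along these lines could tag the checkpoint's config (a sketch; the "amazon/bort" checkpoint name and local save path are assumptions):

# Sketch of decoupling the tokenizer from the model via the config; the
# "amazon/bort" checkpoint name is an assumption for illustration.
from transformers import BertConfig

config = BertConfig.from_pretrained("amazon/bort")
config.tokenizer_class = "RobertaTokenizer"
config.save_pretrained("./bort")  # AutoTokenizer will now pick RobertaTokenizer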

@patrickvonplaten (Contributor)

Hey @stefan-it,

I've discussed a bit with @LysandreJik and @sgugger offline, and after having thought about it again I do agree with @LysandreJik. I think it's better if we actually don't add any new code besides the conversion script (which should go into src/transformers/models/bert/) and the docs page. I'm very sorry to have asked you to go down this road! I think, however, that it makes more sense not to add any "tokenizer" or "model" code, as those are exact copies of RobertaTokenizer and BertModel. It's probably most efficient to open a new PR and only add the required files. Super sorry again!

@gaceladri

Are we planning to implement the architectural optimization (FPTAS) or just the pre-trained models?

@stefan-it stefan-it mentioned this pull request Jan 26, 2021
@patrickvonplaten (Contributor)

Are we planning to implement the architectural optimization (FPTAS) or just the pre-trained models?

Great question! For now, we'll just add the model weights - see: #9813. A community contribution showing how to do FPTAS in a notebook would be extremely valuable though.

@patrickvonplaten (Contributor)

Closing in favor of #9813


Successfully merging this pull request may close these issues.

Bort (Amazon's reduced BERT)
6 participants