Clean up model input names for consistency #327

Merged
merged 1 commit on Aug 30, 2022

Conversation

mattdangerw
Member

This proposes a few changes to the naming of our model and layer inputs
and outputs. A sketch of how the renamed inputs would look follows the list below.

1) Rename `input_ids` -> `token_ids` for bert/roberta.
   Everything is an "input", including the segment id input, so I don't
   think input is a helpful naming prefix in this case.
2) Rename `input_mask`  -> `padding_mask` for bert/roberta.
   This matches the name of the variable for the transformer
   encoder/decoder argument.
3) Rename `tokens` -> `token_ids` for MLMMaskGenerator.
   This layer only operates in id space, so I think token_ids is more
   descriptive and consistent with above.
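
For illustration, here is a rough sketch of what a feature dict with the proposed names could look like. The shapes, dtypes, and the commented-out `bert_model` call are placeholders for this discussion, not code from this change:

```python
import numpy as np

batch_size, seq_length = 2, 128

# Placeholder features; a real pipeline would build these with a tokenizer and packer.
features = {
    # Integer ids of the tokens in the vocabulary (previously `input_ids`).
    "token_ids": np.zeros((batch_size, seq_length), dtype="int32"),
    # Segment ids separating sentence A from sentence B.
    "segment_ids": np.zeros((batch_size, seq_length), dtype="int32"),
    # 1 for real tokens, 0 for [PAD] positions (previously `input_mask`).
    "padding_mask": np.ones((batch_size, seq_length), dtype="int32"),
}

# outputs = bert_model(features)  # `bert_model` stands in for a BERT-style model using these names.
```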
@mattdangerw mattdangerw changed the title Clean up model input/outputs for consistency Clean up model input for consistency Aug 29, 2022
@mattdangerw mattdangerw changed the title Clean up model input for consistency Clean up model input names for consistency Aug 29, 2022
@jbischof
Contributor

jbischof commented Aug 29, 2022

Looks good in general, but I'm confused by the `padding_mask` label. Aren't we also masking the holdout words in pretraining, and not just the padding?

Does the examples/bert/ code still work? I see it still uses `input_ids` here.

Finally, there's one other naming inconsistency you could mop up in `sine_position_encoding`.

Contributor

@jbischof left a comment

LGTM! Added some comments for consideration but trust your judgment.

@mattdangerw
Member Author

mattdangerw commented Aug 30, 2022

@jbischof Thanks! And good call, I will rename in examples/bert to match.

Re `padding_mask` and the MLM masked words, those are actually different masks I think. There is both a [PAD] token and a [MASK] token in the bert vocab. During pretraining, [PAD] will be used only to pad sequences, and [MASK] will be used for the MLM labels.

So I do think it is accurate to use `padding_mask` as a name here. Other options are `mask` or `token_mask`; I am fine with any of those. I was just choosing `padding_mask` because that's the name we chose for our TransformerEncoder/TransformerDecoder: https://keras.io/api/keras_nlp/layers/transformer_encoder/#call-method.
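
For context, here is a minimal sketch of the encoder call that already uses this argument name; the layer sizes and shapes below are arbitrary and only for illustration:

```python
import numpy as np
import keras_nlp

# TransformerEncoder takes an optional `padding_mask` in its call method.
encoder = keras_nlp.layers.TransformerEncoder(intermediate_dim=64, num_heads=2)

x = np.random.uniform(size=(2, 10, 16)).astype("float32")  # (batch, sequence, feature)
padding_mask = np.ones((2, 10), dtype="bool")
padding_mask[:, 8:] = False  # treat the last two positions as [PAD]

outputs = encoder(x, padding_mask=padding_mask)  # same shape as `x`
```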

Collaborator

@fchollet left a comment

LGTM
