Limit the default sequence length to 1024 for all models #1770

mattdangerw · 2024-08-14T01:36:00Z

Pretrained models are supporting larger and larger sequence lengths. In Gemma's case this is a particularly nasty gotcha, as very few people have the compute resources to actually train on 8000-token-long sequences.

I think the more user-friendly approach might be to lower our default sequence length to 1024. This won't prohibit users from setting a longer sequence length, but it will lessen the unpleasant gotcha of suddenly using a ton of VRAM when training.

We should still almost always document setting the sequence length in our code examples, as it's something the user generally should think about when fine-tuning or generating.
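
For a rough sense of the VRAM gotcha described above, here is a back-of-the-envelope sketch of how one layer's attention score matrix grows with sequence length. It assumes a naive attention implementation that materializes the full `seq_len x seq_len` score matrix; memory-efficient attention kernels avoid this. The function name, head count, batch size, and dtype width are illustrative assumptions, not Gemma's actual configuration, and 8192 stands in for the roughly 8000-token figure mentioned above.

```python
def attention_scores_bytes(seq_len, batch_size=1, num_heads=8, bytes_per_value=4):
    """Bytes for one layer's attention scores: batch x heads x seq x seq.

    All defaults are illustrative assumptions (float32, 8 heads, batch of 1).
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


short = attention_scores_bytes(1024)
long = attention_scores_bytes(8192)

print(f"1024 tokens: {short / 2**20:.0f} MiB per layer")
print(f"8192 tokens: {long / 2**20:.0f} MiB per layer")
print(f"ratio: {long // short}x")
```

Under these assumptions the score matrix alone grows 64x when the sequence length goes from 1024 to 8192, before counting other activations, gradients, or additional layers.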

SamanehSaadat

Thanks, Matt! I agree that having a lower default value is user-friendlier!

mattdangerw · 2024-08-14T19:04:55Z

Let's try it out. Something tells me we aren't done with discussions here, but hopefully this is a positive delta.

mattdangerw requested a review from SamanehSaadat August 14, 2024 01:36

github-actions bot added the Gemma (Gemma model specific issues) label Aug 14, 2024

mattdangerw requested a review from fchollet August 14, 2024 01:37

mattdangerw mentioned this pull request Aug 14, 2024

1TB of memory required for model training. #1711

Closed

SamanehSaadat approved these changes Aug 14, 2024

mattdangerw merged commit f80fbfd into keras-team:master Aug 14, 2024
11 checks passed
