
Limit the default sequence length to 1024 for all models #1770

Merged

Conversation

mattdangerw
Member

Pretrained models are supporting larger and larger sequence lengths. In Gemma's case this is a particularly nasty gotcha, as very few people have the compute resources to actually train on 8,000-token sequences.

I think the more user-friendly approach is to lower our default sequence length to 1024. This won't prohibit users from setting a longer sequence length, but it will lessen the unpleasant gotcha of suddenly using a ton of VRAM when training.

We should still almost always document setting the sequence length in our code examples, as it's something the user should generally think about when fine-tuning or generating.

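To make the override concrete, here is a rough sketch of the kind of call we'd document. It assumes the usual `from_preset` preprocessor API; the preset name and length values are illustrative, not prescriptive:

```python
import keras_nlp

# After this change the preprocessor defaults to sequence_length=1024.
# Users with the memory budget can still opt back in to a longer context
# by setting sequence_length explicitly (preset and length are examples).
preprocessor = keras_nlp.models.GemmaCausalLMPreprocessor.from_preset(
    "gemma_2b_en",
    sequence_length=8192,  # override the shorter 1024 default
)

# Build the task model with the longer-context preprocessor attached.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset(
    "gemma_2b_en",
    preprocessor=preprocessor,
)
```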
github-actions bot added the Gemma label Aug 14, 2024
mattdangerw requested a review from fchollet August 14, 2024 01:37
SamanehSaadat
Member

Thanks, Matt! I agree that a lower default value is more user-friendly!

@mattdangerw
Member Author

Let's try it out. Something tells me we aren't done with discussions here, but hopefully this is a positive delta.

mattdangerw merged commit f80fbfd into keras-team:master Aug 14, 2024
11 checks passed