How to use max_length (number max of tokens) as an argument? #172

Closed
piegu opened this issue Nov 10, 2022 · 3 comments

@piegu

piegu commented Nov 10, 2022

Hi,

max_length is defined here with the value 32.

How can I modify it?

Thank you.

@blakechi
Contributor

blakechi commented Nov 11, 2022

Hello @piegu, thank you for raising this issue.

The max_length is set to the maximum sequence length supported by the model_body (the sentence-transformers model) you use. Please see here for details.

I hope this resolves your issue. If not, I think we can add one more argument when initializing SetFitModel, with an assert checking whether the given max_length exceeds the maximum the model_body can handle.
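
To make the proposed check concrete, here is a minimal sketch in plain Python. The function name `resolve_max_length` and its parameters are hypothetical and are not part of SetFit's API. It assumes the model body's limit is already known as an integer, and it raises `ValueError` where the proposal mentions an assert.

```python
from typing import Optional


def resolve_max_length(requested: Optional[int], model_limit: int) -> int:
    """Pick the sequence length to use for tokenization.

    Hypothetical helper, not SetFit's actual API.
    requested:   user-supplied max_length, or None to fall back to the model's limit.
    model_limit: longest sequence (in tokens) the model body can handle.
    """
    if requested is None:
        # No explicit value: keep the current behaviour and use the model's limit.
        return model_limit
    if requested <= 0:
        raise ValueError(f"max_length must be positive, got {requested}")
    if requested > model_limit:
        # The check suggested above: refuse values the model body cannot handle.
        raise ValueError(
            f"max_length={requested} exceeds the model body's limit of {model_limit}"
        )
    return requested
```

With this sketch, `resolve_max_length(128, 512)` yields 128, while a request of 1024 against a limit of 512 is rejected.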

@piegu
Author

piegu commented Nov 11, 2022

Hi @blakechi,

Thank you for your reply.
However, I don't want to set max_length to the maximum possible length. I want to be able to set it to any value via an argument.

Example: the texts in my dataset have at most 128 tokens, so I want to be able to set max_length to 128.

Note: when using transformers from Hugging Face, I configure the max_length value in the tokenizer through the argument max_length.

@blakechi
Contributor

Sure, I think we can achieve that with what I mentioned earlier:

I hope this resolves your issue. If not, I think we can add one more argument when initializing SetFitModel, with an assert checking whether the given max_length exceeds the maximum the model_body can handle.

@lewtun What do you think? I can open a PR for this 😀
