Support per-request seed #1211

Closed
WoosukKwon opened this issue Sep 28, 2023 · 3 comments · Fixed by #2514
Labels: enhancement (New feature or request)

WoosukKwon (Collaborator) commented Sep 28, 2023

Although part of that problem is that there's no per-request seed, something we also really need.

Originally posted by @TheBloke in #866 (comment)

TheBloke commented Sep 28, 2023

Yeah this would be very helpful.

I had a client who moved away from vLLM to TGI because of this. vLLM was giving 20% better throughput (requests per second), but it had a significant repetition problem: if the user sent the same message again, the LLM would respond in exactly the same way. Here's a (made-up) example of the sort of issue we'd see:

User: hello there
LLM: Hi there how can I help?
User: Who are you?
LLM: I am a helpful LLM here to be helpful for you
User: what can you do?
LLM: I can do many things, I can answer questions, I can write poetry, I can write stories. Whatever you want!
User: what can you do?
LLM: I can do many things, I can answer questions, I can write poetry, I can write stories. Whatever you want!

We found this was not affected at all by increasing frequency_penalty and presence_penalty, even as high as 2.0.

I haven't re-tested this since there were improvements to the repetition penalty controls, so maybe it's a bit better now.

But the fact that the seed is always the same for every request IMHO greatly increases the chance of this sort of repeated generation.

If the seed can be randomised on every request then, combined with a modest repetition_penalty value, it will be almost impossible to get repeated dialogue. With TGI, we found that with repetition_penalty=1.1 we never got repeats, even when the user sent the same message 5 or 10 times in a row.
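For illustration, here's roughly what this could look like from the client side, assuming a hypothetical `seed` field on SamplingParams (a sketch of the desired behaviour, not the current API):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-13b-chat-hf")

prompt = "what can you do?"

# Same prompt, different seeds: with per-request seeding the two
# generations can diverge even though the prompt is identical.
for seed in (1234, 5678):
    params = SamplingParams(
        temperature=0.8,
        repetition_penalty=1.1,  # modest penalty, as with TGI
        seed=seed,               # hypothetical per-request seed
    )
    output = llm.generate([prompt], params)
    print(seed, output[0].outputs[0].text)
```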

Thanks in advance for this enhancement!

WoosukKwon added the enhancement label on Sep 29, 2023
winglian (Contributor) commented

Although part of that problem is that there's no per-request seed, something we also really need.

Originally posted by @TheBloke in #866 (comment)

Is there somewhere you could point me to in the code where this would need to be implemented?
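Conceptually, the change would be to thread a per-request torch.Generator into the sampling ops instead of relying on the global RNG. A minimal sketch under that assumption (names like make_request_generator are illustrative, not vLLM's actual internals):

```python
from typing import Optional

import torch


def make_request_generator(seed: Optional[int],
                           device: torch.device) -> Optional[torch.Generator]:
    """Build a dedicated RNG for one request.

    Seeding a private torch.Generator keeps that request's sampling
    reproducible without perturbing the global RNG shared by others.
    """
    if seed is None:
        return None  # no seed given: fall back to torch's global generator
    gen = torch.Generator(device=device)
    gen.manual_seed(seed)
    return gen


def sample_next_token(probs: torch.Tensor,
                      generator: Optional[torch.Generator]) -> int:
    # torch.multinomial takes an optional `generator` kwarg, so a
    # per-request RNG plugs in exactly where tokens are drawn.
    return torch.multinomial(probs, num_samples=1, generator=generator).item()
```

The seed itself would presumably ride on each request's sampling params, with the resulting generator passed down to the sampler's multinomial call.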

raihan0824 commented
Any update on this?
