Support per-request seed #1211
Comments
Yeah, this would be very helpful. I had a client who moved away from vLLM to TGI because of this. vLLM was giving 20% better throughput / requests per second, but there was a significant repetition problem with vLLM: if the user sent the same message again, the LLM would respond in exactly the same way. Here's a (made up) example of the sort of issue we'd see:

User: hello there

We found this was not affected at all by increasing the repetition penalty. I haven't re-tested this since there were improvements to the repetition penalty controls, so maybe it's a bit better now. But the fact that the seed is always the same for every request IMHO greatly increases the chance of this sort of repeated generation. If the seed can be randomised on every request, then combined with a modest repetition penalty this kind of verbatim repetition should be far less likely.

Thanks in advance for this enhancement!
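For illustration, here is a minimal, self-contained sketch of what per-request seeding could look like conceptually: each request carries its own `torch.Generator`, seeded from entropy when no seed is supplied (so identical prompts can diverge) and seeded deterministically when one is (so a request is reproducible). The `RequestState` class and `sample_next_token` helper are hypothetical names used only for this sketch; this is not vLLM's actual sampling code or API, just the behaviour being requested.

```python
from typing import Optional

import torch


class RequestState:
    """Hypothetical per-request state holding a dedicated RNG."""

    def __init__(self, seed: Optional[int] = None):
        self.generator = torch.Generator()
        if seed is None:
            # No seed given: use an entropy-based seed so repeated
            # identical prompts need not produce identical output.
            self.generator.seed()
        else:
            # Explicit seed: the request is fully reproducible.
            self.generator.manual_seed(seed)


def sample_next_token(logits: torch.Tensor,
                      state: RequestState,
                      temperature: float = 1.0) -> int:
    """Sample one token id from logits using the request's own generator."""
    probs = torch.softmax(logits / temperature, dim=-1)
    token = torch.multinomial(probs, num_samples=1,
                              generator=state.generator)
    return int(token.item())


if __name__ == "__main__":
    logits = torch.randn(32000)  # fake vocabulary logits
    # Two unseeded requests: likely different tokens for the same logits.
    print(sample_next_token(logits, RequestState()))
    print(sample_next_token(logits, RequestState()))
    # Two requests with the same explicit seed: identical tokens.
    print(sample_next_token(logits, RequestState(seed=42)))
    print(sample_next_token(logits, RequestState(seed=42)))
```

The point of the sketch is that the generator lives on the request rather than on the engine, so randomising or fixing the seed becomes a per-request choice exposed through the sampling parameters.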
Is there somewhere you could point me to in the code where this would need to be implemented?
Any update on this?
Although part of that problem is that there's no per-request seed, something we also really need.
Originally posted by @TheBloke in #866 (comment)