Allow greedy decoding with 0.0 temperature #182
Merged
Description
This PR adds the `optional` tag to the `temperature` field so that we can test for its presence in the server, and use greedy decoding if a user passes a temperature of 0.0. This is to reduce confusion with the OpenAI API, which doesn't have a sampling/greedy flag and just uses zero temperature to enable greedy decoding.

As is, users will see an error about the temperature being too low if they pass in a temperature in the range (0, 0.05), but if they send in exactly 0, we instead default back to 1. This causes users to open bug reports about greedy mode returning random results.

This PR also removes the low temperature check, since I think it would be confusing to disallow low temperatures but explicitly allow zero.
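The intended behavior can be illustrated with a minimal Python sketch. This is not the adapter's actual code: `resolve_temperature` and `DecodingChoice` are hypothetical names, and `None` stands in for "the `temperature` field was not set in the request", which is what the `optional` tag lets the server detect. The default of 1.0 for an unset field is taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodingChoice:
    """Hypothetical result type: how the server would decode a request."""
    greedy: bool
    temperature: float


def resolve_temperature(temperature: Optional[float],
                        default: float = 1.0) -> DecodingChoice:
    """Map a possibly-unset temperature to a decoding mode.

    `None` models an unset optional field. An explicit 0.0 selects
    greedy decoding instead of silently falling back to the default.
    """
    if temperature is None:
        # Field absent from the request: sample with the default.
        return DecodingChoice(greedy=False, temperature=default)
    if temperature == 0.0:
        # Field explicitly set to zero: greedy decoding.
        return DecodingChoice(greedy=True, temperature=0.0)
    # Any other value is passed through; there is no low-temperature check.
    return DecodingChoice(greedy=False, temperature=temperature)
```

Without presence tracking, an unset field and an explicit 0.0 are indistinguishable on the wire, which is why the explicit zero previously fell back to 1.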
How Has This Been Tested?
I ran a server with vllm-tgis-adapter@main installed, and booted up one `grpcui` instance so that it would pick up the old protobuf definition. I then stopped the server, installed vllm-tgis-adapter@zero-temperature-support, booted the server again, and started a second `grpcui` instance so it would pick up the new protobuf definition. I verified that both:
Merge criteria: