
Allow greedy decoding with 0.0 temperature #182

Merged · 1 commit merged into main from zero-temperature-support on Dec 2, 2024

Conversation

joerunde (Contributor)

Description

This PR adds the optional tag to the temperature field so that the server can test for its presence, and uses greedy decoding if a user passes a temperature of 0.0. This reduces confusion with the OpenAI API, which has no explicit sampling/greedy flag and instead uses zero temperature to enable greedy decoding.

As is, users will see an error about the temperature being too low if they pass a temperature in the range (0, 0.05), but if they send exactly 0, we silently default back to 1. This causes users to open bug reports about greedy mode returning random results. This PR also removes the low-temperature check, since I think it would be confusing to disallow low temperatures but explicitly allow zero.
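The presence check described above could be sketched roughly like this (the helper and field names here are illustrative, not the adapter's actual code):

```python
# Minimal sketch of the zero-temperature handling this PR describes.
# `resolve_sampling` and its dict-based params are hypothetical; the real
# server reads an optional protobuf field on the request instead.

def resolve_sampling(params: dict) -> dict:
    """Map request sampling params, treating temperature == 0.0 as greedy."""
    temperature = params.get("temperature")  # None means the field was unset
    if temperature is None:
        # Field absent: keep the existing default (temperature = 1.0, sampling).
        return {"temperature": 1.0, "greedy": False}
    if temperature == 0.0:
        # Explicit 0.0: switch to greedy decoding, matching OpenAI API semantics.
        return {"temperature": 1.0, "greedy": True}
    # Any other value is passed through as-is; the old "too low" check is gone.
    return {"temperature": temperature, "greedy": False}
```

The key point is distinguishing "unset" from "explicitly zero", which is why the protobuf field needs the optional tag.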

How Has This Been Tested?

I ran a server with vllm-tgis-adapter@main installed and booted up one grpcui instance so that it would pick up the old protobuf definition. I then stopped the server, installed vllm-tgis-adapter@zero-temperature-support, booted the server again, and started a second grpcui instance so it would pick up the new protobuf definition. I verified all of the following:

  • Unset temperature and temperature==0 requests with the old protobuf defs returned random results
  • Unset temperature requests with the new protobuf defs returned random results
  • Temperature==0 requests with the new protobuf defs returned greedy results

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

@codecov-commenter
Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 62.53%. Comparing base (2c80c72) to head (1b2f097).

Files with missing lines                      Patch %   Lines
src/vllm_tgis_adapter/grpc/grpc_server.py     50.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #182      +/-   ##
==========================================
+ Coverage   62.46%   62.53%   +0.07%     
==========================================
  Files          28       28              
  Lines        1721     1719       -2     
  Branches      211      210       -1     
==========================================
  Hits         1075     1075              
+ Misses        544      543       -1     
+ Partials      102      101       -1     


@dtrifiro dtrifiro added this pull request to the merge queue Dec 2, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 2, 2024
@dtrifiro dtrifiro added this pull request to the merge queue Dec 2, 2024
Merged via the queue into main with commit 0654b07 Dec 2, 2024
3 checks passed
@dtrifiro dtrifiro deleted the zero-temperature-support branch December 2, 2024 14:48
3 participants