Add support for "extra_body" to OpenAILLMConfigEntry #1590
Conversation
I will shortly add documentation and tests to this PR.
This is what we want.
@Hellisotherpeople Thank you for the PR! I am waiting for docs and tests.
Looks good to me. If you are adding extra_body, would you like to add extra_headers as well?
What's the best way to write tests for this? To test it, I figured I would need to point to a vLLM server, pass in something like min_p sampling, and verify that I get a (valid) response. If we don't want to point to or run a vLLM server, what's the ideal way to test this?
@Hellisotherpeople For now, just try to set some valid parameters for extra_body.
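For illustration, a minimal test along those lines might just construct the entry with an extra_body value and check that it is stored and serialized, with no server involved. The import path, field names, and pydantic-style serialization below are assumptions based on this PR, not confirmed API:

```python
def test_extra_body_is_stored_and_serialized():
    # Assumed import path; adjust to wherever OpenAILLMConfigEntry actually lives.
    from autogen.oai.client import OpenAILLMConfigEntry

    entry = OpenAILLMConfigEntry(
        model="my-local-model",               # placeholder model name
        base_url="http://localhost:8000/v1",  # placeholder local vLLM endpoint
        api_key="EMPTY",
        extra_body={"min_p": 0.05},           # vLLM-specific sampling option
    )

    # No network call: verify the parameter is kept on the entry and survives
    # serialization, so it would be forwarded to the OpenAI-compatible client.
    assert entry.extra_body == {"min_p": 0.05}
    assert entry.model_dump()["extra_body"] == {"min_p": 0.05}
```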
@Hellisotherpeople Please take a look at my earlier comment.
Codecov Report: All modified and coverable lines are covered by tests ✅
... and 64 files with indirect coverage changes
This will enable min_p sampling and other fancy vLLM features for AG2.
Why are these changes needed?
OpenAILLMConfigEntry is used for all OpenAI-compliant APIs, and vLLM's server is just one of those APIs. vLLM also supports a bunch of useful features, such as min_p sampling and robust max/min token limits. These are all specified via "extra_body", but since "extra_body" was not supported by OpenAILLMConfigEntry, it was not possible to use these techniques with local models.
I've tested this change with several models on vLLM, including Mistral Large, Llama 4 Maverick, and DeepSeek.
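As a usage sketch (assuming the usual dict-style config_list keys; the local server URL, model name, and sampling values are placeholders, and extra_body is the field this PR adds):

```python
import autogen

# Hypothetical config pointing at a local vLLM server that exposes an
# OpenAI-compatible endpoint; URL, model name, and values are placeholders.
config_list = [
    {
        "api_type": "openai",
        "model": "meta-llama/Llama-4-Maverick",
        "base_url": "http://localhost:8000/v1",
        "api_key": "EMPTY",
        # Forwarded to the OpenAI client's extra_body parameter, which is
        # how vLLM receives options like min_p sampling or min_tokens.
        "extra_body": {"min_p": 0.05, "min_tokens": 16},
    }
]

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
```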