vLLM server capabilities #260
Comments
Thanks for pointing this out. I've enabled the served-model parameter. For question 2, we don't see the need to implement it yet, as it seems to involve a lot of code (as seen here) and there is also not much demand for it. We also use a monkey-patched AsyncLLMEngine here when grammar sampling is enabled, and I'm not sure whether monkey-patching other parts of vLLM would be needed to get endpoints like /metrics working.
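For readers unfamiliar with the pattern, monkey-patching here just means replacing a method on vLLM's AsyncLLMEngine at runtime. A minimal, hypothetical sketch of the idea (the method chosen and the inserted logic are placeholders, not the repository's actual grammar-sampling patch):

```python
# Hypothetical sketch of monkey-patching vLLM's AsyncLLMEngine.
# The patched method and the custom logic are illustrative only.
from vllm.engine.async_llm_engine import AsyncLLMEngine

_original_generate = AsyncLLMEngine.generate

async def patched_generate(self, *args, **kwargs):
    # Custom pre-processing could go here (e.g. setting up constrained sampling).
    async for output in _original_generate(self, *args, **kwargs):
        # Each streamed output could be post-processed here before yielding.
        yield output

# Rebind the method on the class so every engine instance picks up the patch.
AsyncLLMEngine.generate = patched_generate
```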
I see.
@QwertyJack, could you please share the process to get the vLLM endpoint /metrics from this model?
After #263 is merged, vLLM-style endpoints such as /metrics and /health are available. Using the latest main branch and following the README should ensure it works smoothly.
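Once the server is up, a quick way to check those endpoints is a pair of plain HTTP requests. This sketch assumes the server listens on localhost:8000 (the usual default; adjust to your deployment):

```python
# Probe the vLLM-style endpoints; host and port are assumptions.
import requests

BASE_URL = "http://localhost:8000"

health = requests.get(f"{BASE_URL}/health", timeout=5)
print("health:", health.status_code)  # 200 means the engine is up

metrics = requests.get(f"{BASE_URL}/metrics", timeout=5)
# Prometheus text format, one metric per line (e.g. vllm:num_requests_running).
print("\n".join(metrics.text.splitlines()[:10]))
```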
Thank you for your excellent work!
I've noticed that you've implemented a vLLM-based server with some modifications, but I have a few questions:
1. Could you support the served-model parameter? It's quite useful for loading a model from local storage while maintaining the same name.
2. Do you think adding the vLLM endpoints /metrics and /health would be beneficial?
Thanks in advance.
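On question 1: the point of a served-model name is that clients keep using a stable model id even when the server loads the weights from a local path. A hedged client-side sketch (the base URL, API key, and model name below are placeholders):

```python
# Client-side view of a served-model name: the request references a stable
# model id even if the server loaded weights from a local directory.
# Base URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")

resp = client.chat.completions.create(
    model="my-served-model-name",  # the advertised name, not the local path
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```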