vLLM server capabilities #260
Comments
Thanks for pointing this out. I've enabled the served-model parameter. For question 2, we don't see the need to implement it yet, as it seems to involve a lot of code (as seen here) and there is also not much demand for it. We also use a monkey-patched AsyncLLMEngine here when grammar sampling is enabled, and I'm not sure whether monkey-patching other parts of vLLM would be needed to get endpoints like /metrics working.
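For readers unfamiliar with the pattern, monkey-patching here just means replacing a method on vLLM's AsyncLLMEngine at runtime. A minimal, hypothetical sketch of the idea (the method chosen and the inserted logic are placeholders, not the repository's actual grammar-sampling patch):

```python
# Hypothetical sketch of monkey-patching vLLM's AsyncLLMEngine.
# The patched method and the custom logic are illustrative only.
from vllm.engine.async_llm_engine import AsyncLLMEngine

_original_generate = AsyncLLMEngine.generate

async def patched_generate(self, *args, **kwargs):
    # Custom pre-processing could go here (e.g. setting up constrained sampling).
    async for output in _original_generate(self, *args, **kwargs):
        # Each streamed output could be post-processed here before yielding.
        yield output

# Rebind the method on the class so every engine instance picks up the patch.
AsyncLLMEngine.generate = patched_generate
```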
I see.
@QwertyJack, could you please share the process to get the vLLM endpoint /metrics from this model?
After #263 is merged, vLLM-style endpoints such as /metrics and /health are available. Using the latest main branch and following the README should ensure it works smoothly.
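Once the server is up, a quick way to check those endpoints is a pair of plain HTTP requests. This sketch assumes the server listens on localhost:8000 (the usual default; adjust to your deployment):

```python
# Probe the vLLM-style endpoints; host and port are assumptions.
import requests

BASE_URL = "http://localhost:8000"

health = requests.get(f"{BASE_URL}/health", timeout=5)
print("health:", health.status_code)  # 200 means the engine is up

metrics = requests.get(f"{BASE_URL}/metrics", timeout=5)
# Prometheus text format, one metric per line (e.g. vllm:num_requests_running).
print("\n".join(metrics.text.splitlines()[:10]))
```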
Thank you for your excellent work!
I've noticed that you've implemented a vLLM-based server with some modifications, but I have a few questions:
1. Could you support the served-model parameter? It's quite useful for loading a model from local storage while maintaining the same name.
2. Do you think adding the vLLM endpoints /metrics and /health would be beneficial?
Thanks in advance.
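On question 1: the point of a served-model name is that clients keep using a stable model id even when the server loads the weights from a local path. A hedged client-side sketch (the base URL, API key, and model name below are placeholders):

```python
# Client-side view of a served-model name: the request references a stable
# model id even if the server loaded weights from a local directory.
# Base URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")

resp = client.chat.completions.create(
    model="my-served-model-name",  # the advertised name, not the local path
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```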