[Feature]: Support stream_options with vLLM #5197
Comments
Hi @pennycoders, this is already supported behaviour. We check if the response contains stream_options or the provider-specific equivalent and return that. If you don't see this on latest, try bumping and let me know what you see. Where in the docs would this have been useful to see?
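For reference, a minimal sketch of what reading the streamed usage through LiteLLM could look like from the client side (assumptions: a LiteLLM proxy running at http://localhost:4000, an OpenAI Python SDK recent enough to support stream_options, and a placeholder API key; the model name is taken from the report below):

```python
# Sketch only: assumes a LiteLLM proxy at http://localhost:4000 and an openai
# client version that supports the stream_options parameter.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-anything")

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Write a poem about flowers"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Regular chunks carry content deltas; the final chunk (empty choices)
    # is expected to carry the usage block when include_usage is set.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage is not None:
        print("\nusage:", chunk.usage)
```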
Hi @krrishdholakia,

Thank you very much for the quick reply. I am using vLLM as a backend and proxying both streaming and non-streaming requests to it via LiteLLM. When I call vLLM's /v1/chat/completions endpoint directly with stream_options, the last chunk before [DONE] looks like this:

{
"id": "chat-a071f1a541c648d9ac615559fb7c3fab",
"object": "chat.completion.chunk",
"created": 1723664201,
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
"choices": [],
"usage": {
"prompt_tokens": 4539,
"total_tokens": 5392,
"completion_tokens": 853
}
}

However, when I call this exact instance via LiteLLM, I get the following on the chunk before [DONE]:

{
"id": "chat-ac6b39f408564693b4f20cbe62513b2b",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"delta": {}
}
],
"created": 1723664471,
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
"object": "chat.completion.chunk"
}

Regarding the documentation, I don't see vLLM mentioned on this page. In the code, I see an if-else statement here: https://github.com/BerriAI/litellm/blob/main/litellm/llms/vllm.py#L86

Can you please tell me what I am doing wrong? Please find the request I send in both cases below:

{
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
"messages": [
{
"role": "system",
"content": "**System Role:**\nYou are a poet and you write poems. You will write a poem about whatever subject is given to you"
},
{
"role": "user",
"content": "Flowers"
}
],
"temperature": 1.00,
"top_p": 0.9,
"n": 1,
"stream": true,
"stream_options": {
"include_usage": true,
"continuous_usage_stats": true
},
"seed": 1,
"presence_penalty": 0,
"frequency_penalty": 0,
"logit_bias": {}
}
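For what it's worth, a minimal sketch to reproduce this side-by-side comparison could look like the following (assumptions: vLLM served directly at http://localhost:8000/v1, the LiteLLM proxy at http://localhost:4000, and a placeholder API key):

```python
from openai import OpenAI

# Same request in both cases; only include_usage is needed for the comparison.
REQUEST = dict(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a poet and you write poems."},
        {"role": "user", "content": "Flowers"},
    ],
    stream=True,
    stream_options={"include_usage": True},
)

def final_chunk(base_url: str):
    """Send the same streaming request and return the last chunk before [DONE]."""
    client = OpenAI(base_url=base_url, api_key="sk-anything")
    last = None
    for chunk in client.chat.completions.create(**REQUEST):
        last = chunk
    return last

# vLLM directly vs. the same request routed through the LiteLLM proxy
print("vLLM usage:   ", final_chunk("http://localhost:8000/v1").usage)
print("LiteLLM usage:", final_chunk("http://localhost:4000").usage)
```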
Hi @pennycoders, which version is this on? If not the latest, can you try bumping? Here's the relevant code block for handling streaming usage:

Line 10552 in 22243c6
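Conceptually, the handling being pointed at boils down to something like the sketch below. This is only an illustration of the idea, not LiteLLM's actual code; the function names are made up for the example.

```python
from typing import Any, Optional

def extract_streaming_usage(raw_chunk: dict[str, Any]) -> Optional[dict[str, Any]]:
    # Hypothetical helper: return the provider's usage block if the final
    # streamed chunk included one (vLLM puts it on the chunk with empty choices).
    return raw_chunk.get("usage")

def build_final_chunk(raw_chunk: dict[str, Any], include_usage: bool) -> dict[str, Any]:
    # Hypothetical helper: rebuild the OpenAI-style chunk returned to the client,
    # keeping the usage block only when the caller asked for it via
    # stream_options={"include_usage": True}.
    out = {
        "id": raw_chunk["id"],
        "object": "chat.completion.chunk",
        "created": raw_chunk["created"],
        "model": raw_chunk["model"],
        "choices": raw_chunk.get("choices", []),
    }
    usage = extract_streaming_usage(raw_chunk)
    if include_usage and usage is not None:
        out["usage"] = usage
    return out
```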
Hey @krrishdholakia, I was using main-v1.40.4. Just tested with the latest and it works. Thank you very much!

Alex
Hi @pennycoders, curious: do you use LiteLLM Proxy in production today? If so, I'd love to hop on a call and learn how we can improve LiteLLM for you.
The Feature
Requests made with stream: true to LiteLLM should pass the usage information through when it is provided by the backend, in this case vLLM.
Motivation, pitch
Hi,
Given that vLLM supports usage information during streaming requests (please see this PR), it would be suitable for LiteLLM to support that as well. At the time of opening this issue, it does not seem to be supported, or if it is, it is not documented. Please keep in mind that I am willing to make this contribution myself.
Thanks,
Alex
Twitter / LinkedIn details
No response