
stream.get_final_message() does not return the correct usage of output_tokens #424

Closed
sirius422 opened this issue Mar 28, 2024 · 9 comments


@sirius422

As the title says, stream.get_final_message() always returns output_tokens with a value of 1.
Running the example code examples/messages_stream.py, the output looks like:

Hello there!
accumulated message:  {
  "id": "REDACTED",
  "content": [
    {
      "text": "Hello there!",
      "type": "text"
    }
  ],
  "model": "claude-3-opus-20240229",
  "role": "assistant",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "type": "message",
  "usage": {
    "input_tokens": 11,
    "output_tokens": 1
  }
}

However, the actual output_tokens should be 6, according to the raw HTTP stream response:

event: message_start
data: {"type":"message_start","message":{"id":"REDACTED","type":"message","role":"assistant","content":[],"model":"claude-3-opus-20240229","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":11,"output_tokens":1}}   }

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}     }

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}              }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" there"}  }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"!"}            }

event: content_block_stop
data: {"type":"content_block_stop","index":0         }

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":6}               }

event: message_stop
data: {"type":"message_stop"           }

So, is this a bug or a feature? I've seen someone in issue #417 using stream.get_final_message() to obtain usage information. If output_tokens is always 1, that won't work properly.
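For reference, the accumulation these events imply can be sketched as follows. This is a minimal, hypothetical sketch (not the SDK's actual code): the key point is that the usage in the final message_delta event is cumulative and should overwrite the placeholder value from message_start.

```python
# Simplified events mirroring the raw stream above (content_block_start,
# ping, and content_block_stop omitted; they don't affect usage).
events = [
    {"type": "message_start",
     "message": {"usage": {"input_tokens": 11, "output_tokens": 1}}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": " there"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "!"}},
    {"type": "message_delta",
     "delta": {"stop_reason": "end_turn"},
     "usage": {"output_tokens": 6}},
    {"type": "message_stop"},
]

def accumulate(events: list[dict]) -> dict:
    message: dict = {}
    text_parts: list[str] = []
    for event in events:
        if event["type"] == "message_start":
            message = dict(event["message"])
            # Copy usage so we don't mutate the original event.
            message["usage"] = dict(message["usage"])
        elif event["type"] == "content_block_delta":
            text_parts.append(event["delta"]["text"])
        elif event["type"] == "message_delta":
            # The final usage is cumulative: it replaces, not adds to,
            # the placeholder output_tokens from message_start.
            message["usage"]["output_tokens"] = event["usage"]["output_tokens"]
    message["text"] = "".join(text_parts)
    return message

final = accumulate(events)
print(final["usage"])  # {'input_tokens': 11, 'output_tokens': 6}
```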

@sirius422
Author

Adding a line to accumulate_event in src/anthropic/lib/streaming/_messages.py seems to fix the issue. Should I submit a pull request? With the change applied, the output becomes:

Hello there!
accumulated message:  {
  "id": "REDACTED",
  "content": [
    {
      "text": "Hello there!",
      "type": "text"
    }
  ],
  "model": "claude-3-opus-20240229",
  "role": "assistant",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "type": "message",
  "usage": {
    "input_tokens": 11,
    "output_tokens": 6
  }
}
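A sketch of roughly the kind of one-line change being described (hypothetical; the names and data shapes are illustrative and may not match the actual code in _messages.py):

```python
# Hypothetical sketch of the fix: when a message_delta event arrives,
# copy its cumulative usage into the accumulated message snapshot.
def accumulate_event(snapshot: dict, event: dict) -> dict:
    if event["type"] == "message_delta":
        snapshot["stop_reason"] = event["delta"].get("stop_reason")
        # The added line: carry the updated output_tokens into the snapshot,
        # replacing the placeholder value from message_start.
        snapshot["usage"]["output_tokens"] = event["usage"]["output_tokens"]
    return snapshot

snapshot = {"usage": {"input_tokens": 11, "output_tokens": 1}, "stop_reason": None}
delta_event = {
    "type": "message_delta",
    "delta": {"stop_reason": "end_turn", "stop_sequence": None},
    "usage": {"output_tokens": 6},
}
print(accumulate_event(snapshot, delta_event)["usage"])
# {'input_tokens': 11, 'output_tokens': 6}
```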

@rattrayalex
Collaborator

rattrayalex commented Mar 29, 2024

Thanks for the report & the PR!

@WesleyYue

Is there a timeline on this being fixed? What's the blocker on merging the PR?

@rattrayalex
Collaborator

This was fixed 3 weeks ago: anthropics/anthropic-sdk-typescript#361

@krschacht

I’m wondering if this fix is mistaken. Shouldn’t output_tokens be the sum of the value in message_start (output_tokens = 1) and the value in the final message_delta (output_tokens = 6)?

(there was no discussion on the PR so I’m adding this comment to where it seemed like the real discussion was happening)

@rattrayalex
Collaborator

@krschacht what behavior are you seeing?

@krschacht

@rattrayalex Right now output_tokens is being set to the value contained in the final message_delta. Is that the total output token count for the full stream? I’ve been assuming we need to sum the initial value and the final one to get the total, but the docs don’t actually specify the meaning of those fields.

@RobertCraigie
Collaborator

RobertCraigie commented Jul 15, 2024

@krschacht I believe the current behaviour is correct: if you make a non-streaming request with the same inputs, the usage will be exactly the same. For example, here is an updated version of the messages_stream.py example:

import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

async def main() -> None:
    async with client.messages.stream(
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": "Say hello there!",
            }
        ],
        model="claude-3-opus-20240229",
    ) as stream:
        await stream.until_done()

    accumulated = await stream.get_final_message()
    print("accumulated message: ", accumulated.to_json())

    api_message = await client.messages.create(
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": "Say hello there!",
            }
        ],
        model="claude-3-opus-20240229",
    )
    print("api message: ", api_message.to_json())

asyncio.run(main())

Running this script gives this output for me:

accumulated message:  {
  "id": "msg_01JngEuyKqL3QmvpnivBwAv3",
  "content": [
    {
      "text": "Hello there!",
      "type": "text"
    }
  ],
  "model": "claude-3-opus-20240229",
  "role": "assistant",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "type": "message",
  "usage": {
    "input_tokens": 11,
    "output_tokens": 6
  }
}
api message: {
  "id": "msg_01Ng2rJDRPPCHeN2ULLf4BAA",
  "content": [
    {
      "text": "Hello there!",
      "type": "text"
    }
  ],
  "model": "claude-3-opus-20240229",
  "role": "assistant",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "type": "message",
  "usage": {
    "input_tokens": 11,
    "output_tokens": 6
  }
}

I'll let the Anthropic team know that the docs weren't helpful here! Which docs were you looking at?

@krschacht

Ohh, got it, so that final count is indeed the total and the initial one can be ignored. I wonder why the API includes the initial one then? Anyway, thanks for the quick reply!
