Long delay when using streaming + tools #529
Comments
Hi @holdenmatt, unfortunately this is a model limitation (same issue noted in #454 (comment)). We're planning on improving this with future models.
I see, thanks. If I want faster streaming, would you recommend I move away from tools and try to coax a JSON schema via the system prompt instead?
Hi @holdenmatt -- one clarification to the above: we stream out each key/value pair together, so long values will result in buffering (the delays you're seeing). In the example you provided, Claude is producing a poem (a long string) as a value, which is why you're seeing the delay. However, a large object with many smaller keys/values wouldn't have this issue.
That could work. The delay you're seeing should only happen in that specific kind of tool use (where Claude is producing long keys/values).
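To make the buffering described above concrete, here is a toy model in Python. It is not the API's actual implementation: `stream_by_pair` is a hypothetical helper that emits one JSON fragment per key/value pair, which is the behaviour the maintainers describe. Under that assumption, a single long value arrives as one large fragment, while many small pairs arrive as many small fragments.

```python
import json
from typing import Any, Dict, Iterator


def stream_by_pair(tool_input: Dict[str, Any]) -> Iterator[str]:
    """Yield one JSON fragment per key/value pair (toy model, an assumption)."""
    items = list(tool_input.items())
    if not items:
        yield "{}"
        return
    for i, (key, value) in enumerate(items):
        prefix = "{" if i == 0 else ", "
        suffix = "}" if i == len(items) - 1 else ""
        # The whole pair is emitted at once, so a long value is fully
        # buffered before anything reaches the consumer.
        yield f"{prefix}{json.dumps(key)}: {json.dumps(value)}{suffix}"


# One long value: nothing can be shown until the entire poem is ready.
poem_input = {"poem": "roses are red " * 40}
# Many short values: the consumer sees steady progress.
form_input = {f"field_{i}": i for i in range(20)}

print(len(list(stream_by_pair(poem_input))))  # 1 fragment
print(len(list(stream_by_pair(form_input))))  # 20 fragments
```

In this model the time to first content scales with the length of the first value, which matches the reports in this thread of delays only when a tool argument is a long string.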
Ah, that would explain why I run into this but other folks I talk to haven't seen it. The specific use case for me is generating LaTeX code from text prompts for https://texsandbox.com/. The LaTeX output could be long, depending on the prompt. The reason I use function calling instead of text completion is that I want to allow the model to "branch" between the good "latex" case and an "error" case if it doesn't know what to do or, e.g., the input prompt doesn't make sense. I could avoid tools here if that would improve streaming, but I'd need some other way to signal "this is valid code" vs "this is an error message".
FYI: I fixed this by moving away from tool calling, and streaming now feels fast again. I hacked my own poor man's function calling on top of text generation by prompting the model to write "latex" or "error" on the first line, followed by the code or an error message. This works fine (so you can close this if you like), but it was the biggest issue I ran into switching from gpt-4o to claude-3.5-sonnet. I quite often use functions/tools with long JSON values, so this is a feature request to improve it in the future. Thanks!
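The first-line tagging workaround described above could be consumed roughly as follows. This is a minimal sketch, not the commenter's actual code: the function name `split_tagged_stream`, the tag set, and the choice to treat an unrecognised first line as an error are all assumptions. It buffers only until the first newline, then forwards every later chunk immediately.

```python
from typing import Iterable, Iterator, Tuple

TAGS = ("latex", "error")  # assumed first-line markers from the prompt


def split_tagged_stream(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (tag, text) pairs as text chunks arrive from a stream."""
    header = ""
    tag = None
    for chunk in chunks:
        if tag is not None:
            if chunk:
                yield tag, chunk  # body text is passed through unbuffered
            continue
        header += chunk
        if "\n" not in header:
            continue  # still waiting for the end of the first line
        first_line, _, rest = header.partition("\n")
        tag = first_line.strip().lower()
        if tag not in TAGS:
            # Assumption: an unexpected first line is surfaced as an error,
            # with the full text preserved for debugging.
            tag, rest = "error", header
        if rest:
            yield tag, rest
    if tag is None and header:
        # The stream ended before any newline; treated as an error here.
        yield "error", header


chunks = ["la", "tex\n\\frac", "{1}{2}"]
print(list(split_tagged_stream(chunks)))
# [('latex', '\\frac'), ('latex', '{1}{2}')]
```

Because only the short tag line is held back, the user sees output as soon as the model starts writing the body.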
Is there an issue we can track for improvements to streaming + tool use, or do you plan to post updates here?
Hey team, is there a planned date for fixing this? This is a big limiter on the user experience for our code-gen use case.
+1, I think this basically makes tool use not viable for our use case. It's not limited to the TypeScript API; it's also a problem in Python.
If this helps, there's a hacky workaround, similar to the solution mentioned above, that's currently working for me and someone else: stream raw text, force a JSON format, and progressively resolve the text into a partial object. It's surprisingly reliable so far.
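One way to "progressively resolve" streamed JSON text into a partial object is sketched below. This is an illustration under assumptions, not the commenter's implementation: `parse_partial_json` and `progressive_objects` are hypothetical names, and the approach is a best-effort repair that closes any open string, array, or object before calling `json.loads`. Prefixes that still fail to parse (for example, a half-written key) are skipped.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def parse_partial_json(text: str) -> Optional[Any]:
    """Best-effort parse of a JSON prefix; returns None if it cannot be repaired."""
    closers = []  # brackets that still need to be closed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    candidate = text
    if in_string:
        if escaped:
            candidate = candidate[:-1]  # drop a dangling backslash
        candidate += '"'
    candidate += "".join(reversed(closers))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


def progressive_objects(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield the latest parseable snapshot after each streamed text chunk."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        snapshot = parse_partial_json(buffer)
        if snapshot is not None:
            yield snapshot


chunks = ['{"kind": "latex", "co', 'de": "x^2', ' + 1"}']
for snapshot in progressive_objects(chunks):
    print(snapshot)
# {'kind': 'latex', 'code': 'x^2'}
# {'kind': 'latex', 'code': 'x^2 + 1'}
```

The design choice here is to prefer skipping an unparseable prefix over guessing at its contents, so a UI built on top only ever renders values the model has actually produced.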
Any news on this? This is super limiting and ruins the user experience.
I'm noticing that if I take my exact prompt and paste it into the Claude app but prefix it with I've tested this quite a few times in the Claude app and it works every time, so I'm wondering if this is an API limitation (or specific safety feature) that can be resolved?
(Sorry if this isn't the right place to report this, I wasn't sure).
I'm trying to switch from gpt-4o to claude-3.5-sonnet in an app I'm building, but high streaming tool latency is preventing me from doing so. Looks like this was discussed in #454 but wondering how I should proceed?
The total latency of Claude vs gpt-4o is pretty similar, and I think fine.
The issue is that Claude waits a long time before any content is streamed (I often see ~5s delays vs ~500ms for gpt-4o). This is a poor user experience in my app, because users get no feedback that any generation is happening. This will prevent me from switching, even though I much prefer Claude's output quality!
Do you have any plans to fix this? Or do you recommend not using tools + streaming with Claude?
Example timing and test code below, if helpful.
Timing comparison
claude-3-5-sonnet:
Stream created at 0ms
First content received: 4645ms
Streaming time: 46ms
Total time: 4691ms

gpt-4o:
Stream created at 343ms
First content received: 368ms
Streaming time: 2100ms
Total time: 2468ms
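For readers who want to collect comparable numbers, the following is a minimal, SDK-agnostic sketch of how the four figures above could be measured. It is not the test code from this issue: `time_stream` and its `make_stream` parameter are hypothetical, and any iterable of chunks (for example, a generator wrapping a provider's streaming call) can be passed in.

```python
import time
from typing import Callable, Dict, Iterable


def time_stream(
    make_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Measure stream creation, first content, streaming, and total time in ms."""
    start = clock()
    stream = iter(make_stream())
    created = clock()
    first = None
    for _chunk in stream:
        if first is None:
            first = clock()  # timestamp of the first chunk received
    end = clock()
    if first is None:
        first = end  # the stream produced no content at all
    return {
        "stream_created_ms": (created - start) * 1000,
        "first_content_ms": (first - start) * 1000,
        "streaming_ms": (end - first) * 1000,
        "total_ms": (end - start) * 1000,
    }


def fake_stream():
    """Stand-in for a real streaming call: slow first chunk, fast afterwards."""
    time.sleep(0.05)
    yield "first"
    yield "second"


print(time_stream(fake_stream))
```

A long `first_content_ms` paired with a short `streaming_ms`, as in the claude-3-5-sonnet numbers above, is the signature of output being buffered and then released in one burst.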
Test code: