
Session breaks upon parsing context #103

Closed
binarynoise opened this issue Jul 20, 2024 · 12 comments · Fixed by #125
Labels: bug (Something isn't working), released

Comments

binarynoise (Contributor) commented Jul 20, 2024


SyntaxError: JSON.parse: end of data when ',' or ']' was expected at line 1 column x of the JSON data

This is independent of the "knowledge as system prompt" experiment.

Here is the network traffic for that session as a HAR file: localhost_Archive [24-07-20 17-45-41].har.json. You can load it into Firefox or Chrome to see the details.

As far as I can tell, Ollama does send a valid response (in contrast to #97), but Hollama for some reason fails to parse the last line, the one containing the context. The column x varies wildly, around 30k.
It happens around the 5th or 6th response for me, with different models and prompts.
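For illustration, a line cut off mid-array (a made-up example, not taken from the actual response) reproduces the same error in Firefox:

```ts
// Made-up example: a response line truncated inside the `context` array.
const truncatedLine = '{"model":"llama3","done":true,"context":[128006,882,128007';
try {
  JSON.parse(truncatedLine);
} catch (e) {
  // In Firefox: SyntaxError: JSON.parse: end of data when ',' or ']' was expected ...
  console.error(e);
}
```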

As a consequence, the conversation's context is lost and a new conversation effectively begins when the last message is retried. Sometimes not everything gets lost, but the conversation is broken anyway.

As a workaround, would it be possible to hold onto the last valid context and use that when retrying the message?

Switching to another conversation and coming back to retry does actually trigger #97.

binarynoise (Contributor, Author):

Or just save the returned context for each message, to be able to start over at a given point when the model starts generating nonsense.
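A rough sketch of what I mean (the field and function names here are made up, not Hollama's actual code):

```ts
// Hypothetical sketch; names are illustrative only.
interface SessionMessage {
  role: 'user' | 'assistant';
  content: string;
  // `context` array returned by /api/generate after this message completed successfully.
  context?: number[];
}

// When retrying from message `i`, fall back to the most recent context that parsed correctly.
function lastValidContext(messages: SessionMessage[], i: number): number[] | undefined {
  for (let j = i - 1; j >= 0; j--) {
    const ctx = messages[j].context;
    if (ctx && ctx.length > 0) return ctx;
  }
  return undefined;
}
```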

fmaclen added the bug (Something isn't working) label on Jul 20, 2024
fmaclen (Owner) commented Jul 20, 2024

Thanks, I was able to replicate the issue.

I'm not entirely sure yet, but I suspect it's because we are trying to format the completion in "small chunks" as Ollama streams them, and one of the chunks is causing the parser to break.
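As a sketch of the kind of fix I have in mind (illustrative only, not the current code): buffer the raw stream and only parse complete newline-terminated lines, so a chunk boundary in the middle of one of Ollama's NDJSON objects can't break `JSON.parse`.

```ts
// Sketch: a fetch() chunk can end mid-object, so only parse complete lines.
async function readNdjsonStream(response: Response, onObject: (obj: unknown) => void) {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the possibly incomplete last line for the next chunk
    for (const line of lines) {
      if (line.trim()) onObject(JSON.parse(line));
    }
  }
  if (buffer.trim()) onObject(JSON.parse(buffer)); // whatever remains should be a complete object
}
```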

> As a workaround, would it be possible to hold onto the last valid context and use that when retrying the message?

Yeah, I wanted to implement something like this so messages can be retried or even edited. #9

fmaclen (Owner) commented Jul 20, 2024

Here's what I think is happening: when we catch an error during a completion (any kind of error), the prompt is reset so the user can try again, but when the prompt is re-submitted the context array with all of the tokens is lost.

fmaclen (Owner) commented Jul 20, 2024

I don't think we can predict when the JSON parsing will fail, but whenever there is any kind of error the UI should hint at a clear path to fixing it (if possible).

In a large number of cases I expect simply "retrying" should fix most issues: #9 (comment)

binarynoise (Contributor, Author):

#9 doesn't help at all for this problem. Regenerating the failing message, or even an earlier one, causes the same error again.

fmaclen (Owner) commented Jul 21, 2024

@binarynoise true, #103 won't be fixed by #106.

What I meant is that in order to close this issue we want to make sure you can indeed click "Retry" to continue the session as if the error never happened.

fmaclen (Owner) commented Jul 21, 2024

I can now reliably get this error:

SyntaxError: JSON.parse: end of data when ',' or ']' was expected at line 1 column x of the JSON data

I think this is caused by exceeding the model's context window limit, which causes the completion to get truncated (and that is what breaks the JSON parser). This is what I see in the Ollama logs when I retry the failed message:

```
[GIN] 2024/07/21 - 17:47:48 | 200 |  1.964507833s |             ::1 | POST     "/api/generate"
INFO [update_slots] input truncated | n_ctx=2048 n_erase=2767 n_keep=4 n_left=2044 n_shift=1022 tid="0x1f6960c00" timestamp=1721598478
```
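For reference, a request can ask for a larger window via options.num_ctx. A sketch (this only raises the default 2048 shown in the log, it doesn't remove the limit, and larger windows use more memory):

```ts
// Sketch: raising the context window per request with Ollama's options.num_ctx.
const previousContext: number[] = []; // `context` array returned by the previous completion
const res = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3',
    prompt: 'Continue the conversation',
    context: previousContext,
    options: { num_ctx: 4096 }, // larger than the n_ctx=2048 shown in the log above
    stream: true
  })
});
```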

fmaclen added a commit that referenced this issue Jul 21, 2024
- All errors can now be retried
- Adds a hint to explain the behavior in #103
fmaclen (Owner) commented Jul 21, 2024

I just added a hint in the UI about the potential cause of the error:

[Screenshot: localhost_5174_sessions_6fpqqn]

A more permanent solution to this issue could be to switch to the /api/chat endpoint (instead of /api/generate), which probably fails more gracefully. But if this error really is only caused by exceeding the tokens in the context window, we can probably do other things that would be more useful, such as #54 and #7.
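Roughly, the two request shapes differ like this (simplified sketch, not the exact payloads we send):

```ts
// /api/generate: a single prompt plus the opaque `context` token array from the previous turn.
const generateBody = {
  model: 'llama3',
  prompt: 'What did I ask you three messages ago?',
  context: [128006, 882], // tokens returned by the previous response (example values)
  stream: true
};

// /api/chat: the full message history; the server decides what still fits in the window.
const chatBody = {
  model: 'llama3',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What did I ask you three messages ago?' }
  ],
  stream: true
};
```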

fmaclen (Owner) commented Jul 22, 2024

I pushed a quick-and-dirty implementation using the /api/chat endpoint:
https://api-chat-endpoint.hollama.pages.dev/

Here are some early findings:

  • Appears to avoid the parsing error in long sessions because it has a rolling context window that ignores earlier messages.
  • The system prompt does not appear to be ignored in long sessions 👍.
  • This endpoint doesn't return the tokens used as a number[]; instead it returns totals such as prompt_eval_count, eval_count and total_duration. We should still be able to implement Show session token count #7 and Show response tokens per second rate #8 from these values (see the sketch below).
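For example, #7 and #8 could be derived from the final streamed chunk roughly like this (a sketch; eval_duration is a nanosecond duration per the Ollama docs):

```ts
// Sketch: deriving #7 and #8 from the fields of the final streamed chunk.
interface FinalChunkStats {
  prompt_eval_count: number; // tokens in the prompt
  eval_count: number; // tokens in the generated response
  eval_duration: number; // time spent generating, in nanoseconds
}

function tokenStats(chunk: FinalChunkStats) {
  return {
    sessionTokens: chunk.prompt_eval_count + chunk.eval_count, // "Show session token count" (#7)
    tokensPerSecond: chunk.eval_count / (chunk.eval_duration / 1e9) // "tokens per second rate" (#8)
  };
}
```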

I'm sort of on the fence on this one: it's nice to be able to trust Ollama to always give us valid JSON responses, but not knowing when earlier messages stop being part of the current context feels like a downgrade.

I feel like I'd rather see a "nicer" version of the SyntaxError that prompts the user to "Summarize the session" (#54) and use that as the new context for the current session (or a new one).

binarynoise (Contributor, Author) commented Jul 26, 2024

The chat version works great, until you reference something from three pages ago.
I'm wondering if there isn't any way for either variant to progressively condense the most important information into the context instead of keeping everything. I have the feeling that summarizing and starting over could be destructive to the flow of the conversation.
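Just to make the idea concrete, a progressive condensation could look roughly like this (entirely hypothetical; the message shape and the summarize helper are made up):

```ts
// Hypothetical sketch: fold the oldest messages into a running summary instead of dropping them.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

async function condenseIfNeeded(
  messages: ChatMessage[],
  summarize: (text: string) => Promise<string>, // e.g. a separate /api/chat call
  keepRecent = 8
): Promise<ChatMessage[]> {
  if (messages.length <= keepRecent) return messages;

  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);
  const summary = await summarize(older.map((m) => `${m.role}: ${m.content}`).join('\n'));

  return [{ role: 'system', content: `Summary of the earlier conversation:\n${summary}` }, ...recent];
}
```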

fmaclen (Owner) commented Jul 27, 2024

I'm not entirely sure what would be the best way to conserve the overall context as much as possible.

But the recent updates to Ollama (tool calling and web scraping) are only available on the /api/chat endpoint, so it's probably worth upgrading to it, closing this issue, and figuring out the context window issue separately.

@binarynoise Do you feel like you often exceed the context window limit in normal use, or only occasionally?

fmaclen (Owner) commented Jul 29, 2024

🎉 This issue has been resolved in version 0.7.8 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
