
Fetch times out on low end devices serving OLLAMA #612

Open
ManuXD32 opened this issue Jul 31, 2024 · 10 comments
Labels
feature-cool (Distinctive features), type: bug (Something isn't working)

Comments

@ManuXD32

ManuXD32 commented Jul 31, 2024

Description

big-AGI uses fetch to retrieve API responses. The timeout is around 5 minutes, so when a low-end device like a phone serves Ollama and has to process a large context, the request simply times out.

Device and browser

Android Redmagic 9 pro serving ollama and big-AGI, browser: Brave

Screenshots and more

No response

Willingness to Contribute

  • 🙋‍♂️ Yes, I would like to contribute a fix.
ManuXD32 added the type: bug label Jul 31, 2024
@enricoros
Owner

Thanks @ManuXD32 - we have a fully new networking stack for Big-AGI 2 which in principle would allow sending ping packets to keep the connection alive. This may help; we'd need to test it.
Could you explain more: is this for a streaming text generation request, and is it the initial fetch operation? Where is Ollama running, and where are the big-AGI server and client running?
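
For context, a minimal sketch of what such keep-alive pings could look like on an SSE-style response stream (illustrative only - this is not the Big-AGI 2 networking stack, and the function name is hypothetical):

  // While waiting for the upstream model to produce output, periodically write a
  // heartbeat frame so idle connections are not dropped by browsers or proxies.
  function startKeepAlive(controller: ReadableStreamDefaultController<Uint8Array>, intervalMs = 15_000): () => void {
    const encoder = new TextEncoder();
    const timer = setInterval(() => {
      // SSE comment lines (leading ':') are ignored by standard event-stream parsers
      controller.enqueue(encoder.encode(': keep-alive\n\n'));
    }, intervalMs);
    return () => clearInterval(timer);
  }

Pings like this only keep the client-to-server stream alive; the timeout on the server's own fetch to Ollama is a separate limit.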

@ManuXD32
Author

> Thanks @ManuXD32 - we have a fully new networking stack for Big-AGI 2 which in principle would allow sending ping packets to keep the connection alive. This may help; we'd need to test it. Could you explain more: is this for a streaming text generation request, and is it the initial fetch operation? Where is Ollama running, and where are the big-AGI server and client running?

Ollama and big-AGI (server and client) are running on the phone.
It is for streaming text generation, and it happens with the first fetch operation if the context is very large, and later in the conversation if it becomes too long (it takes longer to answer because the model needs to process more tokens).

Thank you so much for your answer and your work; I'm really loving it and truly enjoy running it on my server :)

@enricoros
Owner

@ManuXD32 Oh wow thanks for the answer. How did you manage to run big-AGI fully on the phone? I want to try that out.
For the timeout, I believe I could solve it by sending pings.

@ManuXD32
Author

ManuXD32 commented Aug 1, 2024

> @ManuXD32 Oh wow thanks for the answer. How did you manage to run big-AGI fully on the phone? I want to try that out. For the timeout, I believe I could solve it by sending pings.

Cool, I think ping is a nice approach.

Here is a little guide on how to install it on Android. Honestly, it is so straightforward that I thought it was already supported and I just didn't know it hahahah.

I used proot-distro and Termux:

  1. Install proot-distro:
     apt update && apt upgrade -y && apt install proot-distro -y
  2. Install the Ubuntu distro:
     pd install --override-alias agi ubuntu
  3. Enter the distro:
     pd login agi
  4. Install packages and clone the repo:
     apt update && apt upgrade -y && apt install nodejs -y
     - use git clone to clone the repo and navigate there with cd
  5. Follow the setup guide:
     npm install
     npm run build
     npx next start --port 3000

enricoros added the feature-cool label and removed the requested-info label Aug 1, 2024
@michieal

@enricoros Hi there! This seems to be an issue when running on local hardware - i.e., not just phones. Maybe add a setting to change the timeouts?
I've experienced this issue with both LocalAI and Ollama: when the conversation history becomes larger (about 4k tokens), my Ryzen 7 (5900) takes too long to process the context and input. Neither ever gets close to the context window of 16k tokens.
[Service Issue] Ollama: fetch failed · {"name":"HeadersTimeoutError","code":"UND_ERR_HEADERS_TIMEOUT","message":"Headers Timeout Error"} [DEV_URL: http://127.0.0.1:11434/api/chat] is the error message that I receive, but looking at System Monitor (Ubuntu/Plasma) I can tell that both Ollama and LocalAI are still processing the request.
Now, if I delete messages out of the conversation history in the chat, it works just fine.

This tells me, as a fellow developer (appreciate your work, btw!), that the interface (big-AGI) is timing out and reporting an error before the AI programs have finished processing the input stream.

Additional info: I have had this issue with v1 Dev, v1 Stable, and v2 Dev.
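
On the "setting to change the timeouts" idea above: one lightweight way to expose it server-side would be an environment variable with a sane default. A minimal sketch; the variable name is hypothetical and not an existing big-AGI option:

  // Hypothetical setting: upstream fetch timeout in milliseconds.
  // Falls back to undici's default of 5 minutes (300,000 ms) when unset.
  const upstreamTimeoutMs = Number(process.env.UPSTREAM_FETCH_TIMEOUT_MS || 5 * 60 * 1000);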

@enricoros
Owner

Hi @michieal was this with the latest v2-dev branch?

It's the backend part of big-AGI (Node.js) timing out after 5 minutes of not receiving the headers of the HTTP request. This happens deep inside Node.js's network library (undici): nodejs/node#46375

Some people say it can be fixed, others say it can't. I'd welcome a patch that raises the network timeouts of the upstream fetch operation (src/modules/aix library, search for "fetch(").
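
For reference, undici's default headersTimeout is 300,000 ms, which matches the 5-minute failure described here. A minimal sketch of raising the limits process-wide with the standalone undici package (untested against big-AGI; whether Node's built-in fetch picks up this dispatcher can depend on the Node and undici versions in play):

  import { Agent, setGlobalDispatcher } from 'undici';

  // Raise undici's header/body timeouts well above the 5-minute default,
  // so slow local backends (Ollama, LocalAI) have time to start responding.
  setGlobalDispatcher(new Agent({
    headersTimeout: 15 * 60 * 1000,
    bodyTimeout: 15 * 60 * 1000,
  }));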

@michieal

> Hi @michieal was this with the latest v2-dev branch?

The News page says "Big-AGI has been updated to version 1.16.8", though when I used git to grab it, I swear I was grabbing the v2-dev branch.

> It's the backend part of big-AGI (Node.js) timing out after 5 minutes of not receiving the headers of the HTTP request. This happens deep inside Node.js's network library (undici): nodejs/node#46375
>
> Some people say it can be fixed, others say it can't. I'd welcome a patch that raises the network timeouts of the upstream fetch operation (src/modules/aix library, search for "fetch(").

I would love to be able to create a patch for this... but it would be me blindly following what the AI said. I'm more of a desktop application / game development guy. I'll look at the code, but... I am probably gonna look like a monkey scratching his head. 🤣

@michieal

It looks like that fetch could be changed... Since it has a hard-coded timeout, maybe change it to use something that allows specifying a timeout. I saw AbortSignal (instead of AbortController) in the referenced issues, and when I asked the DeepSeek Coder AI how to fix it, it kept telling me to use AbortController to manage the timeout. Still, it's pretty Greek to me, so I don't know how to fix it... hopefully you will know what to do with this information.
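
For what it's worth, the two mechanisms mentioned above do different things: an AbortController/AbortSignal can only cancel a request sooner than it would otherwise fail, while the 5-minute headers timeout that fires here is an option on undici's Agent (the dispatcher). An illustrative sketch, not big-AGI code; the URL and request body are placeholders:

  import { Agent, fetch } from 'undici';

  const response = await fetch('http://127.0.0.1:11434/api/chat', {
    method: 'POST',
    body: JSON.stringify({ model: 'llama3', messages: [], stream: true }),
    // an AbortSignal only makes the request give up earlier (here, after 20 minutes)...
    signal: AbortSignal.timeout(20 * 60 * 1000),
    // ...while UND_ERR_HEADERS_TIMEOUT is governed by the Agent's own limits
    dispatcher: new Agent({ headersTimeout: 15 * 60 * 1000, bodyTimeout: 15 * 60 * 1000 }),
  });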

@enricoros
Owner

@michieal I took a quick look (trpc.router.fetchers.ts line 74 "response = await fetch(url, request);")

This is a quick patch I put together:

Index: src/server/trpc/trpc.router.fetchers.ts
===================================================================
diff --git a/src/server/trpc/trpc.router.fetchers.ts b/src/server/trpc/trpc.router.fetchers.ts
--- a/src/server/trpc/trpc.router.fetchers.ts	(revision 79c71a174088449776fef140d6841a444bca80dc)
+++ b/src/server/trpc/trpc.router.fetchers.ts	(date 1734598201566)
@@ -2,6 +2,7 @@
 
 import { debugGenerateCurlCommand, safeErrorString, SERVER_DEBUG_WIRE } from '~/server/wire';
 
+import { Agent as UndiciAgent, fetch, RequestInit, Response } from 'undici';
 
 //
 // NOTE: This file is used in the server-side code, and not in the client-side code.
@@ -13,7 +14,7 @@
 
 // JSON fetcher
 export async function fetchJsonOrTRPCThrow<TOut extends object = object, TBody extends object | undefined = undefined>(config: RequestConfig<TBody>): Promise<TOut> {
-  return _fetchFromTRPC<TBody, TOut>(config, async (response) => await response.json(), 'json');
+  return _fetchFromTRPC<TBody, TOut>(config, async (response) => await response.json() as Promise<TOut>, 'json');
 }
 
 // Text fetcher
@@ -68,6 +69,11 @@
       headers: headers !== undefined ? headers : undefined,
       body: body !== undefined ? JSON.stringify(body) : undefined,
       signal: signal !== undefined ? signal : undefined,
+      dispatcher: new UndiciAgent({
+        connectTimeout: 15 * 60 * 1000,   // 15 min
+        headersTimeout: 15 * 60 * 1000,   // 15 min
+        bodyTimeout: 15 * 60 * 1000,      // 15 min
+      }),
     };
 
     // upstream fetch

Unfortunately I don't have time to test this thoroughly for now (as it impacts 15 model providers and more functions; this is core logic). The patch could also break in some other types of conversations.

Leaving this here for reference, but for now I think we'll need to tolerate the 5 min timeout unless someone else wants to take a stab at this.
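
A possible refinement to the patch above, should someone pick it up: hoist the Agent out of the per-request init so a single connection pool (with the raised timeouts) is reused across calls, instead of a new Agent being constructed on every fetch. Untested sketch:

  // Module-scope agent shared by all upstream fetches in this file.
  const longTimeoutAgent = new UndiciAgent({
    connectTimeout: 15 * 60 * 1000,   // 15 min
    headersTimeout: 15 * 60 * 1000,   // 15 min
    bodyTimeout: 15 * 60 * 1000,      // 15 min
  });

  // ...and in the request init, reference it rather than constructing a new one:
  //   dispatcher: longTimeoutAgent,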

@michieal

michieal commented Dec 20, 2024

@enricoros How do I install the support for Undici? When I made the code changes, it gave me an error that it wasn't found... I figured that the least I could do was to try this out and test it for Ollama and LocalAI for you.

@ everyone - if you make a PR for this, tag me in the comments so that I can help test it. TIA!
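
For reference on the "wasn't found" error above: undici is published as a regular npm package, so the import added by the patch needs it installed as a dependency first (run in the repo root before rebuilding):

  npm install undici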
