[ENHANCEMENTS] Support Connection: Keep-Alive / Transfer-Encoding: chunked #235
Hey @MarcoWel

The Setup

Zoraxy default site listening on :80 (the default HTTP port), pointing to a test backend server running locally on 8088.

Backend Server

This is the server that, once connected with a TransferEncoding header, will start responding with "chunks" for 10 seconds. Modified from here.

package main
import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

const (
	backendResponse = "I am the backend"
	backendStatus   = 404
)

func main() {
	server := &http.Server{
		Addr:    ":8088",
		Handler: http.HandlerFunc(backendHandler),
	}

	// Start the server; ListenAndServe blocks until the server shuts down
	log.Println("Starting server on :8088")
	if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatalf("Server failed to start: %v", err)
	}
}
func HandlePost(w http.ResponseWriter, r *http.Request) {
	// #1 add flusher
	flusher, ok := w.(http.Flusher)
	if !ok {
		panic("expected http.ResponseWriter to be an http.Flusher")
	}
	w.Header().Set("Connection", "Keep-Alive")
	w.Header().Set("Transfer-Encoding", "chunked")
	w.Header().Set("X-Content-Type-Options", "nosniff")

	ticker := time.NewTicker(time.Millisecond * 500)
	go func() {
		for t := range ticker.C {
			// #2 add '\n'
			io.WriteString(w, "Chunk\n")
			fmt.Println("Tick at", t)
			flusher.Flush()
		}
	}()

	time.Sleep(time.Second * 10)
	ticker.Stop()
}
func backendHandler(w http.ResponseWriter, r *http.Request) {
	if len(r.TransferEncoding) > 0 {
		log.Printf("backend got TransferEncoding: %v", r.TransferEncoding)
		HandlePost(w, r)
		return
	}
	log.Println("No transfer encoding received")

	if r.Header.Get("X-Forwarded-For") == "" {
		log.Println("didn't get X-Forwarded-For header")
	}
	if g, e := r.Host, "ce.localhost"; g != e {
		log.Printf("backend got Host header %q, want %q", g, e)
	}
	w.Header().Set("X-Foo", "bar")
	http.SetCookie(w, &http.Cookie{Name: "flavor", Value: "chocolateChip"})
	w.WriteHeader(backendStatus)
	w.Write([]byte(backendResponse))
}

Test

The test case creates a POST request to the Zoraxy listening port and lets Zoraxy forward the request to the backend server above.

func TestChunkedTransfer(t *testing.T) {
	// Test chunked encoding request
	chunkedReq, _ := http.NewRequest("POST", backendURL, bytes.NewBufferString(""))
	chunkedReq.Host = "localhost"
	chunkedReq.TransferEncoding = []string{"chunked"}
	chunkedRes, err := http.DefaultClient.Do(chunkedReq)
	if err != nil {
		t.Fatalf("Chunked POST: %v", err)
	}
	if g, e := chunkedRes.StatusCode, 200; g != e {
		t.Errorf("got chunkedRes.StatusCode %d; expected %d", g, e)
	}

	// Read the response body in chunks and print to STDOUT
	buf := make([]byte, 1024)
	for {
		n, err := chunkedRes.Body.Read(buf)
		if n > 0 {
			// Print the chunk to STDOUT
			fmt.Print(string(buf[:n]))
		}
		if err != nil {
			if err != io.EOF {
				t.Fatalf("Error reading response body: %v", err)
			}
			break
		}
	}
	chunkedRes.Body.Close()
}

Results
So I guess your issue is caused by something else. As a side note, the default Zoraxy proxy transporter has the following settings, which you can check out:

//Hack the default transporter to handle more connections
thisTransporter := http.DefaultTransport
optimalConcurrentConnection := 32
thisTransporter.(*http.Transport).MaxIdleConns = optimalConcurrentConnection * 2
thisTransporter.(*http.Transport).MaxIdleConnsPerHost = optimalConcurrentConnection // <-- this
thisTransporter.(*http.Transport).IdleConnTimeout = 30 * time.Second // <-- this
thisTransporter.(*http.Transport).MaxConnsPerHost = optimalConcurrentConnection * 2
thisTransporter.(*http.Transport).DisableCompression = true

These settings are optimized for a high-concurrency reverse proxy. I guess the two settings marked with arrows above should be most related to your issue. If you want to try things out, you can change the values here and see if it works better on your server.
Hi @tobychui,

All I can say is that I have investigated the network traffic and found that with Zoraxy, there is no […]. Even injecting […]

At the moment we have just switched back to Caddy and deploy manually via a Caddyfile.
Hi @MarcoWel,

Can I have your working Caddyfile for the OpenWeb-UI? Maybe I can check the Caddy source code and try to figure out what the issue might be here.
Sure! Here is a quick-and-dirty example config:
# Frontend
owui.mydomain.com {
reverse_proxy localhost:8880
tls /etc/caddy/owui.mydomain.com.crt /etc/caddy/owui.mydomain.com.key
}
# Backend (allow http and https)
http://ollama.mydomain.com {
reverse_proxy 192.168.0.123:9090
}
https://ollama.mydomain.com {
reverse_proxy 192.168.0.123:9090
tls /etc/caddy/ollama.mydomain.com.crt /etc/caddy/ollama.mydomain.com.key
}
Ok, I think I have got this issue fixed. I have added automatic sniffing for LLM output and the keep-alive header:

// Fixed issue #235: Added auto detection for ollama / llm output stream
connectionHeader := req.Header["Connection"]
if len(connectionHeader) > 0 && strings.Contains(strings.Join(connectionHeader, ","), "keep-alive") {
	return -1
}

Now, the speed of the LLM-generated text showing on the web UI is identical between the Zoraxy-proxied UI and a direct connection. If you want to check it out now, you can build from source using the nightly v3.0.8 branch (remember to include the …).
Is your feature request related to a problem? Please describe.
Zoraxy does not currently seem to support connections with the request header Connection: Keep-Alive / response header Transfer-Encoding: chunked.
This leads to stuttering output in apps like OpenWebUI talking to Ollama via Zoraxy.
Describe the solution you'd like
Support streaming connections.
Describe alternatives you've considered
Ditching zoraxy for another reverse proxy.
Additional context
None