
[ENHANCEMENTS] Support Connection: Keep-Alive / Transfer-Encoding: chunked #235

Closed
MarcoWel opened this issue Jul 9, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

MarcoWel commented Jul 9, 2024

Is your feature request related to a problem? Please describe.
Zoraxy does not currently seem to support connections with the request header Connection: Keep-Alive / the response header Transfer-Encoding: chunked.

This leads to stuttering output in apps like OpenWebUI talking to Ollama via Zoraxy.

Describe the solution you'd like
Support streaming connections.

Describe alternatives you've considered
Ditching zoraxy for another reverse proxy.

Additional context
None

MarcoWel added the enhancement label on Jul 9, 2024
tobychui (Owner) commented

Hey @MarcoWel

Connection: Keep-Alive and Transfer-Encoding: chunked are supported by the default Go net/http library, and the transporter used in Zoraxy is the Go built-in one. To verify that these headers are not the problem, I have written a test.

Setup

Zoraxy default site listening on :80 (default HTTP port), pointing to a test backend server running locally on :8088.

Backend Server

This server, once it receives a request with a Transfer-Encoding header, starts responding with "chunks" for 10 seconds. Modified from here.

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

const (
	backendResponse = "I am the backend"
	backendStatus   = 404
)

func main() {
	server := &http.Server{
		Addr:    ":8088",
		Handler: http.HandlerFunc(backendHandler),
	}

	// Start the test backend server (blocks until shutdown)
	log.Println("Starting server on :8088")
	if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatalf("Server failed to start: %v", err)
	}
}

func HandlePost(w http.ResponseWriter, r *http.Request) {
	// #1 add flusher
	flusher, ok := w.(http.Flusher)
	if !ok {
		panic("expected http.ResponseWriter to be an http.Flusher")
	}
	w.Header().Set("Connection", "Keep-Alive")
	w.Header().Set("Transfer-Encoding", "chunked")
	w.Header().Set("X-Content-Type-Options", "nosniff")

	// Emit a chunk every 500ms for 10 seconds, flushing after each write
	ticker := time.NewTicker(time.Millisecond * 500)
	defer ticker.Stop()
	deadline := time.After(time.Second * 10)
	for {
		select {
		case t := <-ticker.C:
			// #2 add '\n' so each chunk is a complete line
			io.WriteString(w, "Chunk\n")
			fmt.Println("Tick at", t)
			flusher.Flush()
		case <-deadline:
			return
		}
	}
}

func backendHandler(w http.ResponseWriter, r *http.Request) {
	if len(r.TransferEncoding) > 0 {
		log.Printf("backend got TransferEncoding: %v", r.TransferEncoding)
		HandlePost(w, r)
		return
	}
	log.Println("No transfer encoding received")
	if r.Header.Get("X-Forwarded-For") == "" {
		log.Println("didn't get X-Forwarded-For header")
	}
	if g, e := r.Host, "ce.localhost"; g != e {
		log.Printf("backend got Host header %q, want %q", g, e)
	}
	w.Header().Set("X-Foo", "bar")
	http.SetCookie(w, &http.Cookie{Name: "flavor", Value: "chocolateChip"})
	w.WriteHeader(backendStatus)
	w.Write([]byte(backendResponse))
}

Test

The test case creates a POST request to Zoraxy's listening port and lets Zoraxy forward the request to the backend server above.

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"testing"
)

// Zoraxy's listening address from the setup above; adjust as needed
const backendURL = "http://localhost:80/"

func TestChunkedTransfer(t *testing.T) {
	// Test chunked encoding request
	chunkedReq, _ := http.NewRequest("POST", backendURL, bytes.NewBufferString(""))
	chunkedReq.Host = "localhost"
	chunkedReq.TransferEncoding = []string{"chunked"}
	chunkedRes, err := http.DefaultClient.Do(chunkedReq)
	if err != nil {
		t.Fatalf("Chunked POST: %v", err)
	}
	if g, e := chunkedRes.StatusCode, 200; g != e {
		t.Errorf("got chunkedRes.StatusCode %d; expected %d", g, e)
	}
	// Read the response body in chunks and print to STDOUT
	buf := make([]byte, 1024)
	for {
		n, err := chunkedRes.Body.Read(buf)
		if n > 0 {
			// Print the chunk to STDOUT
			fmt.Print(string(buf[:n]))
		}
		if err != nil {
			if err != io.EOF {
				t.Fatalf("Error reading response body: %v", err)
			}
			break
		}
	}
	chunkedRes.Body.Close()
}

Results

Running tool: C:\Program Files\Go\bin\go.exe test -timeout 30s -run ^TestChunkedTransfer$ imuslab.com/zoraxy/tools/chunked-encoding-test

=== RUN   TestChunkedTransfer
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
Chunk
--- PASS: TestChunkedTransfer (10.01s)

So I guess your issue is caused by something else. As a side note, the default Zoraxy proxy transporter has the following settings; you can check out dpcore.go for more details.

// Hack the default transporter to handle more connections
thisTransporter := http.DefaultTransport
optimalConcurrentConnection := 32
thisTransporter.(*http.Transport).MaxIdleConns = optimalConcurrentConnection * 2
thisTransporter.(*http.Transport).MaxIdleConnsPerHost = optimalConcurrentConnection // <-- this
thisTransporter.(*http.Transport).IdleConnTimeout = 30 * time.Second                // <-- this
thisTransporter.(*http.Transport).MaxConnsPerHost = optimalConcurrentConnection * 2
thisTransporter.(*http.Transport).DisableCompression = true

These settings are optimized for a high-concurrency reverse proxy. I guess the two settings marked with arrows above are the ones most related to your issue. If you want to try things out, you can change the values there and see if it works better on your server.
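
For example, a tweak along these lines (the values are just guesses to experiment with, not recommendations) would relax the two flagged settings:

package main

import (
	"net/http"
	"time"
)

func main() {
	// Experimental values only: clone the default transport and relax
	// the two settings flagged above
	tweaked := http.DefaultTransport.(*http.Transport).Clone()
	tweaked.MaxIdleConnsPerHost = 64           // was optimalConcurrentConnection (32)
	tweaked.IdleConnTimeout = 90 * time.Second // was 30 * time.Second
	_ = tweaked // hand this to the proxy core in place of the default transport
}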

MarcoWel (Author) commented

Hi @tobychui,
Thank you for the extensive reply.

All I can say is that with OpenWeb-UI > Zoraxy > Ollama the text output stutters, whereas OpenWeb-UI > Ollama and OpenWeb-UI > Caddy > Ollama both work smoothly out of the box.

I have investigated the network traffic and found that with Zoraxy there is no Connection: Keep-Alive header in the request from the frontend, and accordingly no Transfer-Encoding: chunked in the response from the backend.

Even injecting a Connection: Keep-Alive header via Zoraxy did not help.
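
For reference, a small Go program along these lines can check the behaviour from the client side (the URL is a placeholder for a streaming endpoint behind Zoraxy): evenly spaced small reads mean the proxy streams, a single large late read means it buffers.

package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// Placeholder: point this at any streaming endpoint behind the proxy
	resp, err := http.Get("http://ollama.mydomain.com/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Headers of interest for streaming
	fmt.Println("Transfer-Encoding:", resp.TransferEncoding)
	fmt.Println("Connection:", resp.Header.Get("Connection"))

	// Time the arrival of each read
	buf := make([]byte, 4096)
	start := time.Now()
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 {
			fmt.Printf("+%v  %d bytes\n", time.Since(start).Round(time.Millisecond), n)
		}
		if err != nil {
			break
		}
	}
}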

For the moment we have switched back to Caddy and deploy manually via a Caddyfile.

tobychui (Owner) commented

Hi @MarcoWel,

Could you share your working Caddyfile for OpenWeb-UI? Maybe I can check the Caddy source code and try to figure out what the issue might be.

MarcoWel (Author) commented Jul 11, 2024

Sure!

In this quick-and-dirty example config:

  • Frontend: runs in Docker on machine 1
  • Reverse proxy: Caddy as a service on machine 1 (Zoraxy in Docker on machine 1)
  • Backend: runs on a different machine 2

# Frontend
owui.mydomain.com {
	reverse_proxy localhost:8880
	tls /etc/caddy/owui.mydomain.com.crt /etc/caddy/owui.mydomain.com.key
}

# Backend (allow http and https)
http://ollama.mydomain.com {
	reverse_proxy 192.168.0.123:9090
}
https://ollama.mydomain.com {
	reverse_proxy 192.168.0.123:9090
	tls /etc/caddy/ollama.mydomain.com.crt /etc/caddy/ollama.mydomain.com.key
}

tobychui (Owner) commented

OK, I think I have got this issue fixed. I have added automatic sniffing for LLM output streams via the keep-alive header:

// Fixed issue #235: Added auto detection for ollama / llm output stream
connectionHeader := req.Header["Connection"]
if len(connectionHeader) > 0 && strings.Contains(strings.Join(connectionHeader, ","), "keep-alive") {
	return -1
}
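
For context, this mirrors how Go's net/http/httputil.ReverseProxy handles it: a negative FlushInterval means flush immediately after each write, and newer Go versions also flush immediately for responses with unknown length or Content-Type: text/event-stream, which is presumably why Caddy worked out of the box. A minimal sketch of the standard-library equivalent (the target address is a placeholder):

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Placeholder backend; in the setup above this would be the Ollama host
	target, err := url.Parse("http://192.168.0.123:9090")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	// Negative FlushInterval = flush after every write to the client,
	// i.e. no buffering delay between chunks (same effect as the -1 above)
	proxy.FlushInterval = -1
	log.Fatal(http.ListenAndServe(":8080", proxy))
}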

Now the speed of the LLM-generated text shown in the web UI is identical between the Zoraxy-proxied UI and a direct connection. If you want to check it out now, you can build from source using the nightly v3.0.8 branch (remember to include the web folder).
