
feat: support connection lifetime for single client #727

Draft: terut wants to merge 8 commits into main from feat/conn-lifetime

Conversation

@terut terut commented Jan 28, 2025

Background

Recently I noticed that requests become unbalanced after a replica failover on GCP Memorystore for Redis when connections are kept alive. So I would like to introduce a connection lifetime to force a reconnect to the Redis endpoint, because existing connections are not rerouted when a node is reintroduced.

Here are the documents about the architecture and connection balance management.

https://cloud.google.com/memorystore/docs/redis/about-read-replicas#architecture
https://cloud.google.com/memorystore/docs/redis/about-read-replicas#connection_balance_management

Ref: #725

Solution

Support a connection lifetime for the single client so that it periodically reconnects to the fixed read endpoint.
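
A minimal usage sketch, assuming the option lands as a ConnLifetime field on rueidis.ClientOption as drafted in this PR (the field name, the address, and the 30-minute value are examples, not final API):

package main

import (
	"context"
	"time"

	"github.com/redis/rueidis"
)

func main() {
	// ConnLifetime is the option proposed in this PR (name and behavior may
	// change): connections older than the given duration are closed and
	// re-established, so reads are rebalanced after a Memorystore replica
	// failover instead of sticking to the old node.
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress:  []string{"10.0.0.1:6379"}, // fixed read endpoint (example address)
		ConnLifetime: 30 * time.Minute,          // example value
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	ctx := context.Background()
	// Commands are issued as usual; an expired connection is replaced and the
	// command retried internally, per the discussion below.
	client.Do(ctx, client.B().Get().Key("key").Build())
}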

@terut terut marked this pull request as draft January 28, 2025 12:22
@terut terut (Author) commented Jan 28, 2025

@rueian Here is a draft. Could you check whether your additional points from the discussion are covered? There are no additional tests yet.

  • Use p.lifeTm = time.AfterFunc(...) instead, because we need more fine-grained control over the timer.
  • Stop p.lifeTm early if there is a network error or the connection is closed manually.
  • For dedicated and blocking usages, we should stop p.lifeTm after acquiring the connection from the dpool or spool and reset it when putting the connection back.
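
A rough sketch of those three points (the names pipe, lftm, lifeTm, StartTimer, StopTimer, and ResetTimer follow this PR's draft and are not the final implementation):

package sketch

import "time"

// Only the lifetime-related pieces of the pipe are shown here.
type pipe struct {
	lftm   time.Duration // configured connection lifetime
	lifeTm *time.Timer   // fires once the lifetime has elapsed
}

// StartTimer arms the lifetime timer. time.AfterFunc gives fine-grained
// control: there is no channel to drain and the timer can simply be stopped
// or reset from any code path.
func (p *pipe) StartTimer() {
	if p.lftm > 0 {
		p.lifeTm = time.AfterFunc(p.lftm, p.expire)
	}
}

// StopTimer stops the timer early, e.g. on a network error, on manual close,
// or after a dedicated/blocking usage acquires the connection from the dpool
// or spool. It reports false if the timer has already fired.
func (p *pipe) StopTimer() bool {
	if p.lifeTm == nil {
		return true
	}
	return p.lifeTm.Stop()
}

// ResetTimer re-arms the timer when the connection is put back after a
// dedicated or blocking usage.
func (p *pipe) ResetTimer() {
	if p.lifeTm != nil && p.lftm > 0 {
		p.lifeTm.Reset(p.lftm)
	}
}

// expire closes the pipe so later commands observe an expiration error
// (errConnExpired in this PR) and the client reconnects.
func (p *pipe) expire() {
	p.Close()
}

// Close is a stand-in for the pipe's real Close method.
func (p *pipe) Close() {}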

pipe.go Outdated
@@ -1576,6 +1587,7 @@ func (p *pipe) Close() {
}
atomic.AddInt32(&p.waits, -1)
atomic.AddInt32(&p.blcksig, -1)
p.StopTimer()
terut (Author)

Oops. I will fix it...

@rueian rueian (Collaborator) commented Feb 8, 2025

this should be removed.

pool.go Outdated
@@ -59,6 +59,7 @@ retry:
}
}
p.cond.L.Unlock()
v.StopTimer()
rueian (Collaborator)

If the timer is not stopped successfully, we need to acquire another connection.
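
A simplified, hypothetical sketch of that acquire path (not the library's actual pool code; the pool and conn types below only carry the fields needed for the illustration):

package sketch

import (
	"sync"
	"time"
)

// conn stands in for the pooled connection; only the lifetime timer matters here.
type conn struct {
	lifeTm *time.Timer
}

// StopTimer reports whether the lifetime timer was stopped before it fired.
// A false return means the connection expired while sitting idle in the pool.
func (c *conn) StopTimer() bool {
	if c.lifeTm == nil {
		return true
	}
	return c.lifeTm.Stop()
}

// Close stands in for closing the underlying network connection.
func (c *conn) Close() {}

type pool struct {
	cond *sync.Cond
	list []*conn
}

// Acquire hands out an idle connection, discarding any whose lifetime elapsed
// while pooled and retrying with another one, as suggested above.
func (p *pool) Acquire() *conn {
retry:
	p.cond.L.Lock()
	for len(p.list) == 0 {
		p.cond.Wait()
	}
	v := p.list[len(p.list)-1]
	p.list = p.list[:len(p.list)-1]
	p.cond.L.Unlock()

	if !v.StopTimer() { // the timer already fired: the connection is expired
		v.Close()
		goto retry
	}
	return v
}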

terut (Author)

Ah, that's right. Thanks!

terut (Author)

Fixed 390e19b

pipe.go Outdated
@@ -89,6 +91,10 @@ type pipe struct {
recvs int32
r2ps bool // identify this pipe is used for resp2 pubsub or not
noNoDelay bool
lftm time.Duration // lifetime
lftmMu sync.Mutex // guards lifetime timer
rueian (Collaborator)

Do we really need the mutex and the bool flag?

terut (Author)

I reviewed it again; we don't need the bool flag.
I thought that time.Reset and time.Stop needed a mutex on Go <= 1.22. Maybe I've got it wrong.

rueian (Collaborator)

@terut terut (Author) commented Jan 30, 2025

Thanks, you're right. It looks like it is thread-safe. I will remove it.

I misread "This cannot be done concurrent to other receives from the Timer's channel or other calls to the Timer's Stop method." in https://pkg.go.dev/[email protected]#Timer.Stop . Sorry.

rueian (Collaborator)

And we are using the AfterFunc timer which has no channel associated.
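
For reference, a small standalone Go snippet (not code from this PR) showing that AfterFunc timers have no channel:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Timers created with time.AfterFunc run their callback in a separate
	// goroutine; the returned Timer's C field is nil, so there is no channel
	// to drain before stopping or resetting it.
	t := time.AfterFunc(time.Second, func() {
		fmt.Println("lifetime elapsed")
	})

	fmt.Println("t.C == nil:", t.C == nil) // true for AfterFunc timers

	// Stop reports whether it prevented the callback from running.
	fmt.Println("stopped before firing:", t.Stop())
}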

terut (Author)

That's right.

terut (Author)

Fixed f950c1e

@terut terut force-pushed the feat/conn-lifetime branch from 31ffe91 to 88c8d7e on February 8, 2025 11:10
@terut terut (Author) commented Feb 8, 2025

The rest is the implementation of retrying in the single client.
I feel like it might be enough to use the ConnLifetime option with the retry handler enabled. What do you think about forcing a retry for errConnExpired? @rueian

@rueian rueian (Collaborator) commented Feb 8, 2025

> The rest is the implementation of retrying in the single client. I feel like it might be enough to use the ConnLifetime option with the retry handler enabled. What do you think about forcing a retry for errConnExpired? @rueian

I think we should use your original proposal; it doesn't need to involve the retry handler at all.

retry:
	resp = c.conn.Do(ctx, cmd)
	if resp.Error() == errConnExpired {
		goto retry
	}
	if c.retry && cmd.IsReadOnly() && c.isRetryable(resp.Error(), ctx) {
		...

Because whenever an errConnExpired occurs, we know the connection was closed by ourselves, so it should be safe to retry immediately.

@terut terut (Author) commented Feb 9, 2025

@rueian Thanks. Indeed, we know the error ourselves, and it is also not good to expose errConnExpired to the outside when retry is disabled. The retry logic is almost done; I just need to add tests for it.

client.go Outdated
@@ -86,6 +90,13 @@ func (c *singleClient) DoMulti(ctx context.Context, multi ...Completed) (resps [
attempts := 1
retry:
resps = c.conn.DoMulti(ctx, multi...).s
if c.hasConnLftm {
for _, resp := range resps {
if resp.Error() == errConnExpired {
rueian (Collaborator)

Is it possible that errConnExpired happens in the middle of DoMulti? I am not sure, but if it is possible, then we should not retry the preceding requests that didn't receive the error.

@terut terut (Author) commented Feb 10, 2025

Ah, I think it's unlikely. Surely all responses have the same error when p.state changes.

I will change that like the following.

if resps[0].Error() == errConnExpired {
  goto retry
}

terut (Author)

Fixed c0c3657

rueian (Collaborator)

OK, could you leave a comment in the code to explain why it won't happen?
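
For illustration, the kind of in-code comment being requested might read like this (a hypothetical fragment in the style of the snippets above, not the code that ends up in the PR):

retry:
	resps = c.conn.DoMulti(ctx, multi...).s
	// Checking only the first response is enough: errConnExpired is produced
	// when the whole pipe changes p.state on expiry, so every response in the
	// batch carries the same error; it cannot appear for only part of a batch.
	if len(resps) > 0 && resps[0].Error() == errConnExpired {
		goto retry
	}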
