
feat: support connection lifetime for single client #727

Draft: terut wants to merge 8 commits into main from feat/conn-lifetime

Conversation

@terut terut commented Jan 28, 2025

Background

Recently I noticed that requests become unbalanced after a replica failover on GCP Memorystore for Redis when connections are kept alive. So I would like to introduce a connection lifetime to force a reconnect to the Redis endpoint, because existing connections are not rerouted when a node is reintroduced.

Here are the documents about the architecture and connection balance management.

https://cloud.google.com/memorystore/docs/redis/about-read-replicas#architecture
https://cloud.google.com/memorystore/docs/redis/about-read-replicas#connection_balance_management

Ref: #725

Solution

Support a connection lifetime for the single client so that it periodically reconnects to the fixed read endpoint.
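
A minimal usage sketch, assuming the option lands as a ConnLifetime field on rueidis.ClientOption as drafted in this PR (the field name, the address, and the 30-minute value are examples, not final API):

package main

import (
	"context"
	"time"

	"github.com/redis/rueidis"
)

func main() {
	// ConnLifetime is the option proposed in this PR (name and behavior may
	// change): connections older than the given duration are closed and
	// re-established, so reads are rebalanced after a Memorystore replica
	// failover instead of sticking to the old node.
	client, err := rueidis.NewClient(rueidis.ClientOption{
		InitAddress:  []string{"10.0.0.1:6379"}, // fixed read endpoint (example address)
		ConnLifetime: 30 * time.Minute,          // example value
	})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	ctx := context.Background()
	// Commands are issued as usual; an expired connection is replaced and the
	// command retried internally, per the discussion below.
	client.Do(ctx, client.B().Get().Key("key").Build())
}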

@terut terut marked this pull request as draft January 28, 2025 12:22
@terut terut (Author) commented Jan 28, 2025

@rueian Here is a draft. Could you check whether your additional points from the discussion are covered? There are no additional tests yet.

  • Use p.lifeTm = time.AfterFunc(...) instead, because we need more fine-grained control over the timer.
  • Stop p.lifeTm early if there is a network error or the connection is closed manually.
  • For dedicated and blocking usages, we should stop p.lifeTm after acquiring the connection from the dpool or spool and reset it when putting the connection back.
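
A rough sketch of those three points (the names pipe, lftm, lifeTm, StartTimer, StopTimer, and ResetTimer follow this PR's draft and are not the final implementation):

package sketch

import "time"

// Only the lifetime-related pieces of the pipe are shown here.
type pipe struct {
	lftm   time.Duration // configured connection lifetime
	lifeTm *time.Timer   // fires once the lifetime has elapsed
}

// StartTimer arms the lifetime timer. time.AfterFunc gives fine-grained
// control: there is no channel to drain and the timer can simply be stopped
// or reset from any code path.
func (p *pipe) StartTimer() {
	if p.lftm > 0 {
		p.lifeTm = time.AfterFunc(p.lftm, p.expire)
	}
}

// StopTimer stops the timer early, e.g. on a network error, on manual close,
// or after a dedicated/blocking usage acquires the connection from the dpool
// or spool. It reports false if the timer has already fired.
func (p *pipe) StopTimer() bool {
	if p.lifeTm == nil {
		return true
	}
	return p.lifeTm.Stop()
}

// ResetTimer re-arms the timer when the connection is put back after a
// dedicated or blocking usage.
func (p *pipe) ResetTimer() {
	if p.lifeTm != nil && p.lftm > 0 {
		p.lifeTm.Reset(p.lftm)
	}
}

// expire closes the pipe so later commands observe an expiration error
// (errConnExpired in this PR) and the client reconnects.
func (p *pipe) expire() {
	p.Close()
}

// Close is a stand-in for the pipe's real Close method.
func (p *pipe) Close() {}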

pipe.go Outdated
@@ -1576,6 +1587,7 @@ func (p *pipe) Close() {
}
atomic.AddInt32(&p.waits, -1)
atomic.AddInt32(&p.blcksig, -1)
p.StopTimer()
terut (Author)

Oops. I will fix it...

@rueian rueian (Collaborator) commented Feb 8, 2025

this should be removed.

pool.go Outdated
@@ -59,6 +59,7 @@ retry:
}
}
p.cond.L.Unlock()
v.StopTimer()
rueian (Collaborator)

If the timer is not stopped successfully, we need to acquire another connection.
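
A simplified, hypothetical sketch of that acquire path (not the library's actual pool code; the pool and conn types below only carry the fields needed for the illustration):

package sketch

import (
	"sync"
	"time"
)

// conn stands in for the pooled connection; only the lifetime timer matters here.
type conn struct {
	lifeTm *time.Timer
}

// StopTimer reports whether the lifetime timer was stopped before it fired.
// A false return means the connection expired while sitting idle in the pool.
func (c *conn) StopTimer() bool {
	if c.lifeTm == nil {
		return true
	}
	return c.lifeTm.Stop()
}

// Close stands in for closing the underlying network connection.
func (c *conn) Close() {}

type pool struct {
	cond *sync.Cond
	list []*conn
}

// Acquire hands out an idle connection, discarding any whose lifetime elapsed
// while pooled and retrying with another one, as suggested above.
func (p *pool) Acquire() *conn {
retry:
	p.cond.L.Lock()
	for len(p.list) == 0 {
		p.cond.Wait()
	}
	v := p.list[len(p.list)-1]
	p.list = p.list[:len(p.list)-1]
	p.cond.L.Unlock()

	if !v.StopTimer() { // the timer already fired: the connection is expired
		v.Close()
		goto retry
	}
	return v
}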

terut (Author)

Ah, that's right. Thanks!

terut (Author)

Fixed 390e19b

pipe.go Outdated
@@ -89,6 +91,10 @@ type pipe struct {
recvs int32
r2ps bool // identify this pipe is used for resp2 pubsub or not
noNoDelay bool
lftm time.Duration // lifetime
lftmMu sync.Mutex // guards lifetime timer
rueian (Collaborator)

Do we really need the mutex and the bool flag?

terut (Author)

I reviewed it again; we don't need the bool flag.
I thought that time.Reset and time.Stop needed a mutex on Go <= 1.22. Maybe I've got it wrong.

rueian (Collaborator)

@terut terut (Author) commented Jan 30, 2025

Thanks, you're right. It looks like it is thread-safe. I will remove it.

I misread "This cannot be done concurrent to other receives from the Timer's channel or other calls to the Timer's Stop method." in https://pkg.go.dev/[email protected]#Timer.Stop . Sorry.

rueian (Collaborator)

And we are using the AfterFunc timer which has no channel associated.
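
For reference, a small standalone Go snippet (not code from this PR) showing that AfterFunc timers have no channel:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Timers created with time.AfterFunc run their callback in a separate
	// goroutine; the returned Timer's C field is nil, so there is no channel
	// to drain before stopping or resetting it.
	t := time.AfterFunc(time.Second, func() {
		fmt.Println("lifetime elapsed")
	})

	fmt.Println("t.C == nil:", t.C == nil) // true for AfterFunc timers

	// Stop reports whether it prevented the callback from running.
	fmt.Println("stopped before firing:", t.Stop())
}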

terut (Author)

That's right.

terut (Author)

Fixed f950c1e

@terut terut force-pushed the feat/conn-lifetime branch from 31ffe91 to 88c8d7e on February 8, 2025 11:10
@terut terut (Author) commented Feb 8, 2025

The rest is the implementation of retrying in the single client.
I feel like it might be enough to use the ConnLifetime option with the retry handler enabled. What do you think about forcing a retry for errConnExpired? @rueian

@rueian rueian (Collaborator) commented Feb 8, 2025

> The rest is the implementation of retrying in the single client. I feel like it might be enough to use the ConnLifetime option with the retry handler enabled. What do you think about forcing a retry for errConnExpired? @rueian

I think we should use your original proposal; it doesn't need to involve the retry handler at all.

retry:
	resp = c.conn.Do(ctx, cmd)
	if resp.Error() == errConnExpired {
		goto retry
	}
	if c.retry && cmd.IsReadOnly() && c.isRetryable(resp.Error(), ctx) {
		...

Because whenever an errConnExpired occurs, we know the connection was closed by ourselves, so it should be safe to retry immediately.

@terut terut (Author) commented Feb 9, 2025

@rueian Thanks. Indeed, we know the error ourselves, and it is also not good to expose errConnExpired to the outside when retry is disabled. The retry logic is almost done; I just need to add tests for it.

client.go Outdated
@@ -86,6 +90,13 @@ func (c *singleClient) DoMulti(ctx context.Context, multi ...Completed) (resps [
attempts := 1
retry:
resps = c.conn.DoMulti(ctx, multi...).s
if c.hasConnLftm {
for _, resp := range resps {
if resp.Error() == errConnExpired {
rueian (Collaborator)

Is it possible that errConnExpired happens in the middle of DoMulti? I am not sure, but if it is possible, then we should not retry the preceding requests that didn't receive the error.

@terut terut (Author) commented Feb 10, 2025

Ah, I think it's unlikely. Surely all responses have the same error when p.state changes.

I will change that like the following.

if resps[0].Error() == errConnExpired {
  goto retry
}

terut (Author)

Fixed c0c3657

rueian (Collaborator)

OK, could you leave a comment in the code to explain why it won't happen?
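
For illustration, the kind of in-code comment being requested might read like this (a hypothetical fragment in the style of the snippets above, not the code that ends up in the PR):

retry:
	resps = c.conn.DoMulti(ctx, multi...).s
	// Checking only the first response is enough: errConnExpired is produced
	// when the whole pipe changes p.state on expiry, so every response in the
	// batch carries the same error; it cannot appear for only part of a batch.
	if len(resps) > 0 && resps[0].Error() == errConnExpired {
		goto retry
	}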
