-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add clutser topology awareness #638
Conversation
cluster.go
Outdated
c.mu.RLock() | ||
// check if the new topology is different from the current one | ||
for addr, cc := range conns { | ||
old, ok := c.conns[addr] | ||
if !ok || old.replica != cc.replica { | ||
isChanged = true | ||
break | ||
} | ||
} | ||
|
||
// check if the current topology is different from the new one | ||
if !isChanged { | ||
for addr := range c.conns { | ||
if _, ok := conns[addr]; !ok { | ||
isChanged = true | ||
break | ||
} | ||
} | ||
} | ||
c.mu.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put this check in the lazyRefresh? Then we can simply call lazyRefresh periodically here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. this check logic is for refresh only topology is changed. if you want to make a new method, i added it. 027671b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new function is good but not necessary from my point of view.
The problem is that both conditionalRefresh
and lazyRefresh
use getClusterTopology
, which sends CLUSTER SHARDS
and is costly.
And conditionalRefresh
may call lazyRefresh
again if the topology changes, causing another round of CLUSTER SHARDS
to be sent.
Can we avoid sending CLUSTER SHARDS
doubly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, how about this?
cluster.go
Outdated
case <-c.stopCh: | ||
return | ||
case <-ticker.C: | ||
c.conditionalRefresh() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now there is no duplicated getClusterTopology
at the end. However, I still want all the refresh processes, including the new conditional one, to be gated by the lazyRefresh
function to ensure at most one getClusterTopology
per second and not to flood the servers. lazyRefresh
serves the same purpose as your initial cool-down period proposal. Can we merge the _refresh
and conditionalRefresh
and then just use lazyRefresh
here?
c.conditionalRefresh() | |
c.lazyRefresh() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just call lazyRefresh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am ok with this one, but why do you delete your detection logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, i thought you want to use original one. i changed 9fd1da9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not merge _refresh
and conditionalRefresh
.
_refresh
is called,
- In a constructor. This case must init all conns, pslots, rslots. so should be all logic must be called.
- there is no connection for slot. This case should refresh.
- When got error from redis cluster. I think this case forced refreshness is reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would 1 be an issue if we made _refresh
conditionally? I thought in that case the checking logic would be bypassed.
For 2 and 3, I don't get your points of not making _refresh
conditionally. They are not related to the refresh process but are more related to how we act on the result of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cluster.go
Outdated
// check if cluster client is initialized. | ||
if !shouldRefresh { | ||
for _, cc := range c.pslots { | ||
if cc == nil { | ||
shouldRefresh = true | ||
break | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check really necessary? Also, it is valid that a living cluster has some missing slots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes. you are right.
9f886c5
cluster.go
Outdated
conns[addr] = connrole{ | ||
conn: cc.conn, | ||
replica: fresh.replica, | ||
if !forceRefresh { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are almost done. But is this forceRefresh
flag necessary? What will happen if we don't use the flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need. Because if conns
is same as c.conns
after init()
called, it doesn't update pslots
and rslots
in _refresh
. we need forced refresh in _refresh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. I overlooked that. However, that shows a problem with the current checking logic: slot changes are not considered. We need to consider them as well, otherwise, we will miss them.
To make the comparison easy, I think we may need to keep track of some simple hashes from the parseSlots
and parseShards
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slot changes are not considered.
Yes, I missed that point. how about just refresh always? just refreshing is more simpler and at this point, i don't think we can get many advantages using conditional refresh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we can find a way to do that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@proost merged. Thanks a lot! |
discussion: #637