
feat: add cluster topology awareness #638

Merged
merged 15 commits into redis:main from proost:feat-cluster-topology-awarness on Sep 25, 2024

Conversation

@proost (Contributor) commented Sep 17, 2024:

discussion: #637

cluster.go (outdated review thread, resolved)
cluster.go (outdated review thread, resolved)
cluster.go Outdated
Comment on lines 403 to 422
c.mu.RLock()
// check if the new topology is different from the current one
for addr, cc := range conns {
	old, ok := c.conns[addr]
	if !ok || old.replica != cc.replica {
		isChanged = true
		break
	}
}

// check if the current topology is different from the new one
if !isChanged {
	for addr := range c.conns {
		if _, ok := conns[addr]; !ok {
			isChanged = true
			break
		}
	}
}
c.mu.RUnlock()
@rueian (Collaborator):
Why not put this check in the lazyRefresh? Then we can simply call lazyRefresh periodically here.
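
For context, the loop being proposed here could look roughly like this (a sketch, not the merged code; the interval, the runTopologyRefresher name, and the clusterClient receiver are assumptions, while stopCh and lazyRefresh appear in the discussion; requires the time package):

// Hypothetical background refresher; lazyRefresh itself decides whether
// a real refresh is needed and how often it may run.
func (c *clusterClient) runTopologyRefresher(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-c.stopCh:
			return
		case <-ticker.C:
			c.lazyRefresh()
		}
	}
}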

@proost (Contributor Author):

I don't understand. This check logic is there so that we refresh only when the topology has changed. If you want a new method, I added one in 027671b.

@rueian (Collaborator) commented Sep 18, 2024:

A new function is good but not necessary from my point of view.

The problem is that both conditionalRefresh and lazyRefresh use getClusterTopology, which sends CLUSTER SHARDS and is costly.

And conditionalRefresh may call lazyRefresh again if the topology changes, causing another round of CLUSTER SHARDS to be sent.

Can we avoid sending CLUSTER SHARDS twice?
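
In other words, the goal is to fetch the topology once per pass and reuse that result for both the comparison and the rebuild. A minimal sketch under that reading (getClusterTopology is the call named above; refreshIfChanged, topologyChanged, and rebuild are assumed names, not the actual rueidis internals):

// Sketch: one CLUSTER SHARDS round trip per refresh attempt.
func (c *clusterClient) refreshIfChanged() error {
	conns, err := c.getClusterTopology() // sends CLUSTER SHARDS once
	if err != nil {
		return err
	}
	c.mu.RLock()
	changed := c.topologyChanged(conns) // the comparison shown above
	c.mu.RUnlock()
	if changed {
		c.rebuild(conns) // reuse the fetched result instead of fetching again
	}
	return nil
}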

@proost (Contributor Author):

Ok, how about this?

b075e69

@proost requested a review from rueian on September 18, 2024 at 11:17
cluster.go Outdated
case <-c.stopCh:
	return
case <-ticker.C:
	c.conditionalRefresh()
@rueian (Collaborator):

Yes, now there is no duplicated getClusterTopology at the end. However, I still want all the refresh processes, including the new conditional one, to be gated by the lazyRefresh function to ensure at most one getClusterTopology per second and not to flood the servers. lazyRefresh serves the same purpose as your initial cool-down period proposal. Can we merge the _refresh and conditionalRefresh and then just use lazyRefresh here?

Suggested change:
-	c.conditionalRefresh()
+	c.lazyRefresh()
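
The gating described here can be pictured roughly as a one-second cool-down inside lazyRefresh (a sketch only; the lastRefresh field is an assumption and the time package is required, while _refresh is the method discussed below):

// Sketch: lazyRefresh drops requests that arrive within one second of the
// previous refresh, so tickers and error paths can all call it safely.
func (c *clusterClient) lazyRefresh() {
	c.mu.Lock()
	if time.Since(c.lastRefresh) < time.Second {
		c.mu.Unlock()
		return // refreshed recently; avoid flooding the servers with CLUSTER SHARDS
	}
	c.lastRefresh = time.Now()
	c.mu.Unlock()
	c._refresh() // fetch the topology once and rebuild the slot tables
}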

@proost (Contributor Author):

826de2c

just call lazyRefresh

@rueian (Collaborator):

Yes, I am OK with this one, but why did you delete your detection logic?

@proost (Contributor Author):

Oh, I thought you wanted to use the original one. I changed it in 9fd1da9.

@proost (Contributor Author) commented Sep 21, 2024:

fa0b58a

We should not merge _refresh and conditionalRefresh.

_refresh is called:

  1. In the constructor. This case must initialize all of conns, pslots, and rslots, so all of the logic must run.
  2. When there is no connection for a slot. This case should refresh.
  3. When we get an error from the Redis cluster. I think a forced refresh is reasonable in this case.

@rueian (Collaborator):

Would 1 be an issue if we made _refresh conditional? I thought the checking logic would be bypassed in that case.

For 2 and 3, I don't get your point about not making _refresh conditional. They are not related to the refresh process itself but rather to how we act on its result.

@proost requested a review from rueian on September 20, 2024 at 14:26
rueidis.go (outdated review thread, resolved)
@proost requested a review from rueian on September 21, 2024 at 03:38
cluster.go (outdated review thread, resolved)
cluster.go (outdated review thread, resolved)
@proost requested a review from rueian on September 22, 2024 at 14:48
cluster.go Outdated
Comment on lines 224 to 232
// check if cluster client is initialized.
if !shouldRefresh {
	for _, cc := range c.pslots {
		if cc == nil {
			shouldRefresh = true
			break
		}
	}
}
@rueian (Collaborator):

Is this check really necessary? Also, it is valid that a living cluster has some missing slots.

@proost (Contributor Author):

Oh, yes. You are right.
9f886c5

cluster.go (outdated review thread, resolved)
@proost requested a review from rueian on September 23, 2024 at 15:43
cluster.go Outdated
conns[addr] = connrole{
	conn:    cc.conn,
	replica: fresh.replica,
}

if !forceRefresh {
@rueian (Collaborator):

I think we are almost done. But is this forceRefresh flag necessary? What will happen if we don't use the flag?

@proost (Contributor Author):

We need it. If conns is the same as c.conns after init() is called, _refresh doesn't update pslots and rslots, so we need a forced refresh in _refresh.
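
Concretely, the flag being defended here bypasses the change check so that the constructor path always rebuilds pslots and rslots even when the connection map looks unchanged. A rough sketch of that shape (topologyChanged and rebuildSlots are hypothetical helpers, not the merged code):

// Sketch: forceRefresh skips the "nothing changed" early return.
func (c *clusterClient) _refresh(forceRefresh bool) error {
	conns, err := c.getClusterTopology() // CLUSTER SHARDS
	if err != nil {
		return err
	}
	if !forceRefresh && !c.topologyChanged(conns) {
		return nil // periodic path: topology identical, keep current slots
	}
	// forced path (constructor, redirection errors): always rebuild
	c.rebuildSlots(conns)
	return nil
}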

@rueian (Collaborator):

Yes, you are right. I overlooked that. However, that shows a problem with the current checking logic: slot changes are not considered. We need to consider them as well; otherwise we will miss them.

To make the comparison easy, I think we may need to keep track of some simple hashes from the parseSlots and parseShards.
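
One way to implement that idea (a hypothetical sketch: parseSlots and parseShards are the functions named above, but nodeInfo and the FNV fingerprint are assumptions) is to reduce each parsed topology to a small hash and compare hashes instead of walking the maps:

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// nodeInfo is a stand-in for whatever parseSlots/parseShards yield per node.
type nodeInfo struct {
	replica bool
	slots   []int
}

// topologyHash builds a deterministic fingerprint of a topology snapshot;
// a refresh is needed only when the fingerprint differs from the stored one.
func topologyHash(nodes map[string]nodeInfo) uint64 {
	addrs := make([]string, 0, len(nodes))
	for addr := range nodes {
		addrs = append(addrs, addr)
	}
	sort.Strings(addrs) // map iteration order is random; sort for a stable hash
	h := fnv.New64a()
	for _, addr := range addrs {
		n := nodes[addr]
		fmt.Fprintf(h, "%s|%t|%v;", addr, n.replica, n.slots)
	}
	return h.Sum64()
}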

@proost (Contributor Author):

"slot changes are not considered."

Yes, I missed that point. How about just always refreshing? Just refreshing is simpler, and at this point I don't think a conditional refresh buys us much.

@rueian (Collaborator):

Ok, we can find a way to do that later.

@proost requested a review from rueian on September 24, 2024 at 15:01
rueidis.go (outdated review thread, resolved)
@proost requested a review from rueian on September 25, 2024 at 14:58
@rueian merged commit 9ff2841 into redis:main on Sep 25, 2024. 27 checks passed.
@rueian (Collaborator) commented Sep 25, 2024:

@proost merged. Thanks a lot!

@proost deleted the feat-cluster-topology-awarness branch on September 26, 2024.