Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PartitionSelectorException in Lettuce Client Triggered by Addition of New Slave Nodes in Redis Cluster #2769

Open
wmxl opened this issue Feb 25, 2024 · 4 comments
Labels
for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage

Comments

@wmxl
Copy link
Contributor

wmxl commented Feb 25, 2024

Bug Report

Current Behavior

A PartitionSelectorException occurs under specific conditions when a Redis cluster is adding a new slave node. If the Lettuce client's periodic topology refresh coincides with the addition of the new slave node, there's a small chance of encountering the io.lettuce.core.cluster.PartitionSelectorException: Cannot determine a partition to ...

Stack trace
io.lettuce .core .cluster.PartitionSelectorException: Cannot determine a partition for slot 4419.
at io.lettuce.core.cluster.PooledclusterConnectionProvider .getwriteConnection (PooledclusterConnectionProvider . java: 164) ~[lettuce-core-6.2.3.RELEASE. jarl/ :6.2.3.RELEASE]
at io.lettuce.core.cluster. PooledclusterConnectionProvider .getconnect ionAsyne(PooledclusterConnectionProvider. java: 149) -flettuce-core-6.2.3. RELEASE. jar1/ :6.2.3.RBLEASE]
at io. lettuce.core.cluster.clusterDistributionChannelwriter. donrite(ClusterDistributionchannelwriter. java: 171) ~[lettuce-core-6.2.3.RELEASE. jar1/ :6.2.3.RELEASE]
at io. 1ettuce.core.cluster.clusterDistributionChannelNziter.wzitefciusterDistribut ionChannelwriter. java:105) -f1ettuce-core-6.2.3.RBLEASE. jar1/:6.2.3.RELEASE] at io. lettuce.core.RedischannelHandler .dispatch(RedischannelHandler. java:215) ~[lettuce-core-6.2.3. RELEASE. jar1/ :6.2.3. RELEASE]
at Ho. 1ettuce.core.clugter. StatefulReaisciusterconnectionInpl.aispateh(StatefulRedisclusterconnectionImel. Java:216) -flettuce-core-6. 2.3. RELEASE. Jar1 /:6.2.3. RBLEASE]
at io. lettuce.core. AbstractRedisAsyneCommands .dispatch (AbstractRedisAsyncCommands . java: 676) -[lettuce-core-6.2.3. RELEASE. jar1/:6.2.3. RBLEASE]
at io. lettuce.core. AbstractRedisAsyncCommands.del (AbstractRedisAsyncCommands. java: 629) -[lettuce-core-6.2.3. RELEASE. jar1/:6.2.3.RELEASE]
at io. lettuce.core.cluster. RedisAdvancedclusterAsyncCommands Impl.del (RedisAdvancedClusterAsyncCommands Impl. java: 175) ~[lettuce-core-6.2.3. RELEASE. jar!/ :6.2.3.RELEASE]
at io.lettuce.core.cluster. RedisAdvancedClusterAsyncCommands Impl.del (RedisAdvancedclusterAsyncConmands Impl. java: 166) ~[lettuce-core-6.2.3.RELEASE. jar1/:6.2.3.RBLBASE]
at jdk. internal.reflect. GeneratedMethodAccessor591. invoke (Unknown Source) ~[?:?]
0 et idk. internal.refleet.DelegatingMethodhccessorInpl.invoke (DelegatingMethodnccessorInpl. java: 43) ~[?:2] at
java. 1ang.reflect.Method. invoke (Method. java: 566) -[?:3]
at io. lettuce. core. cluster.ClusterFuturesyncInvocationHlandler.handleInvocation(ClusterfuturesyncInvocationHandler. java: 122) -[lettuce-core-6.2.3. RELEASE. jar1/:6.2.3.RELEA
at io.1ettuce.core.internal. AbstractInvocationlandler. invoke(AbstractInvocationHlandler. java: 80) -[lettuce-core-6.2.3.RBLEASE. jar1/:6.2.3.RELEASE]
at com. sun.proxy . $Proxy141.del (Unknown Source) ~[?:2]
at en.techwolf.cache. service. impl. LettuceclusterCacheserviceImpl. del(LettuceclusterCacheservice Impl. java:573)~ltechwolf-cache-client-1.0.129-SNAPSHOL: Jar1/:21 at com. zhipin. service.common.test. RedisClusterLoadrestController.delete (RedisClusterLoadTestController.java:264 zhipin-common-api-imp1-1.0.615-SNAPSHOT . jar! /:21
at com. zhipin. service. common.test. RedisclusterLoadrestController. lambda$mix$2 (RedisclusterLoadrestController.java:104) ~[zhipin-conmon-api-impl-1.0.615-SNAPSHOT . jar 1/:?]
at java. util.concurrent . Bxecutors$RunnableAdapter.cal1 (Executorsjava:515) [?:?]
at java.util.concurrent.Futurerask.run(Futurerask.java: 264) [?:?]
at java.util.concurrent. ThreadPoolExecutor.runWorker (ThreadPoolExecutor. java: 1128) [?:?]
at java.util.concurrent. ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java: 628) [?:?]

Input Code

the code that is responsible for the bug

Input Code
static final class KnownMajority extends PartitionsConsensus {
        @Override
        Partitions getPartitions(Partitions current, Map<RedisURI, Partitions> topologyViews) {
            if (topologyViews.isEmpty()) {
                return current;
            }
            List<VotedPartitions> votedList = new ArrayList<>();
            for (Partitions partitions : topologyViews.values()) {
                int knownNodes = 0;
                for (RedisClusterNode knownNode : current) {
                    if (partitions.getPartitionByNodeId(knownNode.getNodeId()) != null) {
                        knownNodes++;
                    }
                }
                votedList.add(new VotedPartitions(knownNodes, partitions));
            }
            Collections.shuffle(votedList);
            Collections.sort(votedList, (o1, o2) -> Integer.compare(o2.votes, o1.votes));
            return votedList.get(0).partitions;
        }
    }

Expected behavior/code

The expected behavior is for the Lettuce client to handle the topology refresh seamlessly without throwing a PartitionSelectorException, even when a new slave node is being added to the Redis cluster.

Environment

  • Lettuce version(s): 6.2.3.RELEASE
  • Redis version: 6.2.10

Possible Solution

The proposed solution involves modifying the KnownMajority class within PartitionsConsensus. The adjustment aims to account for master nodes with 0 assigned slots, which occur transiently when a new node is added to the cluster. This change ensures that these nodes are not incorrectly voted for during topology consensus, preventing the PartitionSelectorException.

// In PartitionsConsensusImpl.KnownMajority
// Additional logic to handle master nodes with 0 assigned slots during the addition of a new node
for (Partitions partitions : topologyViews.values()) {  
    for (Partitions partitions : topologyViews.values()) {
                int knownNodes = 0;
                for (RedisClusterNode knownNode : current) {
                    if (partitions.getPartitionByNodeId(knownNode.getNodeId()) != null) {
                        knownNodes++;
                        RedisClusterNode partitionByNodeId = partitions.getPartitionByNodeId(knownNode.getNodeId());

                         // master node should not have slots size = 0, this happened when a new node was added to cluster, [cluster nodes] command connect to
                        // the newly added node may return incorrect result, like this: nodeId ip@port master 0 1707379882000 0 disconnected (getSlots().size() == 0)
                        // normal result should be like this: nodeId ip@port master 0 1707379884310 54 connected 5461-10922 (getSlots().size() > 0)
                        // thus we should not vote for this kind of topology views that include master node with 0 assigned slots
                        if (partitionByNodeId.is(RedisClusterNode.NodeFlag.UPSTREAM) && partitionByNodeId.getSlots().size() == 0) {
                            knownNodes--;
                        }
                    }
                }
                votedList.add(new VotedPartitions(knownNodes, partitions));
            }
}

Additional context

The issue arises due to the two-step process of adding a slave node to a Redis cluster, where the node is initially added as a master before its role changes to a slave. If a topology refresh occurs during this period and connects to the newly added node, the returned cluster nodes information may be incorrect. This incorrect information leads to a situation where a master node might be reported with 0 assigned slots, triggering the PartitionSelectorException. The proposed solution in the KnownMajority class's logic aims to prevent this by adjusting the voting mechanism to disregard these transient states.

image

The result in the red box will cause the slot size of the 192.168.21.32:6479(master) in the topology obtained by Lettuce to be empty.(slots size = 0)

How to reproduce the error

  1. Set refreshPeriod to 1 second in the Lettuce client configuration to increase the frequency of topology refreshes.
  2. Utilize two shell scripts to simulate adding a new slave node to the cluster repeatedly:
    a.sh script removes a node from the cluster, stops the Redis server, cleans up data, and then re-adds the node as a slave to a specific master.
    cron.sh script repeatedly executes a.sh to simulate the node addition process multiple times, introducing potential timing conflicts with the topology refresh.
  3. Write a program that continuously executes simple Redis commands through the Lettuce client. As long as the program runs long enough, it will inevitably encounter the PartitionSelectorException.
    My two shell scripts:
    a.sh
#!/usr/bin/env bash

source /etc/profile
myid=$(redis-cli -h 192.168.26.159 -p 6479 -a g4BaiN7u9dMq cluster myid|tail -1)
echo $myid
redis-cli --cluster del-node 192.168.26.159:6479 $myid -a g4BaiN7u9dMq
service redis-cluster stop 6479
rm -f /data/redis-6.2.10-Linux-x86_64/dbdata/6479/dump.rdb /data/redis-6.2.10-Linux-x86_64/conf/nodes.6479
echo 'Start Redis server after 10 sencods'
sleep 10
echo 'Start Redis Server'
service redis-cluster start 6479
echo 'Meet Node after 10 seconds......'
sleep 10
echo 'Start Meet.......'
redis-cli -a g4BaiN7u9dMq  --cluster add-node 192.168.26.159:6479 192.168.26.59:6479 --cluster-slave --cluster-master-id 2ffd17608f6c56fd1ab8cb07679d6a32af8f54b4

cron.sh

#!/usr/bin/env bash

source /etc/profile


for ((i=1; i<=100000; i++))
do

    echo $i
    ./a.sh
    echo "sleep 10 sencods"
    sleep 10
done
@mp911de
Copy link
Collaborator

mp911de commented Feb 28, 2024

PartitionsConsensus isn't designed to sort out replica reconfiguration issues, it is intended to prevent partition splits (split-brain) so that the topology refresh stays with the group of nodes that shall remain in a cluster setup.

For your case, I suggest subclassing RedisClusterClient and overriding determinePartitions with a bit of code that suits your use-case.

@wmxl
Copy link
Contributor Author

wmxl commented Feb 29, 2024

PartitionsConsensus isn't designed to sort out replica reconfiguration issues, it is intended to prevent partition splits (split-brain) so that the topology refresh stays with the group of nodes that shall remain in a cluster setup.

For your case, I suggest subclassing RedisClusterClient and overriding determinePartitions with a bit of code that suits your use-case.

Thank you for your response. I understand that PartitionsConsensus is primarily aimed at avoiding split-brain scenarios. However, adding slave nodes is a quite common operation for Redis clusters, and encountering PartitionSelectorException during this process is possible, even though the chance is quite small. (This happened twice in my company, during the expansion of our Redis cluster from a 1 master & 1 slave setup to a 1 master & 2 slaves configuration; one of our application servers experienced PartitionSelectorException for 15 seconds, which corresponds to our refresh period time.)

Given this, I believe enhancing Lettuce to better handle the addition of nodes during topology refreshes could benefit many users.

Since PartitionsConsensus is aimed at preventing partition splits, if we could add some filter logic in determinePartitions, something like this:

protected Partitions determinePartitions(Partitions current, Map<RedisURI, Partitions> topologyViews) {
        // Filter out invalid topology views where master nodes have 0 slots.
        Map<RedisURI, Partitions> filteredTopologyViews = new HashMap<>();
        for (Map.Entry<RedisURI, Partitions> entry : topologyViews.entrySet()) {
            Partitions partitions = entry.getValue();
            boolean isValid = true;

            for (RedisClusterNode node : partitions) {
                if (node.is(RedisClusterNode.NodeFlag.UPSTREAM) && node.getSlots().isEmpty()) {
                    isValid = false;
                    break;
                }
            }

            if (isValid) {
                filteredTopologyViews.put(entry.getKey(), partitions);
            }
        }
        
         // PartitionsConsensus codes...
    }

In my testing, after implementing such modifications, the PartitionSelectorException never occurred again, even in the midst of refreshing the topology upon adding nodes.

I hope this suggestion might be helpful, and I look forward to your thoughts.

@mp911de
Copy link
Collaborator

mp911de commented Feb 29, 2024

A node can be valid if it doesn't hold any slots, e.g. for Pub/Sub usage. I wondered whether it would make sense to make PartitionConsensus a public API so you could configure something in ClusterClientOptions.

Alternatively, you can enable adaptive topology refresh. Before the exception is thrown, the adaptive trigger UNCOVERED_SLOT is emitted and you can reduce the refresh period without touching periodic refresh.

@wmxl
Copy link
Contributor Author

wmxl commented Feb 29, 2024

A node can be valid if it doesn't hold any slots, e.g. for Pub/Sub usage.

I see.

I wondered whether it would make sense to make PartitionConsensus a public API so you could configure something in ClusterClientOptions.

It would be better to prevent the client from getting incorrect cluster nodes information from the newly added nodes, moreover, it could potentially become the final determined partition.

Alternatively, you can enable adaptive topology refresh. Before the exception is thrown, the adaptive trigger UNCOVERED_SLOT is emitted and you can reduce the refresh period without touching periodic refresh.

We have enabled the adaptive topology refresh by calling enableAllAdaptiveRefreshTriggers(), but it seems it didn't work when the PartitionSelectorException was triggered by periodic refresh. I'm not sure if this is because they share the same timeout setting (we set both to 15s).

@tishun tishun added for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage
Projects
None yet
Development

No branches or pull requests

3 participants