Error updating the cluster architecture #587
Hi @ajzach, did the errors disappear eventually?
I also experienced a similar issue. When I scale down the nodes in a cluster, the client doesn't update correctly. I am currently digging into it.
Hi @ajzach, I have just tested AWS Elasticache by deleting shards manually, but I didn't see the DNS error you described, and rueidis refreshed the cluster topology successfully by issuing the Additionally, I found that AWS Elasticache responds to the
So this looks weird to me. Did you manually put the domain names of nodes into the
To connect to Redis, I just use the endpoint. I performed the same tests adding and removing nodes, including failover, and everything worked correctly. This error is not very common: we currently have more than 100 applications using Rueidis, and only 2 have reported this problem. The error seems to occur after prolonged use of the client. Some condition causes the client to stop updating the cluster architecture internally and to keep deleted nodes in memory.
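The symptom described here, where removed nodes stay in memory, is what a topology refresh produces if it only adds nodes and never drops the ones missing from the newest reply. The Go sketch below is not rueidis code: `conn` and `applyTopology` are hypothetical names, and it assumes the latest topology reply has already been reduced to a list of `host:port` strings. It only shows the pruning step such a refresh would need.

```go
package main

import (
	"fmt"
	"sort"
)

// conn is a hypothetical stand-in for a per-node connection.
type conn struct{ addr string }

// Close would release the underlying socket in a real client.
func (c *conn) Close() { fmt.Println("closing", c.addr) }

// applyTopology makes nodes match the latest topology reply: addresses that
// are new get a placeholder connection, and connections to addresses missing
// from the reply are closed and deleted. It returns the removed addresses, sorted.
func applyTopology(nodes map[string]*conn, latest []string) []string {
	keep := make(map[string]bool, len(latest))
	for _, addr := range latest {
		keep[addr] = true
		if _, ok := nodes[addr]; !ok {
			nodes[addr] = &conn{addr: addr}
		}
	}
	var removed []string
	for addr, c := range nodes {
		if !keep[addr] {
			c.Close()
			delete(nodes, addr) // deleting the current key during range is allowed in Go
			removed = append(removed, addr)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	nodes := map[string]*conn{}
	applyTopology(nodes, []string{"n1:6379", "n2:6379", "n3:6379"}) // after scaling up
	removed := applyTopology(nodes, []string{"n1:6379", "n2:6379"}) // after scaling down
	fmt.Println(removed, len(nodes))
}
```

Pruning on every refresh keeps the node map consistent with whatever the last reply said; it cannot help if the reply itself is stale.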
Hi @ajzach, Rueidis sends The current rueidis always gets the latest cluster topology from the configuration endpoint only, but that does not seem to work well with your Elasticache cluster. Would you like to try the new v1.0.42-alpha? The new version will send
It is a hard situation to reproduce. I agree with rueian: I guess the configuration endpoint returns stale cluster information, because I scale clusters in/out and up/down from time to time, but this is the first time I have seen the cluster topology fail to update.
Thanks @proost! Would you also like to try v1.0.42-alpha? And how old is your Elasticache cluster? I found that a newly created Elasticache cluster forms the cluster with IP addresses instead of domain names, so it is impossible for me to even get a domain name resolution error.
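The point about IP addresses can be made concrete: only node addresses that are hostnames go through DNS, so a resolution error cannot occur when the cluster advertises IP literals. A small sketch using only the Go standard library; `needsDNS` is a hypothetical helper and the hostname in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// needsDNS reports whether dialing addr ("host:port") requires a DNS lookup.
// IP literals (IPv4 or bracketed IPv6) are dialed directly, so they can never
// fail with a "no such host" error; hostnames can, once their record is deleted.
func needsDNS(addr string) (bool, error) {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false, err
	}
	return net.ParseIP(host) == nil, nil
}

func main() {
	for _, addr := range []string{
		"10.0.1.23:6379",                   // IP-based node address
		"node-0001.example.internal:6379", // hypothetical hostname-based node address
	} {
		v, err := needsDNS(addr)
		fmt.Println(addr, v, err)
	}
}
```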
Do you mean the Redis version or how long the cluster has been running?
Maybe both. I think it is possible for clusters to differ even on the same Redis version if they were created on different dates.
@rueian I use Redis version 7.0.7.
While I was experiencing the errors, I accessed the application instance and queried the nodes directly through the endpoint; the nodes that had been removed were not listed.
Hi @ajzach, that was probably because, at that time, the configuration endpoint resolved to a relatively new node while rueidis kept an old connection to an old node.
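One general way to limit the damage from a single stale connection is to ask more than one node for the topology and prefer the answer most of them agree on. The sketch below assumes each reply has already been reduced to a list of node addresses; `topologyKey` and `pickTopology` are hypothetical, and nothing in this thread says rueidis uses a majority vote.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// topologyKey canonicalizes a node list so that replies listing the same
// nodes in a different order compare equal.
func topologyKey(addrs []string) string {
	s := append([]string(nil), addrs...) // copy so the caller's slice is not reordered
	sort.Strings(s)
	return strings.Join(s, ",")
}

// pickTopology returns the reply whose node set was reported most often.
// On a tie, the node set seen first wins. It returns nil for no replies.
func pickTopology(replies [][]string) []string {
	counts := make(map[string]int)
	first := make(map[string][]string)
	var order []string
	for _, r := range replies {
		k := topologyKey(r)
		if _, seen := first[k]; !seen {
			first[k] = r
			order = append(order, k)
		}
		counts[k]++
	}
	best, bestCount := "", 0
	for _, k := range order {
		if counts[k] > bestCount {
			best, bestCount = k, counts[k]
		}
	}
	return first[best]
}

func main() {
	replies := [][]string{
		{"a:6379", "b:6379", "c:6379", "d:6379"}, // old connection, still lists removed node d
		{"b:6379", "a:6379", "c:6379"},
		{"a:6379", "b:6379", "c:6379"},
	}
	fmt.Println(pickTopology(replies))
}
```

A majority vote only helps while most of the queried nodes are up to date; if most connections are old, it will pick the stale view.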
Hi @ajzach, v1.0.42-alpha should reduce the chance of getting stale information from an old connection. Please let me know once you have tried it.
I remember that the specialization in the AWS configuration was done for a reason. What are the implications of removing it?
According to an AWS Redis team member, the configuration endpoint is essentially a DNS alias for all nodes. The specialization was made before I knew that, under the wrong assumption that the endpoint was a special program responsible for the cluster topology. So the specialization is actually meaningless.
Hi @rueian,
In the failed case, the behavior was strange, based on what I captured:
I think Does it relate to this patch https://github.com/redis/rueidis/releases/tag/v1.0.42-alpha?
Hi @dangngoctam00, that looks very weird. Have you ever seen the cluster shards command being sent after initialization?
Hi @rueian, I haven't checked it in the failed case; maybe I need to test it again. But based on the tcpdump capture, in the 3 minutes until failover, there is no
@dangngoctam00 could you update rueidis to version 1.0.42 or later? In 1.0.42, Including Separately from the version upgrade, Not sending
@dangngoctam00 could you try again after bumping the rueidis version?
Hi @proost, I will upgrade the version to
Hi @rueian, @proost, see line 220 in b514a56.
When there are multiple connections and only one AWS endpoint, the function getClusterSlots is executed only once and no message is sent to the channel results. This leads to line 58 in b514a56, where LazyDo is ignored. Could you review it? Thank you.
Hi @dangngoctam00, thank you for looking into the details. If that was the case, then versions after v1.0.42 should have fixed it. I also released v1.0.45-alpha.2, which adds a timeout on the
Hi @rueian, I've tried v1.0.45-alpha.2, and the client refreshes the Redis cluster topology normally.
Hello, I am having an issue with one of our clusters (AWS Elasticache) that has autoscaling configured, with a minimum of 10 and a maximum of 15 nodes. During the day, the cluster scaled up to 13 nodes, and the client correctly detected the new nodes. However, when it scaled back down to 10, we started seeing errors where the client tried to resolve the domain names of the removed nodes. I tested adding the 11th node back, and the client detected it, but once I removed it again, the errors reappeared. It seems like the client is not updating the cluster architecture internally. We are using version 1.0.35.