ha-mode-exactly should ensure we have exactly N mirrors when down nodes come back up #122
Comments
So what is left to be done? Ensuring we don't have more than N mirrors at any given time? |
If it's done, then yeah: if I specified 2, it should be kept at 2. |
Thanks |
Hi @michaelklishin. We've been running some tests to reproduce this issue. We're able to reproduce it when mirror nodes come back up simultaneously or nearly simultaneously. The frequency of occurrence is very low and nondeterministic. Should we still spend time fixing this in light of these test outcomes? |
@Ayanda-D yes, the probability of this is quite low. Given that we will be moving to Raft after 3.6.0, this is indeed a fair question. I'll discuss this with the team. |
Okay. Thanks. |
I think agreeing on the number of copies is per se a group-membership/consensus problem, best solved with something like Raft. |
IMHO, the definition of done should be whether or not it always works. We are experiencing this even with 3.5. Your testing may differ from most production setups. Please test again after your rework to Raft, but it should work 100% of the time, not 99%. |
@gsogol we completely agree that it should work correctly 100% of the time. Our question really is, should we try to solve this with an ad-hoc protocol (something we're moving away from) or after we introduce Raft into the core. |
I agree that it wouldn't make sense to do this now if you're planning big changes in 3.6. Waiting makes sense. Just wanted to clarify that low occurrences are still bad. |
@gsogol thanks for the feedback! We'll postpone this until after we move this part to Raft. This may be 3.6.0 or 3.7.0. |
This is a very old issue, and yet there are still known issues when it comes to classic queue mirroring and maintaining an exact number of replicas, e.g. #2737. Closing the issue, as quorum queues should be used instead. |
Per request of @tsaleh:
With an ha policy, I can specify that 2 copies be kept. However, as nodes go down or new ones come up, the policy does not notice that only 1 copy is left and that a copy needs to be created automatically on a new node. Likewise, if a 2nd node comes back up, you now have 3 copies, so you need to rebalance back to 2. It seems @simonmacmullen filed #26463 to help with this request.
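To make the requested behaviour concrete, here is a minimal Python sketch of the reconciliation step: given the nodes that currently hold a copy and the nodes that are up, work out which copies to add or drop so that exactly N remain. The function name `reconcile_mirrors` and its arguments are hypothetical and for illustration only; this is not RabbitMQ's actual implementation, and it assumes a single decision-maker with a consistent view of the cluster.

```python
from typing import Iterable, List, Tuple


def reconcile_mirrors(
    current: Iterable[str],
    running: Iterable[str],
    target: int,
) -> Tuple[List[str], List[str]]:
    """Return (to_add, to_drop) so the queue ends up with `target` copies.

    `current` lists the nodes that hold a copy now (master first) and
    `running` lists every node that is up. Hypothetical helper, not
    RabbitMQ code.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    running_set = set(running)
    # Copies on nodes that are down do not count towards the target.
    live = [node for node in current if node in running_set]
    if len(live) > target:
        # Too many copies: keep the first `target` (master first), drop the rest.
        return [], live[target:]
    # Too few (or exactly enough) copies: pick running nodes that have
    # no copy yet, in a stable order, to fill the gap.
    candidates = sorted(running_set - set(live))
    return candidates[: target - len(live)], []


# One copy left after a node went down: add a copy on another node.
print(reconcile_mirrors(["a"], ["a", "b", "c"], 2))
# A node came back and there are now three copies: drop one.
print(reconcile_mirrors(["a", "b", "c"], ["a", "b", "c"], 2))
```

The sketch only gives the right answer when one party decides from one consistent view. If several nodes rejoin at nearly the same time and each acts on its own view, they can disagree about the current copy count, which is where an agreement protocol such as Raft comes in.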