
ha-mode-exactly should ensure we have exactly N mirrors when down nodes come back up #122

Closed
gsogol opened this issue Apr 20, 2015 · 12 comments

@gsogol

gsogol commented Apr 20, 2015

Per request of @tsaleh:

With an HA policy, I can specify that 2 copies be kept. However, as nodes go down or new ones come up, the policy does not know that you now have only 1 copy and need to automatically create one on a new node. Conversely, if a 2nd node comes back up, you now have 3 copies, so you need to rebalance back to 2. It seems @simonmacmullen filed #26463 to help with this request.
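
For reference, a policy like the one described can be applied through RabbitMQ's management HTTP API. This is a minimal sketch, assuming the management plugin is enabled on localhost:15672 with default guest credentials; the policy name `ha-two` and the queue-name pattern are illustrative, not from this issue:

```python
import requests

# An ha-mode=exactly policy that keeps exactly 2 copies of each matching
# queue (one master plus one mirror). The policy name and pattern below
# are made up for illustration.
policy = {
    "pattern": "^two\\.",             # queues whose names start with "two."
    "definition": {
        "ha-mode": "exactly",
        "ha-params": 2,               # keep exactly 2 copies in total
        "ha-sync-mode": "automatic",  # sync newly added mirrors automatically
    },
    "apply-to": "queues",
}

# The default vhost "/" is URL-encoded as %2F in the API path.
resp = requests.put(
    "http://localhost:15672/api/policies/%2F/ha-two",
    json=policy,
    auth=("guest", "guest"),
)
resp.raise_for_status()
```

The issue is that, with such a policy in place, the broker does not reliably converge back to exactly 2 copies as nodes leave and rejoin.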

@michaelklishin
Member

#26463 is fixed in 3.5.0. The title says:

Make ha-mode=exactly start up new mirrors when old ones go down

So what is left to be done? Ensuring we don't have more than N mirrors at any given time?

@gsogol
Author

gsogol commented Apr 20, 2015

If it's done, then yeah: if I specified 2, it should be kept at 2.

@michaelklishin michaelklishin self-assigned this Apr 20, 2015
@michaelklishin michaelklishin changed the title from "HA Policy: Rebalancing copies of messages" to "ha-mode-exactly should ensure we have exactly N mirrors when down nodes come back up" Apr 20, 2015
@gsogol
Author

gsogol commented Apr 20, 2015

Thanks

@Ayanda-D
Contributor

Hi @michaelklishin. We've been running some tests to reproduce this issue. We're able to reproduce it when mirror nodes come back up simultaneously, or nearly so. The frequency of occurrence is very low and nondeterministic. Given these test outcomes, should we still spend time fixing this?

@michaelklishin
Member

@Ayanda-D yes, the probability of this is quite low. Given that we will be moving to Raft after 3.6.0, this is indeed a fair question. I'll discuss this with the team.

@Ayanda-D
Contributor

Okay. Thanks.

@videlalvaro
Contributor

I think agreeing on the number of copies is, per se, a group-membership/consensus problem best solved with something like Raft.

@gsogol
Author

gsogol commented Aug 10, 2015

IMHO, the definition of done should be whether or not it always works. We are experiencing this even with 3.5. Your testing may differ from most production setups. Please test again after your rework to Raft, but it should work 100% of the time, not 99%.

@michaelklishin
Member

@gsogol we completely agree that it should work correctly 100% of the time. Our question really is: should we try to solve this now with an ad-hoc protocol (something we're moving away from), or after we introduce Raft into the core?

@gsogol
Author

gsogol commented Aug 10, 2015

I agree that it wouldn't make sense to do this now if you're planning big changes in 3.6. Waiting makes sense. Just wanted to clarify that occurrences, while rare, still happen.


@michaelklishin
Member

@gsogol thanks for the feedback! We'll postpone this until after we move this part to Raft. This may be 3.6.0 or 3.7.0.

@mkuratczyk
Contributor

This is a very old issue, and yet there are still known problems with classic queue mirroring maintaining an exact number of replicas, e.g. #2737. Closing this issue, as quorum queues should be used instead.
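
For anyone landing here: a quorum queue can be declared from any client by setting the `x-queue-type` argument. A minimal sketch using the pika Python client, assuming a broker on localhost; the queue name is illustrative:

```python
import pika

# Connect to a local broker; connection details are illustrative.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Quorum queues are Raft-based and replicated; they must be durable.
# The broker manages the replication factor itself, rather than via
# an ha-* policy, which avoids the mirror-count drift described above.
channel.queue_declare(
    queue="orders",
    durable=True,
    arguments={"x-queue-type": "quorum"},
)

connection.close()
```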

dcorbacho pushed a commit that referenced this issue Jul 5, 2023