During peer removal, try to remap any stream or consumer assets. #2493

Merged
merged 1 commit into from
Sep 7, 2021

derekcollison
Member

Also, if we do not have room, trap add peer and process there.
Also fixed a bug that would treat ephemerals the same as durables during remapping after peer removal.

Signed-off-by: Derek Collison [email protected]

/cc @nats-io/core

csa := sa.copyGroup()
csa.Group.Peers = append(csa.Group.Peers, peer)
// Send our proposal for this csa. Also use same group definition for all the consumers as well.
cc.meta.Propose(encodeAddStreamAssignment(csa))
Contributor

I don't know the flows of this code well, but this Propose can block, and with enough streams it can take a very long time, especially if the propose channel fills. Is that OK? Could we even lose leadership while this is happening?

Mainly just curious how the code works; I'm sure it's fine.

Member Author

Yes, all of those things can happen. The way we mitigate is to only stay in place for a client, and to run outside of the main route loops and gateways (GWs).

This code can block, but it is already in its own goroutine processing the raft layer.
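
As a rough illustration of the point above (not the nats-server code), here is a minimal sketch of a bounded proposal queue whose producer runs in its own goroutine. The `proposer` type, its `propc` field, and the `proposeAll` and `drain` helpers are all hypothetical names invented for this example; the only assumption carried over from the thread is that a proposal is a send on a bounded channel that blocks when the channel is full.

```go
package main

import "fmt"

// proposer is a hypothetical stand-in for a raft node's proposal queue.
type proposer struct {
	propc chan []byte // bounded: a send blocks once the buffer is full
}

// Propose queues one entry and blocks while the queue is full.
func (p *proposer) Propose(entry []byte) {
	p.propc <- entry
}

// proposeAll queues every entry from its own goroutine, so a full queue
// stalls only that goroutine. The returned channel is closed once all
// entries have been queued.
func proposeAll(p *proposer, entries [][]byte) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for _, e := range entries {
			p.Propose(e)
		}
	}()
	return done
}

// drain plays the consumer side and reads n entries off the queue.
func drain(p *proposer, n int) [][]byte {
	got := make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		got = append(got, <-p.propc)
	}
	return got
}

func main() {
	p := &proposer{propc: make(chan []byte, 2)}
	entries := [][]byte{[]byte("s1"), []byte("s2"), []byte("s3"), []byte("s4"), []byte("s5")}
	done := proposeAll(p, entries)
	got := drain(p, len(entries))
	<-done
	fmt.Println("drained", len(got), "proposals")
}
```

With a buffer of 2 and five entries, the producer goroutine has to wait for the consumer several times, but nothing outside that goroutine is held up.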

Member

Just want to point out that we take the js lock at the top of the function, so by blocking I figure you mean waiting for some other nodes to do something. If it is waiting on something in the same process, then we may have an issue, since we are holding the js lock.

Member Author

The only thing we would wait on is the underlying raft layer draining the propc chan.
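
To make the lock concern concrete, here is a small sketch of why a blocking send under a lock is safe only when the draining side never needs that same lock. Everything here is hypothetical and for illustration only: the `cluster` type, its `mu` and `propc` fields, and the `remapAll` and `startDrain` methods are not taken from nats-server, and the assumption that the drain loop does not take the lock is mine.

```go
package main

import (
	"fmt"
	"sync"
)

// cluster is a hypothetical stand-in for state guarded by a single lock.
type cluster struct {
	mu    sync.Mutex
	propc chan string // bounded proposal queue
}

// remapAll holds the lock for the whole loop and may block on each send.
// This is safe only because the drain side below never takes c.mu.
func (c *cluster) remapAll(streams []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, s := range streams {
		c.propc <- s
	}
}

// startDrain reads n proposals in a separate goroutine without touching
// c.mu, then delivers them on the returned channel.
func (c *cluster) startDrain(n int) <-chan []string {
	out := make(chan []string, 1)
	go func() {
		got := make([]string, 0, n)
		for i := 0; i < n; i++ {
			got = append(got, <-c.propc)
		}
		out <- got
	}()
	return out
}

func main() {
	c := &cluster{propc: make(chan string, 1)}
	streams := []string{"s1", "s2", "s3"}
	out := c.startDrain(len(streams))
	c.remapAll(streams) // blocks briefly whenever the queue is full
	fmt.Println("drained", len(<-out), "proposals while the lock was held")
}
```

If the drain goroutine did try to acquire `c.mu` before reading, `remapAll` would block forever on a full queue while holding the lock, which is the scenario raised in the comment above.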

Contributor

@ripienaar ripienaar left a comment

LGTM

@derekcollison derekcollison merged commit 5396fbe into main Sep 7, 2021
@derekcollison derekcollison deleted the remove-peer branch September 7, 2021 14:22