Make cross-sentinel connection failures non-fatal #177
Follow-up on #135, #167 and #168.
This sequence of updates now causes the connection between sentinels, which is used to request attestations from other sentinels, to fail silently. Even though a `false` return value from `tcp_client::init` is now treated as a warning and the sentinel keeps running, the connection does not actually work, because the handler thread is never started. As a result, sentinel-to-sentinel RPC calls fail at runtime. The first sentinel in the startup sequence could end up with no working connections at all and get stuck in endless retry loops.
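To make the failure mode concrete, here is a heavily simplified C++ sketch. `tcp_client_sketch`, `start_and_call` and the `request_attestation` method name are hypothetical stand-ins, not the project's code. The sketch only assumes what is described above: `init` returns `false` before the handler thread is started, and every RPC depends on that thread.

```cpp
#include <iostream>
#include <string>

// Hypothetical, heavily simplified stand-in for tcp_client (NOT the real class).
class tcp_client_sketch {
public:
    // `peer_reachable` only simulates whether the connection attempt succeeds.
    // On failure we return early, before the handler thread would be started.
    bool init(bool peer_reachable) {
        if (!peer_reachable) {
            return false;              // handler thread is never started
        }
        m_handler_started = true;      // stands in for launching the handler thread
        return true;
    }

    // RPCs are serviced by the handler thread; without it they can only fail.
    bool call(const std::string& /*method*/) const {
        return m_handler_started;
    }

private:
    bool m_handler_started = false;
};

// Caller pattern after the earlier changes: a false return is just a warning.
bool start_and_call(tcp_client_sketch& client, bool peer_reachable) {
    if (!client.init(peer_reachable)) {
        std::cerr << "warning: could not connect to sentinel, continuing\n";
    }
    // Hypothetical method name; fails whenever init() bailed out early.
    return client.call("request_attestation");
}
```

The sentinel survives the warning, but the client object it keeps is unusable, so every later call on it fails.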
Sentinel-to-sentinel communication is actually a case where `cluster_connect(endpoints, false)` should be used even with a single endpoint, since it is fine for one or more sentinels to be temporarily unreachable (we'll just use another one). But because `m_server_endpoints.size() <= 1` is passed as the `error_fatal` argument to `cluster_connect`, that behavior is impossible.

The reason sentinel-to-sentinel communication uses clusters of one is that each sentinel has to control which other sentinels it calls, so that it never gets more than one attestation from the same sentinel. We therefore can't just build a cluster of all the other sentinels and use `send_to_one()`.
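The selection logic this requires can be sketched roughly as follows, assuming each peer sentinel is reached through its own single-endpoint connection. `peer_sentinel` and `collect_attestations` are hypothetical names, and recording the peer id merely stands in for a successful attestation RPC.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a peer sentinel reached through its own cluster of one.
struct peer_sentinel {
    std::string id;
    bool reachable;   // simulates whether connecting to this peer succeeds
};

// Ask distinct peers, one at a time, until `needed` attestations are collected.
// Each peer is visited at most once, so no sentinel can attest twice.
std::vector<std::string> collect_attestations(const std::vector<peer_sentinel>& peers,
                                              std::size_t needed) {
    std::vector<std::string> attested_by;
    for (const peer_sentinel& peer : peers) {
        if (attested_by.size() == needed) {
            break;                       // enough distinct attestations
        }
        if (!peer.reachable) {
            continue;                    // non-fatal: just try the next sentinel
        }
        attested_by.push_back(peer.id);  // stands in for a successful attestation RPC
    }
    return attested_by;
}
```

Because every peer is visited at most once, the same sentinel can never contribute two attestations, which `send_to_one()` on a shared cluster would not guarantee. The `continue` on an unreachable peer only works if a failed connection to that single peer is non-fatal.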
My suggestion, through this PR, is to add an overload of `init()` taking an optional boolean, which lets the caller set the `error_fatal` parameter of `cluster_connect` manually.
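A minimal sketch of the proposed overload, assuming the existing `init()` keeps computing `error_fatal` as `m_server_endpoints.size() <= 1`. The class name `tcp_client_proposed`, the `endpoint` struct, the `last_error_fatal()` accessor and the stubbed `cluster_connect` are hypothetical; the real signatures in the codebase may differ.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical endpoint type for illustration only.
struct endpoint {
    std::string host;
    int port;
};

class tcp_client_proposed {
public:
    explicit tcp_client_proposed(std::vector<endpoint> endpoints)
        : m_server_endpoints(std::move(endpoints)) {}

    // Existing behaviour: errors are fatal only for clusters of at most one endpoint.
    bool init() { return init(m_server_endpoints.size() <= 1); }

    // Proposed overload: the caller chooses error_fatal explicitly.
    bool init(bool error_fatal) { return cluster_connect(m_server_endpoints, error_fatal); }

    // Test/illustration hook: which error_fatal value was last passed on.
    bool last_error_fatal() const { return m_last_error_fatal; }

private:
    // Stub standing in for the real cluster_connect; it only records the flag.
    bool cluster_connect(const std::vector<endpoint>& endpoints, bool error_fatal) {
        m_last_error_fatal = error_fatal;
        return !endpoints.empty();
    }

    std::vector<endpoint> m_server_endpoints;
    bool m_last_error_fatal = false;
};
```

Existing callers of `init()` keep today's behaviour, while the sentinel-to-sentinel code could call `init(false)` on its single-endpoint clients so that an unreachable peer is treated as non-fatal.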