Make cross-sentinel connection failures non-fatal #177

wadagso-gertjaap · 2022-09-13T15:23:55Z

Follow-up on #135, #167 and #168.

Taken together, those changes cause the connection between sentinels, which is used to request attestations from other sentinels, to fail silently. The false return value from tcp_client::init is now treated as a warning and the sentinel keeps running, but the connection does not work because the handler thread is never started. As a result, RPC calls from sentinel to sentinel fail at runtime.

The first sentinel in the startup sequence could end up with no working connections and get stuck in endless retry loops.

Sentinel-to-sentinel communication is a case where cluster_connect(endpoints, false) should be used even with a single endpoint, since it is fine for one or more sentinels to be temporarily unreachable (we'll just use another). But because m_server_endpoints.size() <= 1 is passed as the error_fatal parameter to cluster_connect, a single-endpoint connection is always treated as fatal, so that behavior is currently impossible.

Sentinel-to-sentinel communication uses clusters of one because the sentinel has to control which other sentinels are called, to avoid getting more than one attestation from the same sentinel. So we can't just build a cluster of all other sentinels and use send_to_one().
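To make the constraint concrete, here is a minimal sketch of collecting attestations when each peer sentinel is addressed individually. The `peer_sketch` type, the `reachable` flag and `collect_attestations` are hypothetical names invented for illustration; they are not part of the codebase, and the real RPC call is replaced by a simple reachability check.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a client that wraps a "cluster" of exactly one
// peer sentinel. `reachable` simulates whether the connection works.
struct peer_sketch {
    std::string id;
    bool reachable;
};

// Ask each peer at most once, so no sentinel can contribute two attestations.
// An unreachable peer is skipped rather than treated as a fatal error.
auto collect_attestations(const std::vector<peer_sketch>& peers,
                          std::size_t needed) -> std::vector<std::string> {
    std::vector<std::string> attested_by;
    for(const auto& p : peers) {
        if(attested_by.size() >= needed) {
            break;
        }
        if(!p.reachable) {
            continue; // non-fatal: just use another sentinel
        }
        attested_by.push_back(p.id);
    }
    return attested_by;
}
```

Because the caller iterates over the peers itself, it decides who is asked and can guarantee uniqueness, which a single shared cluster with send_to_one() would not let it do.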

My suggestion, through this PR, is to add an overload of init() with an optional boolean that lets the caller set the error_fatal parameter of cluster_connect manually.
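A rough sketch of the idea follows. It is a simplified stand-in, not the actual tcp_client: the class name `client_sketch`, the `reachable` set, `handler_running()` and the use of `std::optional<bool>` with a default argument are all assumptions made for illustration, and cluster_connect is assumed to abort before starting the handler when a connection fails and error_fatal is true.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for illustration; this is not the project's tcp_client.
class client_sketch {
  public:
    client_sketch(std::vector<std::string> endpoints,
                  std::set<std::string> reachable)
        : m_server_endpoints(std::move(endpoints)),
          m_reachable(std::move(reachable)) {}

    // When error_fatal is not supplied, fall back to the size-based default
    // from the description; callers may override it explicitly.
    auto init(std::optional<bool> error_fatal = std::nullopt) -> bool {
        return cluster_connect(
            m_server_endpoints,
            error_fatal.value_or(m_server_endpoints.size() <= 1));
    }

    [[nodiscard]] auto handler_running() const -> bool {
        return m_handler_running;
    }

  private:
    // Assumed behavior: with error_fatal set, the first failed endpoint
    // aborts before the handler starts; otherwise failures are skipped.
    auto cluster_connect(const std::vector<std::string>& endpoints,
                         bool error_fatal) -> bool {
        for(const auto& ep : endpoints) {
            if(m_reachable.count(ep) == 0 && error_fatal) {
                return false;
            }
        }
        m_handler_running = true; // stands in for starting the handler thread
        return true;
    }

    std::vector<std::string> m_server_endpoints;
    std::set<std::string> m_reachable;
    bool m_handler_running{false};
};
```

In this sketch, a single-endpoint client whose peer is down fails init() by default, while init(false) succeeds and still starts the handler, so the connection can be used once the peer comes up.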

pr4u4t · 2022-09-13T18:35:35Z

I've also noticed that. +1

metalicjames

Concept ACK. Just a question about the implementation.

src/uhs/sentinel/client.hpp

Signed-off-by: Gert-Jaap Glasbergen <[email protected]>

HalosGhost

Following your clarification (and the inclusion of that clarification in the documentation, thank you for that!), this looks good to me. It extends tcp_client::init to take an optional boolean, allowing more fine-grained control over when connection errors should be considered fatal.

@metalicjames I'd love your sign-off before merging in case I missed anything.

metalicjames

LGTM!

wadagso-gertjaap requested review from metalicjames and HalosGhost September 13, 2022 15:24

metalicjames reviewed Sep 13, 2022

src/uhs/sentinel/client.hpp (outdated, resolved)

wadagso-gertjaap force-pushed the bugfix-sentinel-attest branch from a3cda7f to 325cfec Compare September 14, 2022 05:24

wadagso-gertjaap requested a review from metalicjames September 14, 2022 11:07

wadagso-gertjaap force-pushed the bugfix-sentinel-attest branch 2 times, most recently from 9f2a30f to 58d002d Compare September 14, 2022 11:13

Make cross-sentinel connection failures non-fatal

5274516

Signed-off-by: Gert-Jaap Glasbergen <[email protected]>

wadagso-gertjaap force-pushed the bugfix-sentinel-attest branch from 58d002d to 5274516 Compare September 14, 2022 11:34

HalosGhost approved these changes Sep 14, 2022

metalicjames approved these changes Sep 15, 2022

HalosGhost merged commit 95c25e2 into mit-dci:trunk Sep 15, 2022
