Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] do not retry the txhashset msg send (do not retry any msg with an attachment) #3152

Closed

Conversation

antiochp
Copy link
Member

@antiochp antiochp commented Nov 29, 2019

Related #3145.

The existing retry_send logic retries any msg send if try_break! returns None.
try_break! does some funky stuff internally squashing some exceptions but letting others propagate. WouldBlock and TimedOut are both squashed, returning None.
So any msg send that times out will be retried.

This is fine for most (small) self-contained msgs but is definitely not fine for msgs that contain an attachment. For attachments we write repeatedly in a tight loop and any iteration of this loop could experience a timeout.
In situations like this we want to fail fast and not attempt to retry the msg send.

The receiver may have received a partial txhashset zip file and the only thing we can do here is to close the connection to signal to the recipient that we cannot reliably send the remaining bytes of the zip file.

Do not merge yet - needs discussing.
#3145 should be merged first.


We may want to rethink how we implement write_message to make sending the attachment more robust. But even if we do we still want to prevent msgs with attachments from being retried at the full msg level.

@antiochp antiochp requested a review from hashmap November 29, 2019 16:04
@hashmap
Copy link
Contributor

hashmap commented Nov 29, 2019

Sounds reasonable

@antiochp
Copy link
Member Author

Going to take another pass at this.

try_break! actively works against us when writing messages onto the tcp stream, obfuscating errors and making it hard to really know what can and what cannot be safely retried.

And the per-msg "retry" logic feels flawed to me - we end up potentially retrying a msg forever under some error conditions (whereas other error conditions mean we drop the connection).

I think we can write this in a far simpler way by getting rid of try_break! and handling the "retry" internally in write_message, either successfully writing all the required bytes or failing and closing the connection.

There is no intermediate state that makes sense where "we could not write all the bytes but we'll just try sending the whole msg again and keep doing this until it gets more serious" which is effectively what we are attempting to handle currently.

@antiochp
Copy link
Member Author

antiochp commented Dec 2, 2019

Closing this for now.
Related - #3154

Now that we have consistency across all the peer connections (blocking everywhere consistently) the "send" behavior needs investigating again to see if "retry send" is even necessary or makes sense.

I'd like to try and reduce our reliance on try_break! but will approach this in another PR.

@antiochp antiochp closed this Dec 2, 2019
@antiochp antiochp deleted the no_retry_txhashset_msg branch December 2, 2019 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants