Avoid disconnecting all peers if user code is slow #1269

TheBlueMatt · 2022-01-20T18:24:28Z

In the sample client (and likely other downstream users), event
processing may block on slow operations (e.g. Bitcoin Core RPCs)
and ChannelManager persistence may take some time. This should be
fine, except that we consider this a case of possible backgrounding
and disconnect all of our peers when it happens.

Instead, we here avoid considering event processing time in the
time between PeerManager events.

This is one commit extracted from #1023.

jkczyz · 2022-01-20T18:37:32Z

lightning-background-processor/src/lib.rs

 				let updates_available =
 					channel_manager.await_persistable_update_timeout(Duration::from_millis(100));
 				if updates_available {
+					let persist_start = Instant::now();


Considering the 100ms timeout, may be simpler just to use one timer ending outside the if block.

I'm not sure I understand, are you saying just move this timer outside the if block?

Meant we could potentially combine ev_handle_start and persist_start timers into a single timer.

We don't know how long await_persistable_update_timeout takes, though, and the goal here is (mostly) to measure how long it took, as an indirect way to figure out whether we went to the background on, e.g., iOS.

Hmm... isn't that what the timeout is for? Maybe I'm misunderstanding how it works.

I dropped all the addition stuff; it was actually incorrect because of a missing rebase anyway. Does the comment in the if block make sense?

codecov-commenter · 2022-01-20T18:45:47Z

Codecov Report

Merging #1269 (68de973) into main (d741fb1) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 68de973 differs from pull request most recent head 0b769f2. Consider uploading reports for the commit 0b769f2 to get more accurate results

@@            Coverage Diff             @@
##             main    #1269      +/-   ##
==========================================
- Coverage   90.40%   90.39%   -0.02%     
==========================================
  Files          70       70              
  Lines       38118    38120       +2     
==========================================
- Hits        34462    34458       -4     
- Misses       3656     3662       +6

Impacted Files	Coverage Δ
lightning-background-processor/src/lib.rs	`93.04% <100.00%> (+0.06%)`	⬆️
lightning/src/ln/functional_tests.rs	`97.27% <0.00%> (-0.10%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d741fb1...0b769f2.

jkczyz · 2022-01-20T22:30:24Z

lightning-background-processor/src/lib.rs

+					// processing was slow at the top of the loop. For example, the sample client
+					// may call Bitcoin Core RPCs during event handling, which very often takes
+					// more than a handful of seconds to complete, and shouldn't disconnect all our
+					// peers.
 					log_trace!(logger, "Awoke after more than double our ping timer, disconnecting peers.");


Update reference to "double" in glorified comment. 😛 Likewise in the preceding comment.

Actually, I just swapped the comparison back to 2xPING_TIMER, which I think is more appropriate.

Actually, never mind, this is a great opportunity to increase our ping timer while still being able to disconnect quickly if we get background'd. Will fix.

jkczyz · 2022-01-20T22:31:00Z

lightning-background-processor/src/lib.rs

+					// Note that we have to take care to not get here just because user event
+					// processing was slow at the top of the loop. For example, the sample client
+					// may call Bitcoin Core RPCs during event handling, which very often takes
+					// more than a handful of seconds to complete, and shouldn't disconnect all our
+					// peers.


Is this comment relevant now that we don't time event processing?

Arguably yes, the point being that we time only the await, not the event processing.
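
To make the "time only the await" point concrete, here is a minimal sketch, not the actual background processor code. `run_iteration`, `Iteration`, and the `sleep_threshold` parameter are hypothetical names for illustration, and the two closures stand in for user event handling and the bounded wait on persistable updates.

```rust
use std::time::{Duration, Instant};

/// Outcome of one loop iteration in this sketch (hypothetical type).
struct Iteration {
    /// Whether the bounded wait reported work to persist.
    updates_available: bool,
    /// Whether the wait itself overran `sleep_threshold`.
    looks_backgrounded: bool,
}

/// One pass of a hypothetical processing loop.
/// `handle_events` stands in for user event handling;
/// `wait_for_updates` stands in for the bounded wait on persistable updates.
fn run_iteration(
    handle_events: impl FnOnce(),
    wait_for_updates: impl FnOnce() -> bool,
    sleep_threshold: Duration,
) -> Iteration {
    // Potentially slow user code runs first and is deliberately not timed.
    handle_events();

    // Start the clock only around the bounded wait. If the wait takes far
    // longer than its timeout allows, the process was probably suspended.
    let await_start = Instant::now();
    let updates_available = wait_for_updates();
    let looks_backgrounded = await_start.elapsed() > sleep_threshold;

    Iteration { updates_available, looks_backgrounded }
}
```

With this shape, a slow `handle_events` closure never trips the check; only an overlong wait does.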

In the sample client (and likely other downstream users), event processing may block on slow operations (e.g. Bitcoin Core RPCs) and ChannelManager persistence may take some time. This should be fine, except that we consider this a case of possible backgrounding and disconnect all of our peers when it happens. Instead, we here avoid considering event processing time in the time between PeerManager events.

Because many lightning nodes can take quite some time to respond to pings, the five second ping timer can sometimes cause spurious disconnects even though a peer is online. However, in part as a response to mobile users, where a connection may be lost as a result of only a short time with the app in a "paused" state, we had a rather aggressive ping time to ensure we would disconnect quickly. Since we now just use a fixed time for the "went to sleep" detection, we can somewhat increase the ping timer. We still want to be fairly aggressive to avoid sending HTLCs to a peer that is offline, but the tradeoff between spurious disconnections and stuck payments likely doesn't need to be quite as aggressive.
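
The decoupling described in this commit message can be sketched as follows. Both function names are hypothetical, and the durations used in the usage note are illustrative assumptions rather than the values in the actual change; the "double the ping timer" rule is the one mentioned in the review thread.

```rust
use std::time::Duration;

/// Coupled check in this sketch: sleep detection scales with the ping timer,
/// so raising the ping timer also delays noticing that we were backgrounded.
fn sleep_detected_coupled(gap: Duration, ping_timer: Duration) -> bool {
    gap > ping_timer * 2
}

/// Fixed check in this sketch: the threshold applies to the time spent in the
/// wait and does not depend on the ping timer at all.
fn sleep_detected_fixed(await_elapsed: Duration, threshold: Duration) -> bool {
    await_elapsed > threshold
}
```

For example, with a 12 second gap, the coupled check fires for a 5 second ping timer but not for a 10 second one, while the fixed check with an assumed 1 second threshold fires regardless of the ping timer. That independence is what leaves room to lengthen the ping timer.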

TheBlueMatt · 2022-01-21T00:37:29Z

Squashed without diff from 0b769f2a7 to 2d3a21089.

TheBlueMatt added this to the 0.0.105 milestone Jan 20, 2022

jkczyz reviewed Jan 20, 2022

jkczyz previously approved these changes Jan 21, 2022

TheBlueMatt added 2 commits January 21, 2022 00:36

TheBlueMatt dismissed jkczyz’s stale review via 2d3a210 January 21, 2022 00:37

TheBlueMatt force-pushed the 022-01-no-disconnect-on-slow-persist branch from 0b769f2 to 2d3a210 Compare January 21, 2022 00:37

jkczyz approved these changes Jan 21, 2022

valentinewallace approved these changes Jan 21, 2022

valentinewallace merged commit b1bdba5 into lightningdevkit:main Jan 21, 2022
