Consider channels "live" even if they are awaiting a monitor update #954

TheBlueMatt · 2021-06-17T02:18:56Z

We use Channel::is_live() to gate inclusion of a channel in
ChannelManager::list_usable_channels() and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we almost certainly want
Channel::is_live() to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,
thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

After #851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -
instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Fixes #661.

This should make #949's tests a bit easier to implement.

codecov · 2021-06-17T02:36:14Z

Codecov Report

Merging #954 (0534e8b) into main (2940099) will increase coverage by 1.47%.
The diff coverage is 91.30%.

❗ Current head 0534e8b differs from pull request most recent head b58c884. Consider uploading reports for the commit b58c884 to get more accurate results

@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
+ Coverage   90.59%   92.06%   +1.47%     
==========================================
  Files          60       60              
  Lines       30423    39444    +9021     
==========================================
+ Hits        27561    36314    +8753     
- Misses       2862     3130     +268

Impacted Files	Coverage Δ
lightning/src/ln/channelmanager.rs	`87.18% <ø> (+3.27%)`	⬆️
lightning/src/ln/channel.rs	`92.86% <87.50%> (+4.57%)`	⬆️
lightning/src/ln/chanmon_update_fail_tests.rs	`97.60% <91.58%> (-0.19%)`	⬇️
lightning/src/ln/wire.rs	`60.52% <0.00%> (-3.58%)`	⬇️
lightning/src/util/macro_logger.rs	`87.87% <0.00%> (-1.19%)`	⬇️
lightning/src/ln/functional_test_utils.rs	`94.79% <0.00%> (-0.03%)`	⬇️
lightning/src/ln/reorg_tests.rs	`100.00% <0.00%> (+0.37%)`	⬆️
lightning/src/chain/chainmonitor.rs	`96.21% <0.00%> (+0.41%)`	⬆️
lightning/src/util/ser.rs	`91.70% <0.00%> (+0.43%)`	⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2940099...b58c884. Read the comment docs.

TheBlueMatt · 2021-06-24T19:05:22Z

Addressed a few comments left on other PRs, including #949 (comment)

ariard

Just need to fix send_htlc doc I believe, otherwise good to me

lightning/src/ln/channel.rs

ariard · 2021-06-24T23:40:20Z

lightning/src/ln/channel.rs

@@ -3996,7 +3996,7 @@ impl<Signer: Sign> Channel<Signer> {
 			return Err(ChannelError::Ignore(format!("Cannot send less than their minimum HTLC value ({})", self.counterparty_htlc_minimum_msat)));
 		}

-		if (self.channel_state & (ChannelState::PeerDisconnected as u32 | ChannelState::MonitorUpdateFailed as u32)) != 0 {
+		if (self.channel_state & (ChannelState::PeerDisconnected as u32)) != 0 {
 			// Note that this should never really happen, if we're !is_live() on receipt of an
 			// incoming HTLC for relay will result in us rejecting the HTLC and we won't allow
 			// the user to send directly into a !is_live() channel. However, if we


This comment sounds weird, I still think we should be able to receive a HTLC even if the forwarding channel is knock-out, though we don't currently in decode_update_add_htlc_onion. Maybe possible after #680.

Right, I suppose its reasonable to accept an HTLC for forwarding to a channel which is disconnected currently if you're a special type of node that can make the channel become connected soon (of course for a general purpose node we should definitely fail). There isn't a lot of harm in waiting to fail until the payment gets to the send_htlc call (instead of doing it immediately upon receipt), so I think I agree we should just drop it (but keep this return).

All that to say, I don't think anything needs to change in this PR, correct? The comment is still true today, though I agree we could change it in a followup and should change it after #680.

Yeah I just believe this forwarding policy makes really sense if the next hop is owned by a mobile client, with a clear agreement "I'm going to be offline 99% of my time, please cache my HTLC" ? Or at least it would be worthy as an experimentation.

Right, it's more a good-place-to-clean-once-we-have-680.

ariard · 2021-06-25T00:48:21Z

Code Review ACK 8671aca

TheBlueMatt · 2021-06-26T16:34:39Z

Tagging as 0.0.99 as #949 depends on this and because this would be critical for any async-monitor-update routing node.

TheBlueMatt · 2021-06-29T23:48:52Z

Squashed with no changes:

$ git diff-tree 8671aca5 eab577c0
$

jkczyz

Somethings aren't clear to me when reading the commit message / PR description:

In both of these cases, we almost certainly want
Channel::is_live() to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,

Either drop "almost certainly" or explain when this is not the case.

thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

Reads strange. Start new sentence here and replace "causing" with "causes"?

After #851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -

What is "this" referring to? "this change"?

instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Unclear to me what is meant by a "matching holding cell" (matching what?) and what "it always happens" is referring to.

lightning/src/ln/channel.rs

lightning/src/ln/chanmon_update_fail_tests.rs

jkczyz · 2021-06-30T18:25:23Z

lightning/src/ln/chanmon_update_fail_tests.rs

@@ -922,40 +922,6 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 	commitment_signed_dance!(nodes[1], nodes[0], send_event.commitment_msg, false, true);
 	check_added_monitors!(nodes[1], 0);

-	let mut events_2 = nodes[1].node.get_and_clear_pending_msg_events();


Is there a TL;DR to summarize the changes to this test to help in review?

Because it is no longer the case that receiving an HTLC while we're pending a monitor update restoration causes the HTLC to fail, code which relied on that behavior broke. eg the first code hunk which was removed explicitly delivered an HTLC and expected it to fail, which no longer happens.

The big if-block diff hunk is because the two cases have converged somewhat - we now always have a payment in node[1]'s outbound holding cell.

jkczyz · 2021-06-30T18:27:53Z

lightning/src/ln/chanmon_update_fail_tests.rs

@@ -1630,12 +1640,19 @@ fn test_monitor_update_fail_claim() {

 	*nodes[1].chain_monitor.update_ret.lock().unwrap() = Some(Err(ChannelMonitorUpdateErr::TemporaryFailure));
 	assert!(nodes[1].node.claim_funds(payment_preimage_1));
+	nodes[1].logger.assert_log("lightning::ln::channelmanager".to_string(), "Temporary failure claiming HTLC, treating as success: Failed to update ChannelMonitor".to_string(), 1);


Is there any way to assert the state of the channel rather than examining the log?

This line was only moved, not created. the log assert is nice IMO as it checks the specific error, not just that one happened, though the observable state is still checked in that no new message was generated.

TheBlueMatt · 2021-06-30T19:36:16Z

I rewrote the commit message in the main commit (without changing its contents), let me know what you think.

jkczyz · 2021-06-30T22:50:50Z

I rewrote the commit message in the main commit (without changing its contents), let me know what you think.

Looks good. Thanks for doing it!

We use `Channel::is_live()` to gate inclusion of a channel in `ChannelManager::list_usable_channels()` and when sending an HTLC to select whether a channel is available for forwarding through/sending to. In both of these cases, we should consider a channel `is_live()` when they are pending a monitor update. Some clients may update monitors asynchronously, thus we may simply be waiting a short duration for a monitor update to complete, and shouldn't fail all forwarding HTLCs during that time. After lightningdevkit#851, we always ensure any holding cells are free'd when sending P2P messages, making this change much more trivially correct - instead of having to ensure that we always free the holding cell when a channel becomes live again after adding something to the holding cell, we can simply rely on the fact that it always happens. Fixes lightningdevkit#661.

TheBlueMatt · 2021-06-30T23:16:25Z

Squashed with no changes:

$ git diff-tree -U1 0534e8b9 b58c8843
$

The full diff from the previous ack is still trivial, so will merge after CI:

$ git diff-tree -U1 -b eab577c0 b58c8843
diff --git a/lightning/src/ln/chanmon_update_fail_tests.rs b/lightning/src/ln/chanmon_update_fail_tests.rs
index ac293cc1..73fc5315 100644
--- a/lightning/src/ln/chanmon_update_fail_tests.rs
+++ b/lightning/src/ln/chanmon_update_fail_tests.rs
@@ -908,4 +908,4 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
-	// Attempt to forward a third payment but fail due to the second channel being unavailable
-	// for forwarding.
+	// Forward a third payment which will also be added to the holding cell, despite the channel
+	// being paused waiting a monitor update.
 	let (_, payment_hash_3, payment_secret_3) = get_payment_preimage_hash!(nodes[2]);
@@ -924,2 +924,8 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
+	// Call forward_pending_htlcs and check that the new HTLC was simply added to the holding cell
+	// and not forwarded.
+	expect_pending_htlcs_forwardable!(nodes[1]);
+	check_added_monitors!(nodes[1], 0);
+	assert!(nodes[1].node.get_and_clear_pending_events().is_empty());
+
 	let (payment_preimage_4, payment_hash_4) = if test_ignore_second_cs {
@@ -942,8 +948,2 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
-	// Call forward_pending_htlcs first to make sure we don't have any issues attempting (and
-	// failing) to forward an HTLC while a channel is still awaiting monitor update restoration.
-	expect_pending_htlcs_forwardable!(nodes[1]);
-	check_added_monitors!(nodes[1], 0);
-	assert!(nodes[1].node.get_and_clear_pending_events().is_empty());
-
 	// Restore monitor updating, ensuring we immediately get a fail-back update and a
@@ -1048,2 +1048,3 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 	}
+
 	assert_eq!(as_cs.update_add_htlcs.len(), 1);
@@ -1077,3 +1078,2 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
-
 	expect_pending_htlcs_forwardable!(nodes[2]);
diff --git a/lightning/src/ln/channel.rs b/lightning/src/ln/channel.rs
index 19e28d4a..26725111 100644
--- a/lightning/src/ln/channel.rs
+++ b/lightning/src/ln/channel.rs
@@ -3976,10 +3976,14 @@ impl<Signer: Sign> Channel<Signer> {
 	/// send_htlc_and_commit instead cause you'll want both messages at once.
-	/// This returns an option instead of a pure UpdateAddHTLC as we may be in a state where we
-	/// cannot add HTLCs on the wire.
-	/// In cases where we're waiting on the remote peer to send us a revoke_and_ack, if we did, we
-	/// wouldn't be able to determine what they actually ACK'ed when we do receive the
-	/// revoke_and_ack.
-	/// In cases where we're marked MonitorUpdateFailed, we cannot commit to a new state as we may
-	/// not yet have sent the previous commitment update messages and will need to regenerate them.
-	/// You MUST call send_commitment prior to any other calls on this Channel
+	///
+	/// This returns an optional UpdateAddHTLC as we may be in a state where we cannot add HTLCs on
+	/// the wire:
+	/// * In cases where we're waiting on the remote peer to send us a revoke_and_ack, we
+	///   wouldn't be able to determine what they actually ACK'ed if we have two sets of updates
+	///   awaiting ACK.
+	/// * In cases where we're marked MonitorUpdateFailed, we cannot commit to a new state as we
+	///   may not yet have sent the previous commitment update messages and will need to regenerate
+	///   them.
+	///
+	/// You MUST call send_commitment prior to calling any other methods on this Channel!
+	///
 	/// If an Err is returned, it's a ChannelError::Ignore!
$

TheBlueMatt mentioned this pull request Jun 17, 2021

Send channel_update messages to direct peers on private channels #949

Merged

ariard reviewed Jun 24, 2021

View reviewed changes

TheBlueMatt added this to the 0.0.99 milestone Jun 26, 2021

TheBlueMatt force-pushed the 2021-06-no-spurious-forward-fails branch from 8671aca to eab577c Compare June 29, 2021 23:48

jkczyz reviewed Jun 30, 2021

View reviewed changes

TheBlueMatt force-pushed the 2021-06-no-spurious-forward-fails branch from eab577c to 0534e8b Compare June 30, 2021 19:35

jkczyz approved these changes Jun 30, 2021

View reviewed changes

TheBlueMatt force-pushed the 2021-06-no-spurious-forward-fails branch from 0534e8b to b58c884 Compare June 30, 2021 23:15

TheBlueMatt merged commit 4353d4a into lightningdevkit:main Jul 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider channels "live" even if they are awaiting a monitor update #954

Consider channels "live" even if they are awaiting a monitor update #954

TheBlueMatt commented Jun 17, 2021

codecov bot commented Jun 17, 2021 •

edited

Loading

TheBlueMatt commented Jun 24, 2021

ariard left a comment

ariard Jun 24, 2021

TheBlueMatt Jun 25, 2021

ariard Jun 25, 2021

ariard commented Jun 25, 2021

TheBlueMatt commented Jun 26, 2021

TheBlueMatt commented Jun 29, 2021

jkczyz left a comment

jkczyz Jun 30, 2021

TheBlueMatt Jun 30, 2021

jkczyz Jun 30, 2021

TheBlueMatt Jun 30, 2021

TheBlueMatt commented Jun 30, 2021

jkczyz commented Jun 30, 2021

TheBlueMatt commented Jun 30, 2021

Consider channels "live" even if they are awaiting a monitor update #954

Consider channels "live" even if they are awaiting a monitor update #954

Conversation

TheBlueMatt commented Jun 17, 2021

codecov bot commented Jun 17, 2021 • edited Loading

Codecov Report

TheBlueMatt commented Jun 24, 2021

ariard left a comment

Choose a reason for hiding this comment

ariard Jun 24, 2021

Choose a reason for hiding this comment

TheBlueMatt Jun 25, 2021

Choose a reason for hiding this comment

ariard Jun 25, 2021

Choose a reason for hiding this comment

ariard commented Jun 25, 2021

TheBlueMatt commented Jun 26, 2021

TheBlueMatt commented Jun 29, 2021

jkczyz left a comment

Choose a reason for hiding this comment

jkczyz Jun 30, 2021

Choose a reason for hiding this comment

TheBlueMatt Jun 30, 2021

Choose a reason for hiding this comment

jkczyz Jun 30, 2021

Choose a reason for hiding this comment

TheBlueMatt Jun 30, 2021

Choose a reason for hiding this comment

TheBlueMatt commented Jun 30, 2021

jkczyz commented Jun 30, 2021

TheBlueMatt commented Jun 30, 2021

codecov bot commented Jun 17, 2021 •

edited

Loading