Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider channels "live" even if they are awaiting a monitor update #954

Conversation

TheBlueMatt
Copy link
Collaborator

We use Channel::is_live() to gate inclusion of a channel in
ChannelManager::list_usable_channels() and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we almost certainly want
Channel::is_live() to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,
thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

After #851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -
instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Fixes #661.

This should make #949's tests a bit easier to implement.

@codecov
Copy link

codecov bot commented Jun 17, 2021

Codecov Report

Merging #954 (0534e8b) into main (2940099) will increase coverage by 1.47%.
The diff coverage is 91.30%.

❗ Current head 0534e8b differs from pull request most recent head b58c884. Consider uploading reports for the commit b58c884 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #954      +/-   ##
==========================================
+ Coverage   90.59%   92.06%   +1.47%     
==========================================
  Files          60       60              
  Lines       30423    39444    +9021     
==========================================
+ Hits        27561    36314    +8753     
- Misses       2862     3130     +268     
Impacted Files Coverage Δ
lightning/src/ln/channelmanager.rs 87.18% <ø> (+3.27%) ⬆️
lightning/src/ln/channel.rs 92.86% <87.50%> (+4.57%) ⬆️
lightning/src/ln/chanmon_update_fail_tests.rs 97.60% <91.58%> (-0.19%) ⬇️
lightning/src/ln/wire.rs 60.52% <0.00%> (-3.58%) ⬇️
lightning/src/util/macro_logger.rs 87.87% <0.00%> (-1.19%) ⬇️
lightning/src/ln/functional_test_utils.rs 94.79% <0.00%> (-0.03%) ⬇️
lightning/src/ln/reorg_tests.rs 100.00% <0.00%> (+0.37%) ⬆️
lightning/src/chain/chainmonitor.rs 96.21% <0.00%> (+0.41%) ⬆️
lightning/src/util/ser.rs 91.70% <0.00%> (+0.43%) ⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2940099...b58c884. Read the comment docs.

@TheBlueMatt
Copy link
Collaborator Author

Addressed a few comments left on other PRs, including #949 (comment)

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need to fix send_htlc doc I believe, otherwise good to me

@@ -3996,7 +3996,7 @@ impl<Signer: Sign> Channel<Signer> {
return Err(ChannelError::Ignore(format!("Cannot send less than their minimum HTLC value ({})", self.counterparty_htlc_minimum_msat)));
}

if (self.channel_state & (ChannelState::PeerDisconnected as u32 | ChannelState::MonitorUpdateFailed as u32)) != 0 {
if (self.channel_state & (ChannelState::PeerDisconnected as u32)) != 0 {
// Note that this should never really happen, if we're !is_live() on receipt of an
// incoming HTLC for relay will result in us rejecting the HTLC and we won't allow
// the user to send directly into a !is_live() channel. However, if we
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment sounds weird, I still think we should be able to receive a HTLC even if the forwarding channel is knock-out, though we don't currently in decode_update_add_htlc_onion. Maybe possible after #680.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I suppose its reasonable to accept an HTLC for forwarding to a channel which is disconnected currently if you're a special type of node that can make the channel become connected soon (of course for a general purpose node we should definitely fail). There isn't a lot of harm in waiting to fail until the payment gets to the send_htlc call (instead of doing it immediately upon receipt), so I think I agree we should just drop it (but keep this return).

All that to say, I don't think anything needs to change in this PR, correct? The comment is still true today, though I agree we could change it in a followup and should change it after #680.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I just believe this forwarding policy makes really sense if the next hop is owned by a mobile client, with a clear agreement "I'm going to be offline 99% of my time, please cache my HTLC" ? Or at least it would be worthy as an experimentation.

Right, it's more a good-place-to-clean-once-we-have-680.

@ariard
Copy link

ariard commented Jun 25, 2021

Code Review ACK 8671aca

@TheBlueMatt TheBlueMatt added this to the 0.0.99 milestone Jun 26, 2021
@TheBlueMatt
Copy link
Collaborator Author

Tagging as 0.0.99 as #949 depends on this and because this would be critical for any async-monitor-update routing node.

@TheBlueMatt TheBlueMatt force-pushed the 2021-06-no-spurious-forward-fails branch from 8671aca to eab577c Compare June 29, 2021 23:48
@TheBlueMatt
Copy link
Collaborator Author

Squashed with no changes:

$ git diff-tree 8671aca5 eab577c0
$

Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somethings aren't clear to me when reading the commit message / PR description:

In both of these cases, we almost certainly want
Channel::is_live() to include channels which are simply pending a
monitor update, as some clients may update monitors asynchronously,

Either drop "almost certainly" or explain when this is not the case.

thus any rejection of HTLCs based on a monitor update still pending
causing a race condition.

Reads strange. Start new sentence here and replace "causing" with "causes"?

After #851, we always ensure any holding cells are free'd when
sending P2P messages, making this much more trivially correct -

What is "this" referring to? "this change"?

instead of having to ensure that we always have a matching holding
cell free any time we add something to the holding cell, we can
simply rely on the fact that it always happens.

Unclear to me what is meant by a "matching holding cell" (matching what?) and what "it always happens" is referring to.

@@ -922,40 +922,6 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
commitment_signed_dance!(nodes[1], nodes[0], send_event.commitment_msg, false, true);
check_added_monitors!(nodes[1], 0);

let mut events_2 = nodes[1].node.get_and_clear_pending_msg_events();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a TL;DR to summarize the changes to this test to help in review?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is no longer the case that receiving an HTLC while we're pending a monitor update restoration causes the HTLC to fail, code which relied on that behavior broke. eg the first code hunk which was removed explicitly delivered an HTLC and expected it to fail, which no longer happens.

The big if-block diff hunk is because the two cases have converged somewhat - we now always have a payment in node[1]'s outbound holding cell.

@@ -1630,12 +1640,19 @@ fn test_monitor_update_fail_claim() {

*nodes[1].chain_monitor.update_ret.lock().unwrap() = Some(Err(ChannelMonitorUpdateErr::TemporaryFailure));
assert!(nodes[1].node.claim_funds(payment_preimage_1));
nodes[1].logger.assert_log("lightning::ln::channelmanager".to_string(), "Temporary failure claiming HTLC, treating as success: Failed to update ChannelMonitor".to_string(), 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to assert the state of the channel rather than examining the log?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was only moved, not created. the log assert is nice IMO as it checks the specific error, not just that one happened, though the observable state is still checked in that no new message was generated.

@TheBlueMatt TheBlueMatt force-pushed the 2021-06-no-spurious-forward-fails branch from eab577c to 0534e8b Compare June 30, 2021 19:35
@TheBlueMatt
Copy link
Collaborator Author

I rewrote the commit message in the main commit (without changing its contents), let me know what you think.

@jkczyz
Copy link
Contributor

jkczyz commented Jun 30, 2021

I rewrote the commit message in the main commit (without changing its contents), let me know what you think.

Looks good. Thanks for doing it!

We use `Channel::is_live()` to gate inclusion of a channel in
`ChannelManager::list_usable_channels()` and when sending an
HTLC to select whether a channel is available for
forwarding through/sending to.

In both of these cases, we should consider a channel `is_live()` when
they are pending a monitor update. Some clients may update monitors
asynchronously, thus we may simply be waiting a short duration for a
monitor update to complete, and shouldn't fail all forwarding HTLCs
during that time.

After lightningdevkit#851, we always ensure any holding cells are free'd when
sending P2P messages, making this change much more trivially
correct - instead of having to ensure that we always free the holding
cell when a channel becomes live again after adding something to the
holding cell, we can simply rely on the fact that it always happens.

Fixes lightningdevkit#661.
@TheBlueMatt TheBlueMatt force-pushed the 2021-06-no-spurious-forward-fails branch from 0534e8b to b58c884 Compare June 30, 2021 23:15
@TheBlueMatt
Copy link
Collaborator Author

Squashed with no changes:

$ git diff-tree -U1 0534e8b9 b58c8843
$

The full diff from the previous ack is still trivial, so will merge after CI:

$ git diff-tree -U1 -b eab577c0 b58c8843
diff --git a/lightning/src/ln/chanmon_update_fail_tests.rs b/lightning/src/ln/chanmon_update_fail_tests.rs
index ac293cc1..73fc5315 100644
--- a/lightning/src/ln/chanmon_update_fail_tests.rs
+++ b/lightning/src/ln/chanmon_update_fail_tests.rs
@@ -908,4 +908,4 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
-	// Attempt to forward a third payment but fail due to the second channel being unavailable
-	// for forwarding.
+	// Forward a third payment which will also be added to the holding cell, despite the channel
+	// being paused waiting a monitor update.
 	let (_, payment_hash_3, payment_secret_3) = get_payment_preimage_hash!(nodes[2]);
@@ -924,2 +924,8 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
+	// Call forward_pending_htlcs and check that the new HTLC was simply added to the holding cell
+	// and not forwarded.
+	expect_pending_htlcs_forwardable!(nodes[1]);
+	check_added_monitors!(nodes[1], 0);
+	assert!(nodes[1].node.get_and_clear_pending_events().is_empty());
+
 	let (payment_preimage_4, payment_hash_4) = if test_ignore_second_cs {
@@ -942,8 +948,2 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
-	// Call forward_pending_htlcs first to make sure we don't have any issues attempting (and
-	// failing) to forward an HTLC while a channel is still awaiting monitor update restoration.
-	expect_pending_htlcs_forwardable!(nodes[1]);
-	check_added_monitors!(nodes[1], 0);
-	assert!(nodes[1].node.get_and_clear_pending_events().is_empty());
-
 	// Restore monitor updating, ensuring we immediately get a fail-back update and a
@@ -1048,2 +1048,3 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 	}
+
 	assert_eq!(as_cs.update_add_htlcs.len(), 1);
@@ -1077,3 +1078,2 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
-
 	expect_pending_htlcs_forwardable!(nodes[2]);
diff --git a/lightning/src/ln/channel.rs b/lightning/src/ln/channel.rs
index 19e28d4a..26725111 100644
--- a/lightning/src/ln/channel.rs
+++ b/lightning/src/ln/channel.rs
@@ -3976,10 +3976,14 @@ impl<Signer: Sign> Channel<Signer> {
 	/// send_htlc_and_commit instead cause you'll want both messages at once.
-	/// This returns an option instead of a pure UpdateAddHTLC as we may be in a state where we
-	/// cannot add HTLCs on the wire.
-	/// In cases where we're waiting on the remote peer to send us a revoke_and_ack, if we did, we
-	/// wouldn't be able to determine what they actually ACK'ed when we do receive the
-	/// revoke_and_ack.
-	/// In cases where we're marked MonitorUpdateFailed, we cannot commit to a new state as we may
-	/// not yet have sent the previous commitment update messages and will need to regenerate them.
-	/// You MUST call send_commitment prior to any other calls on this Channel
+	///
+	/// This returns an optional UpdateAddHTLC as we may be in a state where we cannot add HTLCs on
+	/// the wire:
+	/// * In cases where we're waiting on the remote peer to send us a revoke_and_ack, we
+	///   wouldn't be able to determine what they actually ACK'ed if we have two sets of updates
+	///   awaiting ACK.
+	/// * In cases where we're marked MonitorUpdateFailed, we cannot commit to a new state as we
+	///   may not yet have sent the previous commitment update messages and will need to regenerate
+	///   them.
+	///
+	/// You MUST call send_commitment prior to calling any other methods on this Channel!
+	///
 	/// If an Err is returned, it's a ChannelError::Ignore!
$

@TheBlueMatt TheBlueMatt merged commit 4353d4a into lightningdevkit:main Jul 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spurious Forwarding Failures in Async Monitor Update Clients
3 participants