Add Auto-close on inbound/outbound update_fee delay #1056
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##             main    #1056      +/-   ##
==========================================
- Coverage   90.47%   90.44%    -0.03%
==========================================
  Files          69       69
  Lines       37137    37190       +53
==========================================
+ Hits        33600    33637       +37
- Misses       3537     3553       +16
```
Meeting notes - @ariard is going to make this a bit more conservative (e.g. require that the feerate increase by 25-50% and be higher than our background estimate before force-closing) as well as check what other implementations are doing.
Looks pretty reasonable! Will go more in-depth after the "require 25-50% feerate increase to force-close" updates
lightning/src/ln/channel.rs
Outdated
```rust
@@ -392,6 +398,10 @@ pub(super) struct Channel<Signer: Sign> {
	latest_monitor_update_id: u64,

	// Auto-close timer, if the channel is outbound, and we didn't receive a RAA for counterparty
	// commitment transaction after `AUTOCLOSE_TIMEOUT` periods, this channel must be force-closed.
	autoclose_timer: u16,
```
nit: seems like it could be a u8
I think a u8 only covers about 4 hours of blocks (assuming a 1-minute tick); if we make this configurable in the future and node operators want to increase it, we might have to go back to u16?
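For reference, a quick standalone back-of-the-envelope check of the two widths (a minimal sketch, assuming the 1-minute tick interval from the comment above):

```rust
fn main() {
    // Assuming one timer tick per minute, as in the comment above.
    let ticks_per_hour = 60.0;

    // A u8 timer saturates at 255 ticks, roughly 4.25 hours.
    println!("u8 max: {:.2} hours", u8::MAX as f64 / ticks_per_hour);
    // A u16 timer saturates at 65_535 ticks, roughly 45 days.
    println!("u16 max: {:.1} days", u16::MAX as f64 / ticks_per_hour / 24.0);
}
```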
lightning/src/ln/channelmanager.rs
Outdated
```rust
@@ -2660,6 +2667,23 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelManager
		(retain_channel, NotifyOption::DoPersist, ret_err)
	}

	fn check_channel_autoclose(&self, short_to_id: &mut HashMap<u64, [u8; 32]>, chan_id: &[u8; 32], chan: &mut Channel<Signer>) -> (bool, NotifyOption, Result<(), MsgHandleErrInternal>) {
```
High level doc comment plz
Added one, let me know if it looks good to you :) `Channel::check_channel_autoclose` is more thoroughly commented.
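For illustration, a high-level doc comment along these lines might read as follows (a hypothetical sketch, not the exact text added in the PR):

```rust
/// Checks whether this channel's autoclose timer has expired and, if so,
/// force-closes the channel.
///
/// The timer is armed when an `update_fee` exchange appears to be stalling
/// (see `Channel::maybe_trigger_autoclose_timer`) and advanced on each
/// `timer_tick_occured` call. A channel whose commitment feerate is stuck
/// below the current estimate risks non-confirmation of time-sensitive
/// HTLC claims, so we preemptively close it on-chain instead.
fn check_channel_autoclose() { /* elided */ }
```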
lightning/src/ln/channelmanager.rs
Outdated
```rust
// If the feerate has decreased by less than half, don't bother
if new_feerate <= chan.get_feerate() && new_feerate * 2 > chan.get_feerate() {
	log_trace!(self.logger, "Channel {} does not qualify for a feerate change from {} to {}.",
		log_bytes!(chan_id[..]), chan.get_feerate(), new_feerate);
	return (true, NotifyOption::SkipPersist, Ok(()));
}
if !chan.is_outbound() {
	if chan.maybe_trigger_autoclose_timer() {
```
I don't quite get why we start the autoclose timer here? Comment might help
To ensure timely confirmation of its transactions, a Lightning node should always ensure those transactions commit to a compelling feerate. In case of mempool spikes or decongestion, the `update_fee` mechanism allows nodes to cooperatively adjust the feerate of the channel's transactions. If the `update_fee` sequence doesn't succeed, a node should preemptively close its channels instead of bearing the risk of non-confirming commitment transactions with pending time-sensitive HTLC outputs.

In LDK, we run `timer_tick_occured` as a background task, independently of whether the channel is inbound or outbound. One of the background tasks called is `update_channel_fee`, where we already check whether it's relevant to send an outbound update_fee. As such I think it's a good location to add the trigger for inbound, removing the early return at the beginning of `update_channel_fee`.

That said, I think it's okay to merge the inbound/outbound code paths? Like removing the `maybe_trigger_autoclose_timer` call from `send_update_fee`.

edit: ah no, because `send_update_fee_and_commit` is a public method and we could re-use it from a different path than `update_channel_fee`.
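A condensed sketch of the flow described above, under assumed, simplified signatures (the real `update_channel_fee` in `channelmanager.rs` carries considerably more state):

```rust
// Simplified stand-in for Channel<Signer>; only the autoclose-relevant
// pieces are modeled.
struct Chan {
    feerate_per_kw: u32,
    is_outbound: bool,
    autoclose_timer: u16,
}

impl Chan {
    /// Arm the timer if it's in the starting position; the counterparty's
    /// RAA (or inbound update_fee) is expected to reset it.
    fn maybe_trigger_autoclose_timer(&mut self) -> bool {
        if self.autoclose_timer == 0 {
            self.autoclose_timer = 1;
            return true;
        }
        false
    }
}

/// Called from the timer-tick background task for every channel.
fn update_channel_fee(chan: &mut Chan, new_feerate: u32) {
    if !chan.is_outbound {
        // Inbound channel: we can't send update_fee ourselves. If our
        // estimator says the feerate should rise, arm the timer and wait
        // for the counterparty's update_fee to cancel it.
        if new_feerate > chan.feerate_per_kw {
            chan.maybe_trigger_autoclose_timer();
        }
        return;
    }
    // Outbound channel: send update_fee ourselves (elided) and arm the
    // timer pending the counterparty's RAA on the new state.
}

fn main() {
    let mut chan = Chan { feerate_per_kw: 1_000, is_outbound: false, autoclose_timer: 0 };
    update_channel_fee(&mut chan, 2_000);
    assert_eq!(chan.autoclose_timer, 1);
}
```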
lightning/src/ln/channel.rs
Outdated
```rust
	}
	if self.autoclose_timer == AUTOCLOSE_TIMEOUT {
		// If the channel doesn't have pending HTLC outputs to claim on-chain
		if self.pending_inbound_htlcs.len() + self.pending_outbound_htlcs.len() > 0 {
```
Should we be resetting the timer in the `else` case?
Well, it's not that simple. The HTLCs might have been cleaned up since the timer started while the feerate is still stalling. If an HTLC is added just after we reset the timer and the channel force-closes, confirmation might be delayed for a while, or the transaction might not even propagate across the network mempools. The channel won't be force-closed before the expiration of another `AUTOCLOSE_TIMEOUT`.

If we assume that channels are more often loaded with HTLCs than empty, I think we should not reset the timer even if we don't currently have pending HTLCs. Though maybe we should remove this force-close waiver, or move to an economical one ("if HTLC exposure is under the feerate cost to force-close, don't break the channel")?
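The "economical" waiver floated above could look roughly like this (a sketch with assumed weights and thresholds; real balance-at-risk accounting would be more involved):

```rust
/// Hypothetical check: only force-close over a stalling feerate if the
/// value we stand to lose from unconfirmed HTLC claims exceeds the fee
/// cost of the force-close itself.
fn htlc_exposure_justifies_close(
    pending_htlc_value_sat: u64,
    commitment_weight: u64,
    feerate_sat_per_kw: u64,
) -> bool {
    // Rough fee cost of broadcasting the commitment transaction.
    let force_close_cost_sat = commitment_weight * feerate_sat_per_kw / 1_000;
    pending_htlc_value_sat > force_close_cost_sat
}

fn main() {
    // ~724 weight units for a bare commitment tx and 2_500 sat/kW are
    // assumed numbers for illustration only.
    assert!(htlc_exposure_justifies_close(50_000, 724, 2_500));
    assert!(!htlc_exposure_justifies_close(1_000, 724, 2_500));
}
```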
Moving to 0.1 as I'm not sure we should delay 0.0.101 or 0.0.102 for this.
Force-pushed from 5ef98b9 to 2e3fd85.
Needs rebase.
Force-pushed from 2ceab8f to f64c46c.
Rebased at f64c46c. A few changes since last time:

Especially happy to get opinions on a). Still have to add test coverage.
lightning/src/util/config.rs
Outdated
```rust
/// If this is set to false, we do not auto-close channel with pending time-sensitive outputs,
/// of which the feerate has been detected as stalling.
/// Default value: true.
pub should_autoclose: bool,
```
Maybe we should make this some kind of knob to configure how aggressively we do this instead? e.g. right now we require a feerate increase of 30%; maybe we should make that a knob, and users who want to turn off autoclose can set it to `u64::max_value()`?
What feerate would we like the commitment transaction to be committed with? Ideally we would have visibility into the current composition of mempool feerate groups to predict a worst-case evolution based on historical data. Assuming this worst case, we should guarantee our feerate stays in sync with the top feerate group for block inclusion during our `CLTV_CLAIM_BUFFER` (once we unilaterally close the channel we can't update it anymore). The current "30%" represents how tightly we track the worst case; if a node operator would like to diverge from it, they should tweak the knob accordingly.

Even further, for a more complete model, we should also tweak `AUTOCLOSE_TIMEOUT` to cover the velocity of the worst-case scenario ("once we detect evolution towards unsafe mempool feerate groups, how fast can we react to correct the committed feerate?"). The issue there is that our reaction time depends on our counterparty's interactivity latency, and we could ask the node operator for input ("what's the liveliness reliability of this peer?") to adjust `AUTOCLOSE_TIMEOUT` accordingly. The current `AUTOCLOSE_TIMEOUT` should be long enough, in blocks, to let our fee-estimator and our counterparty converge, but also short enough to match worst-case congestion velocity.

Ultimately the limiting safety factor is our counterparty's latency. And that can be quite extreme if our counterparty isn't robust against low-odds, high-impact risks (data center fire, DNS server outage, etc.). Package relay support and moving to anchor outputs as a fee-bumping mechanism would remove this safety dependency on counterparty latency.

Or do you have more rule-of-thumb thinking on what to advise node operators to configure this potential knob with?
I don't think there is a clear answer, sadly. Indeed, no fee estimator (that I'm aware of) is really tuned for the lightning threat model. That said, the answer is likely some function of the trust a user has in their counterparty, which leads me to believe it should be a knob. If you're running a node that opens channels automatically with any counterparty, you probably want something pretty tight (30% seems reasonable?); if you're a node that only makes outbound channel opens to trusted peers, maybe 500% is more reasonable.
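A knob along these lines might be sketched as follows (hypothetical field name and helper; the `u64::max_value()` sentinel to disable autoclose follows the suggestion above):

```rust
/// Hypothetical config knob: the percentage feerate increase, relative to
/// the channel's current feerate, above which we arm the autoclose timer.
pub struct AutocloseConfig {
    pub feerate_increase_close_threshold_percent: u64,
}

impl Default for AutocloseConfig {
    fn default() -> Self {
        // 30% matches the PR's current hardcoded threshold; a node with
        // only trusted, reliably-online peers might raise this to 500%.
        Self { feerate_increase_close_threshold_percent: 30 }
    }
}

fn feerate_qualifies_for_autoclose(current: u64, new: u64, cfg: &AutocloseConfig) -> bool {
    let threshold = cfg.feerate_increase_close_threshold_percent;
    // u64::max_value() acts as the "autoclose disabled" sentinel.
    if threshold == u64::max_value() {
        return false;
    }
    // new > current * (1 + threshold/100), kept in integer arithmetic.
    new.saturating_mul(100) > current.saturating_mul(threshold.saturating_add(100))
}

fn main() {
    let cfg = AutocloseConfig::default();
    assert!(feerate_qualifies_for_autoclose(1_000, 1_400, &cfg)); // +40%
    assert!(!feerate_qualifies_for_autoclose(1_000, 1_200, &cfg)); // +20%
}
```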
lightning/src/ln/channel.rs
Outdated
```rust
@@ -2832,13 +2857,35 @@ impl<Signer: Sign> Channel<Signer> {
		}
	}

	/// Trigger the autoclose timer if it's in the starting position
	pub fn maybe_trigger_autoclose_timer(&mut self, current_feerate: u32, new_feerate: u32, background_feerate: u32) -> bool {
```
Why isn't this just a part of `send_update_fee`?
There is at least one call in `update_channel_fee` to monitor a stalling inbound `update_fee`?
Right, but this method is indirectly called on timer tick (via `update_channel_fee`) while `check_channel_autoclose` is called directly on timer tick. I still don't see a reason why they are separate; they're both called during timer ticks.
I've verified the paths. I see, you're suggesting to just pass a boolean `trigger_autoclose` down from `update_channel_fee` to `send_update_fee`. IIRC, my motivation for having a separate method to start the auto-close timer was to make the mechanism re-usable for conditions beyond feerate variations, such as peer disconnection, fee-bumping reserve exhaustion, etc.
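To illustrate the reusability argument, the same trigger could later be armed for several conditions (hypothetical enum and call sites; only the feerate condition exists in this PR):

```rust
/// Hypothetical: conditions beyond a stalling update_fee that might want
/// to arm the same autoclose timer in the future.
enum AutocloseReason {
    StallingFeerate,
    PeerDisconnected,
    FeeBumpingReserveExhausted,
}

struct Chan {
    autoclose_timer: u16,
}

impl Chan {
    /// Arm the timer if it isn't already running, whatever the reason.
    fn maybe_trigger_autoclose_timer(&mut self, _reason: AutocloseReason) -> bool {
        // A fuller implementation might log the reason or weight the
        // timeout by it; here it only gates the timer start.
        if self.autoclose_timer == 0 {
            self.autoclose_timer = 1;
            return true;
        }
        false
    }
}

fn main() {
    let mut chan = Chan { autoclose_timer: 0 };
    assert!(chan.maybe_trigger_autoclose_timer(AutocloseReason::PeerDisconnected));
    assert!(!chan.maybe_trigger_autoclose_timer(AutocloseReason::StallingFeerate));
}
```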
Force-pushed from 4048576 to 285b0ec.
Updated at 285b0ec. Note: removed the check against
One worry I have is what happens if we have an HTLC that times out in a day, but we get no blocks for an hour and thus have a huge fee spike. One maybe-simpler solution is to increase
Updated at eaabe00 with inbound-with-pending-payment-restriction-only and new config
I agree we could make
Don't you have a similar trade-off with this PR as-is, though? In either case, if your fee estimator thinks the feerate needs to be higher, we'll force-close, even if the fee estimator is garbage. At least moving this logic to the

I feel like doing it in that context also allows us a bit more specificity, because we already have code around HTLC status there, so it's less total new code and we can bound stuff as we want.
From the online meeting: "accelerate the force-close of inbound HTLCs in case of slow block issuance and feerate spikes, compress the block-timer in ChannelMonitor's
Force-pushed from eaabe00 to 8cce3cc.
Codecov Report

```diff
@@            Coverage Diff             @@
##             main    #1056      +/-   ##
==========================================
+ Coverage   90.51%   92.22%    +1.71%
==========================================
  Files          71       69        -2
  Lines       39012    47480     +8468
==========================================
+ Hits        35310    43787     +8477
+ Misses       3702     3693        -9
```
```rust
@@ -1925,6 +1931,30 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
				panic!("Attempted to replace shutdown script {} with {}", shutdown_script, scriptpubkey);
			}
		},
		ChannelMonitorUpdateStep::FeesSpikes { feerate } => {
```
Why does this need a new Update type? Can't we just check the fee in ChannelMonitor when we call should_broadcast...?
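A sketch of the alternative being suggested, folding the fee check into the existing broadcast decision rather than a new update type (`should_broadcast_holder_commitment_txn` is the real method name; the feerate plumbing and the 2x multiplier here are assumptions for illustration):

```rust
/// Hypothetical block-driven check: compare the feerate committed in the
/// holder transaction against the estimator's current view each time we
/// decide whether to broadcast, instead of persisting a FeesSpikes update.
fn should_broadcast_holder_commitment_txn(
    holder_tx_feerate: u32,
    estimator_feerate: u32,
    htlc_near_expiry: bool,
) -> bool {
    // Treat a 2x gap between our committed feerate and the estimate as a
    // spike (the multiplier is an assumption, not the PR's value).
    let feerate_spiked = estimator_feerate > holder_tx_feerate.saturating_mul(2);
    htlc_near_expiry || feerate_spiked
}

fn main() {
    assert!(should_broadcast_holder_commitment_txn(1_000, 2_500, false));
    assert!(!should_broadcast_holder_commitment_txn(1_000, 1_500, false));
}
```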
Because you still need a tick-based timer against which to build the stalling-block detection? It could be in ChannelMonitor, but then we need to add a new public `timer_tick_occured` there, I think.

Otherwise, what do you have in mind to detect stalling blocks, or are you thinking to compress `HTLC_CLAIM_BUFFER` on fee variations only?
Wait, I'm confused, I thought we were roughly on the same page that we don't need to do this on a tick but can wait for the next block? There's a lot of complexity here that all goes away if we just wait for the next block, and I'm really unconvinced you're ever going to (a) hit the "fees are spiking and our channel fee is now too low" code and (b) get into the next block - if you got into the next block the "our channel fee is now too low" criteria almost certainly shouldn't have been hit.
> Wait, I'm confused, I thought we were roughly on the same page that we don't need to do this on a tick but can wait for the next block?

Well, I'm confused too; I noted "slow block issuance" as a criterion from the meeting, and in that sense I implemented the timer as a poor man's heuristic to detect slow block issuance. What I understand is that you suggest relying on fee-spike detection, as reported by our fee-estimator, to decide to compress `HTLC_CLAIM_BUFFER`. If you assume we do it only when we already hit `should_broadcast`, I think it reduces the fault-tolerance worthiness of the feature, as you're actually waiting for the next block to get in, and potentially for congestion to have worsened.

I'm good with that, though a) it should be a configurable knob to disable the mechanism if you trust your peers to be online and thus reactive to your `update_fee`, and b) there should be a deprecation note to remove this mechanism once package relay is supported, as it's kind of an automatic-closure mechanism triggerable by a third party owning your fee-estimator (sadly, a realistic assumption for our target phone users).
> If you assume we do it only when we already hit should_broadcast, I think it reduces the fault-tolerance worthiness of the feature, as you're actually waiting for the next block to get in, and potentially for congestion to have worsened.

Right, that's true in theory, but I don't think it matters in practice; specifically, my point is that for this to be the case in practice, both of these need to be true:

> a) hit the "fees are spiking and our channel fee is now too low" code and (b) get into the next block - if you got into the next block the "our channel fee is now too low" criteria almost certainly shouldn't have been hit.

I don't think this code should/will ever trigger if the current channel fee qualifies for the next block. And if the current channel fee doesn't qualify for the next block, the whole debate seems moot to me?
Force-pushed from 589389d to c6ca774.
Updated at c6ca774. Simple force-close of inbound HTLCs on fee-spike detection in
```diff
@@ -2662,7 +2679,7 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
 		// with CHECK_CLTV_EXPIRY_SANITY_2.
 		let htlc_outbound = $holder_tx == htlc.offered;
 		if ( htlc_outbound && htlc.cltv_expiry + LATENCY_GRACE_PERIOD_BLOCKS <= height) ||
-			(!htlc_outbound && htlc.cltv_expiry <= height + CLTV_CLAIM_BUFFER && self.payment_preimages.contains_key(&htlc.payment_hash)) {
+			((!htlc_outbound && (htlc.cltv_expiry <= height + CLTV_CLAIM_BUFFER || spikes_force_close)) && self.payment_preimages.contains_key(&htlc.payment_hash)) { //TODO: don't force-close if dust satoshis
```
Do we really want to force-close instantly if we only have one HTLC that expires in a day? I worry a bit about force-closes if a counterparty just uses a different fee estimator than us - fee estimators are really often 30+% off from each other.
Added a new `get_fulfilled_htlc_sum()` as a new condition to determine whether we force-close the chan or not. It doesn't account for dust HTLCs, of course.
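A plausible shape for the helper (a sketch with assumed field names; the real `ChannelMonitorImpl` state is structured differently):

```rust
use std::collections::HashMap;

struct Htlc {
    payment_hash: [u8; 32],
    amount_msat: u64,
    offered: bool, // true if outbound from our point of view
}

struct MonitorState {
    htlcs: Vec<Htlc>,
    payment_preimages: HashMap<[u8; 32], [u8; 32]>,
}

impl MonitorState {
    /// Sum (in sats) of inbound HTLCs we hold the preimage for, i.e. funds
    /// we would forfeit outright if the commitment never confirms in time.
    fn get_fulfilled_htlc_sum(&self) -> u64 {
        self.htlcs
            .iter()
            .filter(|h| !h.offered && self.payment_preimages.contains_key(&h.payment_hash))
            .map(|h| h.amount_msat / 1_000)
            .sum()
    }
}

fn main() {
    let mut preimages = HashMap::new();
    preimages.insert([1u8; 32], [9u8; 32]);
    let state = MonitorState {
        htlcs: vec![
            Htlc { payment_hash: [1u8; 32], amount_msat: 6_000_000_000, offered: false },
            Htlc { payment_hash: [2u8; 32], amount_msat: 1_000_000, offered: false },
        ],
        payment_preimages: preimages,
    };
    // Only the first HTLC counts: inbound and preimage-known.
    assert_eq!(state.get_fulfilled_htlc_sum(), 6_000_000);
}
```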
Force-pushed from c6ca774 to 5ae2cc7.
Rebased and completed at 5ae2cc7. Added test coverage.
```diff
@@ -2443,7 +2463,8 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
 	log_trace!(logger, "Processing {} matched transactions for block at height {}.", txn_matched.len(), conf_height);
 	debug_assert!(self.best_block.height() >= conf_height);

 	let should_broadcast = self.should_broadcast_holder_commitment_txn(logger);
+	let fulfilled_htlc_high_exposure = 5_000_000 < self.get_fulfilled_htlc_sum();
```
Should we constify it or make it a new config knob?

I think we should give node operators their say when it's a question of risk evaluation. I picked the same value as `max_dust_htlc_exposure`. Maybe we could pick one unified "funds at risk" value, though at the same time it doesn't cover exactly the same risks...

Long-term, I think we should let node operators fine-tune their channel operations as a function of their own risk analysis, instead of deciding for them (though it's likely too early for that).
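If the `5_000_000` sat constant did become a knob, it might mirror `max_dust_htlc_exposure` (hypothetical field and names; the unified "funds at risk" value is only floated above):

```rust
/// Hypothetical knob: the fulfilled-HTLC exposure, in satoshis, above
/// which a detected fee spike justifies an immediate force-close.
pub struct MonitorConfig {
    pub max_fulfilled_htlc_exposure_sat: u64,
}

impl Default for MonitorConfig {
    fn default() -> Self {
        // Mirrors the 5_000_000 sat value borrowed from max_dust_htlc_exposure.
        Self { max_fulfilled_htlc_exposure_sat: 5_000_000 }
    }
}

fn exposure_exceeds_limit(fulfilled_htlc_sum_sat: u64, cfg: &MonitorConfig) -> bool {
    fulfilled_htlc_sum_sat > cfg.max_fulfilled_htlc_exposure_sat
}

fn main() {
    let cfg = MonitorConfig::default();
    assert!(exposure_exceeds_limit(6_000_000, &cfg));
    assert!(!exposure_exceeds_limit(1_000_000, &cfg));
}
```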
Closing for now: https://gnusha.org/bitcoin-rust/2022-03-27.log While I think reacting dynamically to mempool congestion is the way to go long-term,
Auto-close timer: if the channel is outbound, we sent an `update_fee`, and we didn't receive a RAA from the counterparty committing this state after `AUTOCLOSE_TIMEOUT` periods, this channel must be force-closed.

If the channel is inbound, it has been observed that the channel feerate should increase, and we didn't receive a RAA from the counterparty committing an `update_fee` after `AUTOCLOSE_TIMEOUT` periods, this channel must be force-closed.

Close #993

- `update_fee`: verify it's going up
- `AUTOCLOSE_TIMEOUT` value