
Add Auto-close on inbound/outbound update_fee delay #1056

Closed

Conversation

@ariard ariard commented Aug 22, 2021

Auto-close timer: if the channel is outbound, we sent an update_fee, and we didn't
receive an RAA from the counterparty committing this state after AUTOCLOSE_TIMEOUT
periods, this channel must be force-closed.
If the channel is inbound, we have observed that the channel feerate should increase,
and we didn't receive an RAA from the counterparty committing an update_fee after
AUTOCLOSE_TIMEOUT periods, this channel must be force-closed.
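
For intuition, a minimal sketch of the tick-driven counter this describes. The names mirror the PR (autoclose_timer, AUTOCLOSE_TIMEOUT), but the struct and methods are simplified stand-ins, not LDK's actual API:

const AUTOCLOSE_TIMEOUT: u16 = 60; // ticks; LDK's background task ticks roughly once a minute

struct AutocloseState {
    autoclose_timer: u16, // 0 = disarmed, 1..=AUTOCLOSE_TIMEOUT = counting
}

impl AutocloseState {
    /// Arm the timer when an update_fee sequence starts stalling.
    fn maybe_trigger(&mut self) {
        if self.autoclose_timer == 0 {
            self.autoclose_timer = 1;
        }
    }
    /// Called once per timer tick; returns true when the channel should be force-closed.
    fn tick(&mut self) -> bool {
        if self.autoclose_timer == 0 {
            return false;
        }
        if self.autoclose_timer >= AUTOCLOSE_TIMEOUT {
            return true;
        }
        self.autoclose_timer += 1;
        false
    }
    /// Disarm once the counterparty's RAA commits the new feerate.
    fn reset(&mut self) {
        self.autoclose_timer = 0;
    }
}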

Close #993

  • test coverage
  • distrust inbound update_fee; verify it's going up
  • if feerates are going downwards, do not trigger timers
  • check other implementations' timers to fine-tune the AUTOCLOSE_TIMEOUT value

@codecov codecov bot commented Aug 22, 2021

Codecov Report

Merging #1056 (285b0ec) into main (1a74367) will decrease coverage by 0.02%.
The diff coverage is 75.00%.


@@            Coverage Diff             @@
##             main    #1056      +/-   ##
==========================================
- Coverage   90.47%   90.44%   -0.03%     
==========================================
  Files          69       69              
  Lines       37137    37190      +53     
==========================================
+ Hits        33600    33637      +37     
- Misses       3537     3553      +16     
Impacted Files Coverage Δ
lightning/src/util/config.rs 45.83% <50.00%> (+0.18%) ⬆️
lightning/src/ln/channelmanager.rs 83.66% <68.18%> (-0.11%) ⬇️
lightning/src/ln/channel.rs 88.95% <81.25%> (-0.09%) ⬇️
lightning/src/ln/functional_tests.rs 97.38% <0.00%> (-0.04%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1a74367...285b0ec. Read the comment docs.

@TheBlueMatt TheBlueMatt added this to the 0.0.101 milestone Aug 23, 2021
@TheBlueMatt TheBlueMatt (Collaborator) commented:

Meeting notes - @ariard is going to make this a bit more conservative (e.g. require the feerate to increase 25-50% and be higher than our background estimate before force-closing) as well as check what other implementations are doing.

@valentinewallace valentinewallace (Contributor) left a comment


Looks pretty reasonable! Will go more in-depth after the "require 25-50% feerate increase to force-close" updates

@@ -392,6 +398,10 @@ pub(super) struct Channel<Signer: Sign> {

latest_monitor_update_id: u64,

// Auto-close timer, if the channel is outbound, and we didn't receive a RAA for counterparty
// commitment transaction after `AUTOCLOSE_TIMEOUT` periods, this channel must be force-closed.
autoclose_timer: u16,
Contributor:


nit: seems like it could be a u8

Author:


I think a u8 is only ~4h of blocks (assuming a 1-min tick); if we make it configurable in the future and node operators want to increase it, we might have to move back to u16?
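
For reference, a back-of-the-envelope check of both type sizes under the 1-minute-tick assumption:

// u8::MAX  = 255 ticks    ≈ 4.25 hours at one tick per minute
// u16::MAX = 65_535 ticks ≈ 45.5 days at one tick per minute
assert!((u8::MAX as u32) < 5 * 60);        // a u8 can't even express 5 hours of ticks
assert!((u16::MAX as u32) > 45 * 24 * 60); // a u16 covers more than 45 days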

@@ -2660,6 +2667,23 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
(retain_channel, NotifyOption::DoPersist, ret_err)
}

fn check_channel_autoclose(&self, short_to_id: &mut HashMap<u64, [u8; 32]>, chan_id: &[u8; 32], chan: &mut Channel<Signer>) -> (bool, NotifyOption, Result<(), MsgHandleErrInternal>) {
Contributor:


High level doc comment plz

Author:


Added one, let me know if it looks good to you :) Channel::check_channel_autoclose is commented in more detail.

// If the feerate has decreased by less than half, don't bother
if new_feerate <= chan.get_feerate() && new_feerate * 2 > chan.get_feerate() {
log_trace!(self.logger, "Channel {} does not qualify for a feerate change from {} to {}.",
log_bytes!(chan_id[..]), chan.get_feerate(), new_feerate);
return (true, NotifyOption::SkipPersist, Ok(()));
}
if !chan.is_outbound() {
if chan.maybe_trigger_autoclose_timer() {
Contributor:


I don't quite get why we start the autoclose timer here? Comment might help

@ariard ariard (Author) commented Oct 21, 2021


To ensure timely confirmation of its transactions, a Lightning node should always ensure those transactions commit to a compelling feerate. In case of mempool spikes or decongestion, the update_fee mechanism allows nodes to cooperatively adjust the feerate of the channel's transactions. If the update_fee sequence doesn't succeed, a node should preemptively close its channels instead of bearing the risk of non-confirming commitment transactions with pending time-sensitive HTLC outputs.

In LDK, we run timer_tick_occured as a background task, independently of the channel's inbound/outboundness. One of the background tasks called is update_channel_fee, where we already monitor whether it's relevant to send an outbound update_fee. As such, I think it's a good location to add the trigger for inbound, removing the return at the beginning of update_channel_fee.

That said, I think it's okay to merge the inbound/outbound code paths? Like removing the maybe_trigger_autoclose_timer from send_update_fee.

edit: ah no, because send_update_fee_and_commit is a public method and we could re-use it from a different path than update_channel_fee.
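
A heavily simplified rendering of the two paths being discussed; the types and signatures below are hypothetical stand-ins (the real methods take channel maps, fee estimators, and error handling omitted here):

struct Channel {
    outbound: bool,
    autoclose_timer: u16,
}

impl Channel {
    fn is_outbound(&self) -> bool {
        self.outbound
    }
    /// Arm the timer only from the starting position.
    fn maybe_trigger_autoclose_timer(&mut self) -> bool {
        if self.autoclose_timer == 0 {
            self.autoclose_timer = 1;
            return true;
        }
        false
    }
    /// Outbound path: sending update_fee also arms the timer while we wait
    /// for the counterparty's RAA.
    fn send_update_fee(&mut self, _new_feerate: u32) {
        self.maybe_trigger_autoclose_timer();
    }
}

/// Called from the timer_tick_occured background task for every channel.
fn update_channel_fee(chan: &mut Channel, new_feerate: u32) {
    if chan.is_outbound() {
        chan.send_update_fee(new_feerate);
    } else {
        // Inbound: we can't send update_fee ourselves; if the feerate should
        // rise and the counterparty stays silent, start counting here.
        chan.maybe_trigger_autoclose_timer();
    }
}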

}
if self.autoclose_timer == AUTOCLOSE_TIMEOUT {
// Only bother force-closing if the channel has pending HTLC outputs to claim on-chain
if self.pending_inbound_htlcs.len() + self.pending_outbound_htlcs.len() > 0 {
Contributor:


Should we be resetting the timer in the else case?

Author:


Well, it's not that simple. The HTLCs might have been cleaned up since the timer started while the feerate is still stalling. If an HTLC is added just after we reset the timer and the channel force-closes, confirmation might be delayed for a while, or the transaction might not even propagate across network mempools. The channel won't be force-closed before the expiration of another AUTOCLOSE_TIMEOUT.

If we assume that channels are more often loaded with HTLCs than empty, I think we should not reset the timer even if we don't have pending HTLCs right now. Though maybe we should remove this force-close waiver, or move to an economical one ("if the HTLC exposure is under the feerate cost of the force-close, don't break the channel")?
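
A sketch of the "economical waiver" floated here, assuming we can estimate the weight-based cost of the unilateral close (all names hypothetical):

/// Only worth force-closing if the HTLC value at risk exceeds the estimated
/// on-chain cost of broadcasting and confirming the commitment transaction.
fn worth_force_closing(htlc_exposure_msat: u64, feerate_sat_per_kw: u64, commitment_weight: u64) -> bool {
    let close_cost_sat = feerate_sat_per_kw * commitment_weight / 1_000;
    htlc_exposure_msat > close_cost_sat * 1_000 // compare in msat
}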

@TheBlueMatt TheBlueMatt modified the milestones: 0.0.101, 0.1 Sep 15, 2021
@TheBlueMatt TheBlueMatt (Collaborator) commented:

Moving to 0.1 as I'm not sure we should delay 0.0.101 or 0.0.102 for this.

@TheBlueMatt TheBlueMatt (Collaborator) commented:

Needs rebase.

@ariard ariard force-pushed the 2021-08-autoclose-update-delay branch from 2ceab8f to f64c46c on November 25, 2021
@ariard ariard (Author) commented Nov 25, 2021

Rebased at f64c46c.

A few changes since last time:

  • a new should_autoclose config value has been added to ChannelConfig (default: true), as a channel might have been opened with a trusted, always-online peer, making an on-chain unilateral close irrelevant
  • AUTOCLOSE_TIMEOUT has been scaled to 60 min (the max lnd value), high enough to avoid closures with flaky peers but low enough to still have odds of block inclusion
  • force-closes are conditional on non-dust HTLCs being present; don't go on-chain for dust HTLCs
  • start an auto-close timer if the outbound update_fee affordability check fails (addressing "Check for outbound feerate update affordability before sending" #1054 (comment))

Especially happy to get opinions on a) AUTOCLOSE_TIMEOUT and b) sanitizing that the inbound update_fee is effectively upward by a given margin (e.g. 30%) to avoid malicious peers doing the strict minimum.

Still have to add test coverage.

/// If this is set to false, we do not auto-close channels with pending time-sensitive outputs
/// whose feerate has been detected as stalling.
/// Default value: true.
pub should_autoclose: bool,
Collaborator:


Maybe we should make this some kind of knob to configure how aggressively we do this instead? E.g. right now we require a feerate increase of 30%; maybe we should make that a knob, and users who want to turn off autoclose can set it to u64::max_value()?

@ariard ariard (Author) commented Nov 29, 2021


What feerate would we like the commitment transaction to be committed with? Ideally we would have visibility into the current composition of mempool feerate groups to predict their worst-case evolution based on historical data. Assuming this worst case, we should guarantee our feerate stays in sync with the top feerate group for block inclusion during our CLTV_CLAIM_BUFFER (once we unilaterally break the channel we can't update it anymore). The current "30%" should represent our rate of fidelity to the worst case; if a node operator would like to diverge from the worst case, they should tweak it accordingly.

Even further, to have a more complete model, we should also tweak AUTOCLOSE_TIMEOUT to cover the velocity of the worst-case scenario ("once we've detected evolution towards unsafe mempool feerate groups, how fast can we react to correct the committed feerate?"). The issue there is that our reaction time depends on our counterparty's interactivity latency, and we could ask the node operator for input ("what's the liveliness reliability of this peer?") to adjust AUTOCLOSE_TIMEOUT accordingly. The current AUTOCLOSE_TIMEOUT should be enough blocks to let our fee estimator and our counterparty converge, but also short enough to match worst-case congestion velocity.

Ultimately the limiting safety factor is our counterparty's latency. And that one can be quite extreme if our counterparty isn't robust against low-odds, high-impact risks (data center fire, DNS server outage, etc.). Assuming package relay support, moving to anchor outputs as a fee-bumping mechanism removes this safety dependency on counterparty latency.

Or do you have more rule-of-thumb thinking on what to advise node operators to configure this potential knob with?

Collaborator:


I don't think there is a clear answer, sadly. Indeed, no fee estimator (that I'm aware of) is really tuned for the lightning threat model. That said, the answer is likely some function of the trust a user has in their counterparty, which leads me to believe it should be a knob. If you're running a node that opens channels automatically with any counterparty, you probably want something pretty tight (30% seems reasonable?); if you're a node that only makes outbound channel opens to trusted peers, maybe 500% is more reasonable.
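
A sketch of the knob shape being discussed; the field name autoclose_variation_rate anticipates the config ariard adds later, and the disable-via-u64::MAX convention follows the comment above (both are assumptions here):

pub struct AutocloseConfig {
    /// Percentage feerate increase required before arming the auto-close timer.
    /// u64::MAX effectively disables auto-close.
    pub autoclose_variation_rate: u64,
}

impl Default for AutocloseConfig {
    fn default() -> Self {
        // Tight default for nodes accepting channels from arbitrary peers;
        // operators with trusted peers can raise it (e.g. to 500) or disable it.
        AutocloseConfig { autoclose_variation_rate: 30 }
    }
}

fn should_arm(cfg: &AutocloseConfig, current_feerate: u64, new_feerate: u64) -> bool {
    // Saturating math keeps u64::MAX meaning "never arm".
    new_feerate.saturating_mul(100)
        > current_feerate.saturating_mul(100u64.saturating_add(cfg.autoclose_variation_rate))
}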

@@ -2832,13 +2857,35 @@ impl<Signer: Sign> Channel<Signer> {
}
}

/// Trigger the autoclose timer if it's in the starting position
pub fn maybe_trigger_autoclose_timer(&mut self, current_feerate: u32, new_feerate: u32, background_feerate: u32) -> bool {
Collaborator:


Why isn't this just a part of send_update_fee?

Author:


There is at least a call in update_channel_fee to monitor a stalling inbound update_fee?

Collaborator:


Right, but this method is indirectly called on timer tick (via update_channel_fee) while check_channel_autoclose is called directly on timer tick. I still don't see a reason why they are separate; they're both called during timer ticks.

Author:


I've verified the paths. I see, you're thinking of just passing a boolean trigger_autoclose down from update_channel_fee to send_update_fee. IIRC, my motivation for a separate method to start the auto-close timer was to make the mechanism re-usable for conditions beyond feerate variations, such as peer disconnection, fee-bumping reserve exhaustion, etc.

@ariard ariard force-pushed the 2021-08-autoclose-update-delay branch 2 times, most recently from 4048576 to 285b0ec on November 29, 2021
@ariard ariard (Author) commented Nov 29, 2021

Updated at 285b0ec.

Note: removed the check against background_feerate in maybe_trigger_autoclose. Verifying that the new feerate is at least 30% above the current channel feerate should be enough as an upward-fee detection check.
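
In integer terms, the check described could look like the following sketch (the actual code in the branch may differ):

/// Treat the fee movement as genuinely upward only if the estimator's new
/// feerate exceeds the current channel feerate by at least 30%.
fn feerate_moved_upward(current_feerate: u32, new_feerate: u32) -> bool {
    // Widening to u64 avoids overflow; 130 = 100% + the 30% variation rate.
    (new_feerate as u64) * 100 >= (current_feerate as u64) * 130
}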

@TheBlueMatt TheBlueMatt (Collaborator) commented:

One worry I have is what happens if we have an HTLC that times out in a day, but we get no blocks for an hour and thus have a huge fee spike. One maybe simpler solution is to increase CLTV_CLAIM_BUFFER and LATENCY_GRACE_PERIOD_BLOCKS as used in should_broadcast_holder_commitment_tx by some ratio of how much the feerate has increased compared to the latest holder commitment tx.
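
A sketch of that alternative: stretch the claim buffer in proportion to how far fees have risen above the latest holder commitment's feerate (the linear scaling law, the constant, and the cap are illustrative assumptions):

const CLTV_CLAIM_BUFFER: u32 = 6; // blocks; illustrative value

/// Returns a claim buffer stretched by the feerate ratio: a 2x fee spike
/// doubles the buffer, forcing the channel on-chain that many blocks earlier.
fn scaled_claim_buffer(current_feerate: u64, holder_commitment_feerate: u64) -> u32 {
    let ratio_pct = current_feerate * 100 / holder_commitment_feerate.max(1);
    if ratio_pct <= 100 {
        return CLTV_CLAIM_BUFFER;
    }
    // Cap the stretch so a wild fee estimator can't force absurdly early closes.
    ((CLTV_CLAIM_BUFFER as u64) * ratio_pct / 100).min(72) as u32
}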

@ariard ariard (Author) commented Dec 7, 2021

Updated at eaabe00 with the inbound-with-pending-payment restriction only and a new autoclose_variation_rate config.

One maybe simpler solution is to increase CLTV_CLAIM_BUFFER and LATENCY_GRACE_PERIOD_BLOCKS as used in should_broadcast_holder_commitment_tx by some ratio of how much the feerate has increased compared to the latest holder commitment tx.

I agree we could make CLTV_CLAIM_BUFFER and LATENCY_GRACE_PERIOD_BLOCKS dependent on fee spikes, extending them to provoke an earlier closure of the channel and thus increase the odds of transaction confirmation. Or we could also increase them in case of fee-bumping reserve exhaustion in the anchor-output model. That said, I think there is another trade-off: if your fee estimator gets compromised or becomes unreliable, your channels become far more frail; it might be the kind of mechanism where you don't want all nodes behaving the same, to avoid an ecosystem-wide threshold effect.

@TheBlueMatt TheBlueMatt (Collaborator) commented:

That said, I think there is another trade-off: if your fee estimator gets compromised or becomes unreliable, your channels become far more frail; it might be the kind of mechanism where you don't want all nodes behaving the same, to avoid an ecosystem-wide threshold effect.

Don't you have a similar trade-off with this PR as-is, though? In either case, if your fee estimator thinks the feerate needs to be higher, we'll force-close, even if the fee estimator is garbage. At least moving this logic into the ChannelMonitor keeps our "should we force-close due to HTLC expiry" code in one place, and ensures a redundant ChannelMonitor can make the same decision.

I feel like doing it in that context also allows us a bit more specificity, because we already have code around HTLC status there, so it's less total new code and we can bound stuff as we want.

@ariard ariard (Author) commented Dec 13, 2021

From the online meeting: "accelerate the force-close of inbound HTLCs in case of slow block issuance and feerate spikes; compress the block-timer in ChannelMonitor's should_broadcast()".
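
A sketch of that outcome: inside should_broadcast(), widen the effective inbound-HTLC claim buffer when a fee spike is detected, which compresses the remaining wait before going on-chain (the constant and the 2x factor are illustrative):

const CLTV_CLAIM_BUFFER: u32 = 6; // blocks; illustrative value

/// Decide whether to broadcast the holder commitment for an inbound HTLC.
/// A wider buffer under a fee spike triggers the close more blocks before expiry.
fn should_broadcast(htlc_cltv_expiry: u32, height: u32, fee_spike_detected: bool) -> bool {
    let buffer = if fee_spike_detected { CLTV_CLAIM_BUFFER * 2 } else { CLTV_CLAIM_BUFFER };
    htlc_cltv_expiry <= height + buffer
}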

@ariard ariard force-pushed the 2021-08-autoclose-update-delay branch from eaabe00 to 8cce3cc on December 14, 2021
@codecov-commenter commented Dec 14, 2021

Codecov Report

Merging #1056 (589389d) into main (d29ae18) will increase coverage by 1.71%.
The diff coverage is 64.78%.

❗ Current head 589389d differs from pull request most recent head 5ae2cc7. Consider uploading reports for the commit 5ae2cc7 to get more accurate results


@@            Coverage Diff             @@
##             main    #1056      +/-   ##
==========================================
+ Coverage   90.51%   92.22%   +1.71%     
==========================================
  Files          71       69       -2     
  Lines       39012    47480    +8468     
==========================================
+ Hits        35310    43787    +8477     
+ Misses       3702     3693       -9     
Impacted Files Coverage Δ
lightning/src/chain/channelmonitor.rs 92.16% <25.80%> (+0.59%) ⬆️
lightning/src/ln/channel.rs 93.09% <91.66%> (+3.80%) ⬆️
lightning/src/ln/channelmanager.rs 87.35% <100.00%> (+2.88%) ⬆️
lightning/src/routing/scoring.rs 92.80% <0.00%> (-2.23%) ⬇️
lightning/src/util/logger.rs 84.37% <0.00%> (-1.05%) ⬇️
lightning/src/util/chacha20.rs 95.37% <0.00%> (-1.00%) ⬇️
lightning-invoice/src/utils.rs 83.48% <0.00%> (-0.59%) ⬇️
lightning/src/chain/package.rs 92.26% <0.00%> (-0.35%) ⬇️
lightning-invoice/src/de.rs 80.94% <0.00%> (-0.34%) ⬇️
lightning/src/ln/onion_utils.rs 95.38% <0.00%> (-0.26%) ⬇️
... and 33 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d29ae18...5ae2cc7. Read the comment docs.

@@ -1925,6 +1931,30 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
panic!("Attempted to replace shutdown script {} with {}", shutdown_script, scriptpubkey);
}
},
ChannelMonitorUpdateStep::FeesSpikes { feerate } => {
Collaborator:


Why does this need a new Update type? Can't we just check the fee in ChannelMonitor when we call should_broadcast...?

Author:


Because you still need a tick-based timer against which to build the stalling-block detection? It could be in ChannelMonitor, but then we need to add a new pub timer_tick_occured there, I think.

Otherwise, what do you have in mind to detect block stalling? Or maybe you're thinking of compressing HTLC_CLAIM_BUFFER on fee variations only?

Collaborator:


Wait, I'm confused, I thought we were roughly on the same page that we don't need to do this on a tick but can wait for the next block? There's a lot of complexity here that all goes away if we just wait for the next block, and I'm really unconvinced you're ever going to (a) hit the "fees are spiking and our channel fee is now too low" code and (b) get into the next block - if you got into the next block, the "our channel fee is now too low" criterion almost certainly shouldn't have been hit.

@ariard ariard (Author) commented Dec 17, 2021


Wait, I'm confused, I thought we were roughly on the same page that we don't need to do this on a tick but can wait for the next block?

Well, I'm confused too; I noted "slow block issuance" as a criterion from the meeting, and in that sense I implemented the timer as a poor man's heuristic to detect slow block issuance. What I understand is that you suggest relying on fee-spike detection, as reported by our fee estimator, to decide to compress HTLC_CLAIM_BUFFER. If you do it only once we already hit should_broadcast, I think that reduces the fault-tolerance value of the feature, as you're actually waiting for the next block to come in, and congestion may have worsened by then.

I'm good with that, though a) it should be knob-configurable to disable the mechanism if you trust your peers to be online and thus reactive to your update_fee, and b) there should be a deprecation note to remove this mechanism once package relay is supported, as it's a kind of automatic-closure mechanism triggerable by a third party owning your fee estimator (sadly, a realistic assumption for our target phone users).

Collaborator:


If you do it only once we already hit should_broadcast, I think that reduces the fault-tolerance value of the feature, as you're actually waiting for the next block to come in, and congestion may have worsened by then.

Right, that's true in theory, but I don't think it matters in practice. Specifically, my point is that for this to matter in practice, both of these need to be true:

(a) hit the "fees are spiking and our channel fee is now too low" code and (b) get into the next block - if you got into the next block, the "our channel fee is now too low" criterion almost certainly shouldn't have been hit.

I don't think this code should/will ever trigger if the current channel fee qualifies for the next block. And if the current channel fee doesn't qualify for the next block, the whole debate seems moot to me?

@ariard ariard force-pushed the 2021-08-autoclose-update-delay branch from 589389d to c6ca774 on December 27, 2021
@ariard ariard (Author) commented Dec 27, 2021

Updated at c6ca774.

Simple force-close of inbound HTLCs on fee-spike detection in should_broadcast_holder_commitment_txn. The fee-spike rate is configurable via a knob.

@@ -2662,7 +2679,7 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
// with CHECK_CLTV_EXPIRY_SANITY_2.
let htlc_outbound = $holder_tx == htlc.offered;
if ( htlc_outbound && htlc.cltv_expiry + LATENCY_GRACE_PERIOD_BLOCKS <= height) ||
-    (!htlc_outbound && htlc.cltv_expiry <= height + CLTV_CLAIM_BUFFER && self.payment_preimages.contains_key(&htlc.payment_hash)) {
+    ((!htlc_outbound && (htlc.cltv_expiry <= height + CLTV_CLAIM_BUFFER || spikes_force_close)) && self.payment_preimages.contains_key(&htlc.payment_hash)) { //TODO: don't force-close if dust satoshis
Collaborator:


Do we really want to force-close instantly if we only have one HTLC that expires in a day? I worry a bit about force-closes if a counterparty just uses a different fee estimator than us - fee estimators are really often 30+% off from each other.

Author:


Added a new get_fulfilled_htlc_sum() as a new condition to determine whether we force-close the channel or not. It doesn't account for dust HTLCs, of course.
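
A sketch of the shape such a gate might take (the types here are hypothetical; as a later snippet shows, the PR compares the sum against a 5,000,000 msat threshold):

struct HtlcEntry {
    amount_msat: u64,
    preimage_known: bool, // fulfilled: we hold the preimage and can claim on-chain
    is_dust: bool,
}

/// Sum the value of fulfilled, non-dust HTLCs: the funds actually recoverable
/// by going on-chain, and thus the value that can justify a force-close.
fn get_fulfilled_htlc_sum(htlcs: &[HtlcEntry]) -> u64 {
    htlcs.iter()
        .filter(|h| h.preimage_known && !h.is_dust)
        .map(|h| h.amount_msat)
        .sum()
}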

@ariard ariard force-pushed the 2021-08-autoclose-update-delay branch from c6ca774 to 5ae2cc7 on February 9, 2022
@ariard ariard (Author) commented Feb 9, 2022

Rebased and completed at 5ae2cc7.

Added test coverage.

@@ -2443,7 +2463,8 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
log_trace!(logger, "Processing {} matched transactions for block at height {}.", txn_matched.len(), conf_height);
debug_assert!(self.best_block.height() >= conf_height);

let should_broadcast = self.should_broadcast_holder_commitment_txn(logger);
let fulfilled_htlc_high_exposure = 5_000_000 < self.get_fulfilled_htlc_sum();
Author:


Should we constify this or make it a new config knob?

I think we should give node operators their say when it's a question of risk evaluation. I picked the same value as max_dust_htlc_exposure. Maybe we could pick one unified "funds at risk" value, though at the same time it doesn't cover exactly the same risks...

Long-term, I think we should let node operators fine-tune their channel operations as a function of their own risk analysis, instead of substituting for them (though it's likely too early).
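
If it did become a knob, it might mirror the shape of max_dust_htlc_exposure (hypothetical field name; the 5,000,000 msat default mirrors the constant above):

pub struct MonitorRiskConfig {
    /// Force-close on fee spikes only when the fulfilled (non-dust) HTLC sum
    /// exceeds this many millisatoshis.
    pub fulfilled_htlc_exposure_msat: u64,
}

impl Default for MonitorRiskConfig {
    fn default() -> Self {
        MonitorRiskConfig { fulfilled_htlc_exposure_msat: 5_000_000 }
    }
}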

@ariard ariard (Author) commented Mar 28, 2022

Closing for now: https://gnusha.org/bitcoin-rust/2022-03-27.log

While I think reacting dynamically to mempool congestion is the way to go long-term, update_fee as a fee-bumping mechanism is too limited, and it could be confusing or even harmful to users if we're closing early in unfavorable scenarios. To be revisited once we have anchors + package relay.
