Add Auto-close on inbound/outbound update_fee delay #1056
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##             main    #1056      +/-   ##
==========================================
- Coverage   90.47%   90.44%    -0.03%
==========================================
  Files          69       69
  Lines       37137    37190       +53
==========================================
+ Hits        33600    33637       +37
- Misses       3537     3553       +16
```
Meeting notes - @ariard is going to make this a bit more conservative (e.g. require that the feerate increase by 25-50% and be higher than our background estimate before force-closing) as well as check what other implementations are doing.
Looks pretty reasonable! Will go more in-depth after the "require 25-50% feerate increase to force-close" updates
lightning/src/ln/channel.rs
Outdated
```rust
@@ -392,6 +398,10 @@ pub(super) struct Channel<Signer: Sign> {
	latest_monitor_update_id: u64,

	// Auto-close timer, if the channel is outbound, and we didn't receive a RAA for counterparty
	// commitment transaction after `AUTOCLOSE_TIMEOUT` periods, this channel must be force-closed.
	autoclose_timer: u16,
```
nit: seems like it could be a u8
I think a u8 only covers about 4 hours of blocks (assuming a 1-minute tick); if we make this configurable in the future and node operators want to increase it, we might have to go back to u16?
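For reference, a quick standalone back-of-the-envelope check of the two widths (a minimal sketch, assuming the 1-minute tick interval from the comment above):

```rust
fn main() {
    // Assuming one timer tick per minute, as in the comment above.
    let ticks_per_hour = 60.0;

    // A u8 timer saturates at 255 ticks, roughly 4.25 hours.
    println!("u8 max: {:.2} hours", u8::MAX as f64 / ticks_per_hour);
    // A u16 timer saturates at 65_535 ticks, roughly 45 days.
    println!("u16 max: {:.1} days", u16::MAX as f64 / ticks_per_hour / 24.0);
}
```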
lightning/src/ln/channelmanager.rs
Outdated
```rust
@@ -2660,6 +2667,23 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelManager
		(retain_channel, NotifyOption::DoPersist, ret_err)
	}

	fn check_channel_autoclose(&self, short_to_id: &mut HashMap<u64, [u8; 32]>, chan_id: &[u8; 32], chan: &mut Channel<Signer>) -> (bool, NotifyOption, Result<(), MsgHandleErrInternal>) {
```
High level doc comment plz
Added one, let me know if it looks good to you :) `Channel::check_channel_autoclose` is more thoroughly commented.
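For illustration, a high-level doc comment along these lines might read as follows (a hypothetical sketch, not the exact text added in the PR):

```rust
/// Checks whether this channel's autoclose timer has expired and, if so,
/// force-closes the channel.
///
/// The timer is armed when an `update_fee` exchange appears to be stalling
/// (see `Channel::maybe_trigger_autoclose_timer`) and advanced on each
/// `timer_tick_occured` call. A channel whose commitment feerate is stuck
/// below the current estimate risks non-confirmation of time-sensitive
/// HTLC claims, so we preemptively close it on-chain instead.
fn check_channel_autoclose() { /* elided */ }
```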
lightning/src/ln/channelmanager.rs
Outdated
```rust
// If the feerate has decreased by less than half, don't bother
if new_feerate <= chan.get_feerate() && new_feerate * 2 > chan.get_feerate() {
	log_trace!(self.logger, "Channel {} does not qualify for a feerate change from {} to {}.",
		log_bytes!(chan_id[..]), chan.get_feerate(), new_feerate);
	return (true, NotifyOption::SkipPersist, Ok(()));
}
if !chan.is_outbound() {
	if chan.maybe_trigger_autoclose_timer() {
```
I don't quite get why we start the autoclose timer here? Comment might help
To ensure timely confirmation of its transactions, a Lightning node should always ensure those transactions commit to a compelling feerate. In case of mempool spikes or decongestion, the `update_fee` mechanism allows nodes to cooperatively adjust the feerate of the channel's transactions. If the `update_fee` sequence doesn't succeed, a node should preemptively close its channels instead of bearing the risk of non-confirming commitment transactions with pending time-sensitive HTLC outputs.

In LDK, we run `timer_tick_occured` as a background task, independently of whether the channel is inbound or outbound. One of the background tasks called is `update_channel_fee`, where we already check whether it's relevant to send an outbound update_fee. As such I think it's a good location to add the trigger for inbound, removing the early return at the beginning of `update_channel_fee`.

That said, I think it's okay to merge the inbound/outbound code paths? Like removing the `maybe_trigger_autoclose_timer` call from `send_update_fee`.

edit: ah no, because `send_update_fee_and_commit` is a public method and we could re-use it from a different path than `update_channel_fee`.
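A condensed sketch of the flow described above, under assumed, simplified signatures (the real `update_channel_fee` in `channelmanager.rs` carries considerably more state):

```rust
// Simplified stand-in for Channel<Signer>; only the autoclose-relevant
// pieces are modeled.
struct Chan {
    feerate_per_kw: u32,
    is_outbound: bool,
    autoclose_timer: u16,
}

impl Chan {
    /// Arm the timer if it's in the starting position; the counterparty's
    /// RAA (or inbound update_fee) is expected to reset it.
    fn maybe_trigger_autoclose_timer(&mut self) -> bool {
        if self.autoclose_timer == 0 {
            self.autoclose_timer = 1;
            return true;
        }
        false
    }
}

/// Called from the timer-tick background task for every channel.
fn update_channel_fee(chan: &mut Chan, new_feerate: u32) {
    if !chan.is_outbound {
        // Inbound channel: we can't send update_fee ourselves. If our
        // estimator says the feerate should rise, arm the timer and wait
        // for the counterparty's update_fee to cancel it.
        if new_feerate > chan.feerate_per_kw {
            chan.maybe_trigger_autoclose_timer();
        }
        return;
    }
    // Outbound channel: send update_fee ourselves (elided) and arm the
    // timer pending the counterparty's RAA on the new state.
}

fn main() {
    let mut chan = Chan { feerate_per_kw: 1_000, is_outbound: false, autoclose_timer: 0 };
    update_channel_fee(&mut chan, 2_000);
    assert_eq!(chan.autoclose_timer, 1);
}
```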
lightning/src/ln/channel.rs
Outdated
```rust
	}
	if self.autoclose_timer == AUTOCLOSE_TIMEOUT {
		// If the channel doesn't have pending HTLC outputs to claim on-chain
		if self.pending_inbound_htlcs.len() + self.pending_outbound_htlcs.len() > 0 {
```
Should we be resetting the timer in the `else` case?
Well, it's not that simple. The HTLCs might have been cleaned up since the timer started while the feerate is still stalling. If an HTLC is added just after we reset the timer and the channel force-closes, confirmation might be delayed for a while, or the transaction might not even propagate across the network mempools. The channel won't be force-closed before the expiration of another `AUTOCLOSE_TIMEOUT`.

If we assume that channels are more often loaded with HTLCs than empty, I think we should not reset the timer even if we don't currently have pending HTLCs. Though maybe we should remove this force-close waiver, or move to an economical one ("if HTLC exposure is under the feerate cost to force-close, don't break the channel")?
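The "economical" waiver floated above could look roughly like this (a sketch with assumed weights and thresholds; real balance-at-risk accounting would be more involved):

```rust
/// Hypothetical check: only force-close over a stalling feerate if the
/// value we stand to lose from unconfirmed HTLC claims exceeds the fee
/// cost of the force-close itself.
fn htlc_exposure_justifies_close(
    pending_htlc_value_sat: u64,
    commitment_weight: u64,
    feerate_sat_per_kw: u64,
) -> bool {
    // Rough fee cost of broadcasting the commitment transaction.
    let force_close_cost_sat = commitment_weight * feerate_sat_per_kw / 1_000;
    pending_htlc_value_sat > force_close_cost_sat
}

fn main() {
    // ~724 weight units for a bare commitment tx and 2_500 sat/kW are
    // assumed numbers for illustration only.
    assert!(htlc_exposure_justifies_close(50_000, 724, 2_500));
    assert!(!htlc_exposure_justifies_close(1_000, 724, 2_500));
}
```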
Moving to 0.1 as I'm not sure we should delay 0.0.101 or 0.0.102 for this.
Force-pushed from 5ef98b9 to 2e3fd85.
Needs rebase.
Force-pushed from 2ceab8f to f64c46c.
Rebased at f64c46c. A few changes since last time:

Especially happy to get opinions on a). Still have to add test coverage.
lightning/src/util/config.rs
Outdated
```rust
/// If this is set to false, we do not auto-close channel with pending time-sensitive outputs,
/// of which the feerate has been detected as stalling.
/// Default value: true.
pub should_autoclose: bool,
```
Maybe we should make this some kind of knob to configure how aggressively we do this instead? e.g. right now we require a feerate increase of 30%; maybe we should make that a knob, and users who want to turn off autoclose can set it to `u64::max_value()`?
What feerate would we like the commitment transaction to be committed with? Ideally we would have visibility into the current composition of mempool feerate groups to predict a worst-case evolution based on historical data. Assuming this worst case, we should guarantee our feerate stays in sync with the top feerate group for block inclusion during our `CLTV_CLAIM_BUFFER` (once we unilaterally close the channel we can't update it anymore). The current "30%" represents how tightly we track the worst case; if a node operator would like to diverge from it, they should tweak the knob accordingly.

Even further, for a more complete model, we should also tweak `AUTOCLOSE_TIMEOUT` to cover the velocity of the worst-case scenario ("once we detect evolution towards unsafe mempool feerate groups, how fast can we react to correct the committed feerate?"). The issue there is that our reaction time depends on our counterparty's interactivity latency, and we could ask the node operator for input ("what's the liveliness reliability of this peer?") to adjust `AUTOCLOSE_TIMEOUT` accordingly. The current `AUTOCLOSE_TIMEOUT` should be long enough, in blocks, to let our fee-estimator and our counterparty converge, but also short enough to match worst-case congestion velocity.

Ultimately the limiting safety factor is our counterparty's latency. And that can be quite extreme if our counterparty isn't robust against low-odds, high-impact risks (data center fire, DNS server outage, etc.). Package relay support and moving to anchor outputs as a fee-bumping mechanism would remove this safety dependency on counterparty latency.

Or do you have more rule-of-thumb thinking on what to advise node operators to configure this potential knob with?
I don't think there is a clear answer, sadly. Indeed, no fee estimator (that I'm aware of) is really tuned for the lightning threat model. That said, the answer is likely some function of the trust a user has in their counterparty, which leads me to believe it should be a knob. If you're running a node that opens channels automatically with any counterparty, you probably want something pretty tight (30% seems reasonable?); if you're a node that only makes outbound channel opens to trusted peers, maybe 500% is more reasonable.
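A knob along these lines might be sketched as follows (hypothetical field name and helper; the `u64::max_value()` sentinel to disable autoclose follows the suggestion above):

```rust
/// Hypothetical config knob: the percentage feerate increase, relative to
/// the channel's current feerate, above which we arm the autoclose timer.
pub struct AutocloseConfig {
    pub feerate_increase_close_threshold_percent: u64,
}

impl Default for AutocloseConfig {
    fn default() -> Self {
        // 30% matches the PR's current hardcoded threshold; a node with
        // only trusted, reliably-online peers might raise this to 500%.
        Self { feerate_increase_close_threshold_percent: 30 }
    }
}

fn feerate_qualifies_for_autoclose(current: u64, new: u64, cfg: &AutocloseConfig) -> bool {
    let threshold = cfg.feerate_increase_close_threshold_percent;
    // u64::max_value() acts as the "autoclose disabled" sentinel.
    if threshold == u64::max_value() {
        return false;
    }
    // new > current * (1 + threshold/100), kept in integer arithmetic.
    new.saturating_mul(100) > current.saturating_mul(threshold.saturating_add(100))
}

fn main() {
    let cfg = AutocloseConfig::default();
    assert!(feerate_qualifies_for_autoclose(1_000, 1_400, &cfg)); // +40%
    assert!(!feerate_qualifies_for_autoclose(1_000, 1_200, &cfg)); // +20%
}
```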
lightning/src/ln/channel.rs
Outdated
```rust
@@ -2832,13 +2857,35 @@ impl<Signer: Sign> Channel<Signer> {
		}
	}

	/// Trigger the autoclose timer if it's in the starting position
	pub fn maybe_trigger_autoclose_timer(&mut self, current_feerate: u32, new_feerate: u32, background_feerate: u32) -> bool {
```
Why isn't this just a part of `send_update_fee`?
There is at least one call in `update_channel_fee` to monitor a stalling inbound `update_fee`?
Right, but this method is indirectly called on timer tick (via `update_channel_fee`) while `check_channel_autoclose` is called directly on timer tick. I still don't see a reason why they are separate; they're both called during timer ticks.
I've verified the paths. I see, you're suggesting to just pass a boolean `trigger_autoclose` down from `update_channel_fee` to `send_update_fee`. IIRC, my motivation for having a separate method to start the auto-close timer was to make the mechanism re-usable for conditions beyond feerate variations, such as peer disconnection, fee-bumping reserve exhaustion, etc.
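To illustrate the reusability argument, the same trigger could later be armed for several conditions (hypothetical enum and call sites; only the feerate condition exists in this PR):

```rust
/// Hypothetical: conditions beyond a stalling update_fee that might want
/// to arm the same autoclose timer in the future.
enum AutocloseReason {
    StallingFeerate,
    PeerDisconnected,
    FeeBumpingReserveExhausted,
}

struct Chan {
    autoclose_timer: u16,
}

impl Chan {
    /// Arm the timer if it isn't already running, whatever the reason.
    fn maybe_trigger_autoclose_timer(&mut self, _reason: AutocloseReason) -> bool {
        // A fuller implementation might log the reason or weight the
        // timeout by it; here it only gates the timer start.
        if self.autoclose_timer == 0 {
            self.autoclose_timer = 1;
            return true;
        }
        false
    }
}

fn main() {
    let mut chan = Chan { autoclose_timer: 0 };
    assert!(chan.maybe_trigger_autoclose_timer(AutocloseReason::PeerDisconnected));
    assert!(!chan.maybe_trigger_autoclose_timer(AutocloseReason::StallingFeerate));
}
```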
Force-pushed from 4048576 to 285b0ec.
Updated at 285b0ec. Note: removed the check against
One worry I have is what happens if we have an HTLC that times out in a day, but we get no blocks for an hour and thus have a huge fee spike. One maybe-simpler solution is to increase
Updated at eaabe00 with inbound-with-pending-payment-restriction-only and new config
I agree we could make
Don't you have a similar trade-off with this PR as-is, though? In either case, if your fee estimator thinks the feerate needs to be higher, we'll force-close, even if the fee estimator is garbage. At least moving this logic to the

I feel like doing it in that context also allows us a bit more specificity, because we already have code around HTLC status there, so it's less total new code and we can bound stuff as we want.
From the online meeting: "accelerate the force-close of inbound HTLCs in case of slow block issuance and feerate spikes, compress the block-timer in ChannelMonitor's
Force-pushed from eaabe00 to 8cce3cc.
Codecov Report

```diff
@@            Coverage Diff             @@
##             main    #1056      +/-   ##
==========================================
+ Coverage   90.51%   92.22%    +1.71%
==========================================
  Files          71       69        -2
  Lines       39012    47480     +8468
==========================================
+ Hits        35310    43787     +8477
+ Misses       3702     3693        -9
```
```rust
@@ -1925,6 +1931,30 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
				panic!("Attempted to replace shutdown script {} with {}", shutdown_script, scriptpubkey);
			}
		},
		ChannelMonitorUpdateStep::FeesSpikes { feerate } => {
```
Why does this need a new Update type? Can't we just check the fee in ChannelMonitor when we call should_broadcast...?
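A sketch of the alternative being suggested, folding the fee check into the existing broadcast decision rather than a new update type (`should_broadcast_holder_commitment_txn` is the real method name; the feerate plumbing and the 2x multiplier here are assumptions for illustration):

```rust
/// Hypothetical block-driven check: compare the feerate committed in the
/// holder transaction against the estimator's current view each time we
/// decide whether to broadcast, instead of persisting a FeesSpikes update.
fn should_broadcast_holder_commitment_txn(
    holder_tx_feerate: u32,
    estimator_feerate: u32,
    htlc_near_expiry: bool,
) -> bool {
    // Treat a 2x gap between our committed feerate and the estimate as a
    // spike (the multiplier is an assumption, not the PR's value).
    let feerate_spiked = estimator_feerate > holder_tx_feerate.saturating_mul(2);
    htlc_near_expiry || feerate_spiked
}

fn main() {
    assert!(should_broadcast_holder_commitment_txn(1_000, 2_500, false));
    assert!(!should_broadcast_holder_commitment_txn(1_000, 1_500, false));
}
```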
Because you still need a tick-based timer against which to build the stalling-block detection? It could be in ChannelMonitor, but then we need to add a new public `timer_tick_occured` there, I think.

Otherwise, what do you have in mind to detect stalling blocks, or are you thinking to compress `HTLC_CLAIM_BUFFER` on fee variations only?
Wait, I'm confused, I thought we were roughly on the same page that we don't need to do this on a tick but can wait for the next block? There's a lot of complexity here that all goes away if we just wait for the next block, and I'm really unconvinced you're ever going to (a) hit the "fees are spiking and our channel fee is now too low" code and (b) get into the next block - if you got into the next block the "our channel fee is now too low" criteria almost certainly shouldn't have been hit.
> Wait, I'm confused, I thought we were roughly on the same page that we don't need to do this on a tick but can wait for the next block?

Well, I'm confused too; I noted "slow block issuance" as a criterion from the meeting, and in that sense I implemented the timer as a poor man's heuristic to detect slow block issuance. What I understand is that you suggest relying on fee-spike detection, as reported by our fee-estimator, to decide to compress `HTLC_CLAIM_BUFFER`. If you assume we do it only when we already hit `should_broadcast`, I think it reduces the fault-tolerance worthiness of the feature, as you're actually waiting for the next block to get in, and potentially for congestion to have worsened.

I'm good with that, though a) it should be a configurable knob to disable the mechanism if you trust your peers to be online and thus reactive to your `update_fee`, and b) there should be a deprecation note to remove this mechanism once package relay is supported, as it's kind of an automatic-closure mechanism triggerable by a third party owning your fee-estimator (sadly, a realistic assumption for our target phone users).
> If you assume we do it only when we already hit should_broadcast, I think it reduces the fault-tolerance worthiness of the feature, as you're actually waiting for the next block to get in, and potentially for congestion to have worsened.

Right, that's true in theory, but I don't think it matters in practice; specifically, my point is that for this to be the case in practice, both of these need to be true:

> a) hit the "fees are spiking and our channel fee is now too low" code and (b) get into the next block - if you got into the next block the "our channel fee is now too low" criteria almost certainly shouldn't have been hit.

I don't think this code should/will ever trigger if the current channel fee qualifies for the next block. And if the current channel fee doesn't qualify for the next block, the whole debate seems moot to me?
Force-pushed from 589389d to c6ca774.
Updated at c6ca774. Simple force-close of inbound HTLCs on fee-spike detection in
```diff
@@ -2662,7 +2679,7 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
 		// with CHECK_CLTV_EXPIRY_SANITY_2.
 		let htlc_outbound = $holder_tx == htlc.offered;
 		if ( htlc_outbound && htlc.cltv_expiry + LATENCY_GRACE_PERIOD_BLOCKS <= height) ||
-			(!htlc_outbound && htlc.cltv_expiry <= height + CLTV_CLAIM_BUFFER && self.payment_preimages.contains_key(&htlc.payment_hash)) {
+			((!htlc_outbound && (htlc.cltv_expiry <= height + CLTV_CLAIM_BUFFER || spikes_force_close)) && self.payment_preimages.contains_key(&htlc.payment_hash)) { //TODO: don't force-close if dust satoshis
```
Do we really want to force-close instantly if we only have one HTLC that expires in a day? I worry a bit about force-closes if a counterparty just uses a different fee estimator than us - fee estimators are really often 30+% off from each other.
Added a new `get_fulfilled_htlc_sum()` as a new condition to determine whether we force-close the chan or not. It doesn't account for dust HTLCs, of course.
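A plausible shape for the helper (a sketch with assumed field names; the real `ChannelMonitorImpl` state is structured differently):

```rust
use std::collections::HashMap;

struct Htlc {
    payment_hash: [u8; 32],
    amount_msat: u64,
    offered: bool, // true if outbound from our point of view
}

struct MonitorState {
    htlcs: Vec<Htlc>,
    payment_preimages: HashMap<[u8; 32], [u8; 32]>,
}

impl MonitorState {
    /// Sum (in sats) of inbound HTLCs we hold the preimage for, i.e. funds
    /// we would forfeit outright if the commitment never confirms in time.
    fn get_fulfilled_htlc_sum(&self) -> u64 {
        self.htlcs
            .iter()
            .filter(|h| !h.offered && self.payment_preimages.contains_key(&h.payment_hash))
            .map(|h| h.amount_msat / 1_000)
            .sum()
    }
}

fn main() {
    let mut preimages = HashMap::new();
    preimages.insert([1u8; 32], [9u8; 32]);
    let state = MonitorState {
        htlcs: vec![
            Htlc { payment_hash: [1u8; 32], amount_msat: 6_000_000_000, offered: false },
            Htlc { payment_hash: [2u8; 32], amount_msat: 1_000_000, offered: false },
        ],
        payment_preimages: preimages,
    };
    // Only the first HTLC counts: inbound and preimage-known.
    assert_eq!(state.get_fulfilled_htlc_sum(), 6_000_000);
}
```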
Force-pushed from c6ca774 to 5ae2cc7.
Rebased and completed at 5ae2cc7. Added test coverage.
```diff
@@ -2443,7 +2463,8 @@ impl<Signer: Sign> ChannelMonitorImpl<Signer> {
 	log_trace!(logger, "Processing {} matched transactions for block at height {}.", txn_matched.len(), conf_height);
 	debug_assert!(self.best_block.height() >= conf_height);

 	let should_broadcast = self.should_broadcast_holder_commitment_txn(logger);
+	let fulfilled_htlc_high_exposure = 5_000_000 < self.get_fulfilled_htlc_sum();
```
Should we constify it or make it a new config knob?

I think we should give node operators their say when it's a question of risk evaluation. I picked the same value as `max_dust_htlc_exposure`. Maybe we could pick one unified "funds at risk" value, though at the same time it doesn't cover exactly the same risks...

Long-term, I think we should let node operators fine-tune their channel operations as a function of their own risk analysis, instead of deciding for them (though it's likely too early for that).
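If the `5_000_000` sat constant did become a knob, it might mirror `max_dust_htlc_exposure` (hypothetical field and names; the unified "funds at risk" value is only floated above):

```rust
/// Hypothetical knob: the fulfilled-HTLC exposure, in satoshis, above
/// which a detected fee spike justifies an immediate force-close.
pub struct MonitorConfig {
    pub max_fulfilled_htlc_exposure_sat: u64,
}

impl Default for MonitorConfig {
    fn default() -> Self {
        // Mirrors the 5_000_000 sat value borrowed from max_dust_htlc_exposure.
        Self { max_fulfilled_htlc_exposure_sat: 5_000_000 }
    }
}

fn exposure_exceeds_limit(fulfilled_htlc_sum_sat: u64, cfg: &MonitorConfig) -> bool {
    fulfilled_htlc_sum_sat > cfg.max_fulfilled_htlc_exposure_sat
}

fn main() {
    let cfg = MonitorConfig::default();
    assert!(exposure_exceeds_limit(6_000_000, &cfg));
    assert!(!exposure_exceeds_limit(1_000_000, &cfg));
}
```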
Closing for now: https://gnusha.org/bitcoin-rust/2022-03-27.log While I think reacting dynamically to mempool congestion is the way to go long-term,
Auto-close timer: if the channel is outbound, we sent an `update_fee`, and we didn't receive a RAA from the counterparty committing this state after `AUTOCLOSE_TIMEOUT` periods, this channel must be force-closed.

If the channel is inbound, it has been observed that the channel feerate should increase, and we didn't receive a RAA from the counterparty committing an `update_fee` after `AUTOCLOSE_TIMEOUT` periods, this channel must be force-closed.

Close #993

- `update_fee`: verify it's going up
- `AUTOCLOSE_TIMEOUT` value