-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix outbound payments memory leak on buggy router #3531
Fix outbound payments memory leak on buggy router #3531
Conversation
89a6131
to
b2bbb9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
but continue storing the pending outbound payment forever.
Hmm, I wonder if there is any way to detect these cases in retrospect? I.e., could we prune them in remove_stale_payments
so that they are dropped after upgrade and not linger forever in the storage of users that happened to hit the bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
We need to stop passing this Vec by value for the next commit so we can pass it to a different method.
b2bbb9a
to
35122f1
Compare
@tnull I did consider this, it didn't seem worth it at the time because it means introducing extra code for a weird/unlikely case 🤔 One option would be to remove long-expired |
If we're positive this likely really never happened in production, it's probably not worth it I guess. |
It looks like a user could have tried to pay an invoice with the MPP feature but without a payment secret and we could have hit this leak? I'm not sure that means we need to run the expiry backfill, though...if we do it probably means we'll want to run it on deserialization against the actual pending HTLCs-in-flight list, rather than just looking at the outbound payments in isolation. |
Definitely fine by me to fix it if y'all prefer, though I'm preeetty hesitant to add more complexity to |
Prior to this patch, if we attempted to send a payment or probe to a buggy route, we would error but continue storing the pending outbound payment forever. Attempts to retry would result in a “duplicate payment” error. In the case of ChannelManager::send_payment, we would also fail to generate a PaymentFailed event, even if the user manually called abandon_payment. This bug is unlikely to have ever been hit in the wild as most users use LDK’s router. Discovered in the course of adding a new send_to_route API. Now, we’ll properly generate events and remove the outbound from storage.
35122f1
to
b2269f4
Compare
Squashed in a macro -> function and linting fix: diff --git a/lightning/src/ln/outbound_payment.rs b/lightning/src/ln/outbound_payment.rs
index d7e6b1869..8517d599c 100644
--- a/lightning/src/ln/outbound_payment.rs
+++ b/lightning/src/ln/outbound_payment.rs
@@ -1458,19 +1458,6 @@ impl OutboundPayments {
IH: Fn() -> InFlightHtlcs,
SP: Fn(SendAlongPathArgs) -> Result<(), APIError>,
{
- macro_rules! remove_session_privs {
- () => {
- if let Some(payment) = self.pending_outbound_payments.lock().unwrap().get_mut(&payment_id) {
- for (path, session_priv_bytes) in route.paths.iter().zip(onion_session_privs.into_iter()) {
- let removed = payment.remove(&session_priv_bytes, Some(path));
- debug_assert!(removed, "This can't happen as the payment has an entry for this path added by callers");
- }
- } else {
- debug_assert!(false, "This can't happen as the payment was added by callers");
- }
- }
- }
-
match err {
PaymentSendFailure::AllFailedResendSafe(errs) => {
Self::push_path_failed_evs_and_scids(payment_id, payment_hash, &mut route_params, route.paths, errs.into_iter().map(|e| Err(e)), logger, pending_events);
@@ -1490,13 +1477,13 @@ impl OutboundPayments {
},
PaymentSendFailure::PathParameterError(results) => {
log_error!(logger, "Failed to send to route due to parameter error in a single path. Your router is buggy");
- remove_session_privs!();
+ self.remove_session_privs(payment_id, &route, onion_session_privs);
Self::push_path_failed_evs_and_scids(payment_id, payment_hash, &mut route_params, route.paths, results.into_iter(), logger, pending_events);
self.abandon_payment(payment_id, PaymentFailureReason::UnexpectedError, pending_events);
},
PaymentSendFailure::ParameterError(e) => {
log_error!(logger, "Failed to send to route due to parameter error: {:?}. Your router is buggy", e);
- remove_session_privs!();
+ self.remove_session_privs(payment_id, &route, onion_session_privs);
self.abandon_payment(payment_id, PaymentFailureReason::UnexpectedError, pending_events);
},
PaymentSendFailure::DuplicatePayment => debug_assert!(false), // unreachable
@@ -1536,6 +1523,21 @@ impl OutboundPayments {
}
}
+ // If a payment fails after adding the pending payment but before any HTLCs are locked into
+ // channels, we need to clear the session_privs in order for abandoning the payment to succeed.
+ fn remove_session_privs(
+ &self, payment_id: PaymentId, route: &Route, onion_session_privs: Vec<[u8; 32]>
+ ) {
+ if let Some(payment) = self.pending_outbound_payments.lock().unwrap().get_mut(&payment_id) {
+ for (path, session_priv_bytes) in route.paths.iter().zip(onion_session_privs.into_iter()) {
+ let removed = payment.remove(&session_priv_bytes, Some(path));
+ debug_assert!(removed, "This can't happen as the payment has an entry for this path added by callers");
+ }
+ } else {
+ debug_assert!(false, "This can't happen as the payment was added by callers");
+ }
+ }
+
pub(super) fn send_probe<ES: Deref, NS: Deref, F>(
&self, path: Path, probing_cookie_secret: [u8; 32], entropy_source: &ES, node_signer: &NS,
best_block_height: u32, send_payment_along_path: F
@@ -1809,7 +1811,7 @@ impl OutboundPayments {
let cur_height = best_block_height + 1;
let mut results = Vec::new();
debug_assert_eq!(route.paths.len(), onion_session_privs.len());
- for (path, session_priv_bytes) in route.paths.iter().zip(onion_session_privs.into_iter()) {
+ for (path, session_priv_bytes) in route.paths.iter().zip(onion_session_privs.iter()) {
let mut path_res = send_payment_along_path(SendAlongPathArgs {
path: &path, payment_hash: &payment_hash, recipient_onion, total_value,
cur_height, payment_id, keysend_preimage: &keysend_preimage, invoice_request, |
2664cfb
to
9311fae
Compare
9311fae
to
f1af495
Compare
// While a MonitorUpdateInProgress is an Err(_), the payment is still | ||
// considered "in flight" and we shouldn't remove it from the | ||
// PendingOutboundPayment set. | ||
Ok(_) | Err(APIError::MonitorUpdateInProgress) => None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like unfortunately all tests pass with this diff:
diff --git a/lightning/src/ln/outbound_payment.rs b/lightning/src/ln/outbound_payment.rs
index 68d4125a3..430fb166c 100644
--- a/lightning/src/ln/outbound_payment.rs
+++ b/lightning/src/ln/outbound_payment.rs
@@ -1454,7 +1454,8 @@ impl OutboundPayments {
// While a MonitorUpdateInProgress is an Err(_), the payment is still
// considered "in flight" and we shouldn't remove it from the
// PendingOutboundPayment set.
- Ok(_) | Err(APIError::MonitorUpdateInProgress) => None,
+ Ok(_) => None,
_ => Some((path, session_priv))
}
});
Even in the follow-up #3534 when rebased on this PR. Seems pre-existing though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
f1af495
to
3be8200
Compare
Added a debug_assert and renamed a variable: diff --git a/lightning/src/ln/outbound_payment.rs b/lightning/src/ln/outbound_payment.rs
index 98efd7b5d..8de47c4ae 100644
--- a/lightning/src/ln/outbound_payment.rs
+++ b/lightning/src/ln/outbound_payment.rs
@@ -1465,7 +1465,8 @@ impl OutboundPayments {
self.find_route_and_send_payment(payment_hash, payment_id, route_params, router, first_hops, inflight_htlcs, entropy_source, node_signer, best_block_height, logger, pending_events, send_payment_along_path);
},
PaymentSendFailure::PartialFailure { failed_paths_retry: Some(mut retry), results, .. } => {
- let filtered = results.iter().zip(route.paths.iter().zip(onion_session_privs.iter()))
+ debug_assert_eq!(route.paths.len(), onion_session_privs.len());
+ let failed_paths = results.iter().zip(route.paths.iter().zip(onion_session_privs.iter()))
.filter_map(|(path_res, (path, session_priv))| {
match path_res {
// While a MonitorUpdateInProgress is an Err(_), the payment is still
@@ -1475,7 +1476,7 @@ impl OutboundPayments {
_ => Some((path, session_priv))
}
});
- self.remove_session_privs(payment_id, filtered);
+ self.remove_session_privs(payment_id, failed_paths);
Self::push_path_failed_evs_and_scids(payment_id, payment_hash, &mut retry, route.paths, results.into_iter(), logger, pending_events);
// Some paths were sent, even if we failed to send the full MPP value our recipient may
// misbehave and claim the funds, at which point we have to consider the payment sent, so |
Ah wait, need one more debug_assert. Edit: done |
When an outbound payment fails while paying to a route, we need to remove the session_privs for each failed path in the outbound payment. Previously we were sometimes removing in pay_route_internal and sometimes in handle_pay_route_err, so refactor this so we always remove in handle_pay_route_err.
3be8200
to
e479317
Compare
Backported in #3567 |
v0.1.1 - Jan 28, 2025 - "Onchain Matters" API Updates =========== * A `ChannelManager::send_payment_with_route` was (re-)added, with semantics similar to `ChannelManager::send_payment` (rather than like the pre-0.1 `send_payent_with_route`, lightningdevkit#3534). * `RawBolt11Invoice::{to,from}_raw` were added (lightningdevkit#3549). Bug Fixes ========= * HTLCs which were forwarded where the inbound edge times out within the next three blocks will have the inbound HTLC failed backwards irrespective of the status of the outbound HTLC. This avoids the peer force-closing the channel (and claiming the inbound edge HTLC on-chain) even if we have not yet managed to claim the outbound edge on chain (lightningdevkit#3556). * On restart, replay of `Event::SpendableOutput`s could have caused `OutputSweeper` to generate double-spending transactions, making it unable to claim any delayed claims. This was resolved by retaining old claims for more than four weeks after they are claimed on-chain to detect replays (lightningdevkit#3559). * Fixed the additional feerate we will pay each time we RBF on-chain claims to match the Bitcoin Core policy (1 sat/vB) instead of 16 sats/vB (lightningdevkit#3457). * Fixed a cased where a custom `Router` which returns an invalid `Route`, provided to `ChannelManager`, can result in an outbound payment remaining pending forever despite no HTLCs being pending (lightningdevkit#3531). Security ======== 0.1.1 fixes a denial-of-service vulnerability allowing channel counterparties to cause force-closure of unrelated channels. * If a malicious channel counterparty force-closes a channel, broadcasting a revoked commitment transaction while the channel at closure time included multiple non-dust forwarded outbound HTLCs with identical payment hashes and amounts, failure to fail the HTLCs backwards could cause the channels on which we recieved the corresponding inbound HTLCs to be force-closed. Note that we'll receive, at a minimum, the malicious counterparty's reserve value when they broadcast the stale commitment (lightningdevkit#3556). Thanks to Matt Morehouse for reporting this issue.
Prior to this patch, if we attempted to send a payment or probe to a buggy route, we would error but continue storing the pending outbound payment forever. Attempts to retry would result in a “duplicate payment” error.
In the case of ChannelManager::send_payment, we would also fail to generate a PaymentFailed event, even if the user manually called abandon_payment.
This bug is unlikely to have ever been hit in the wild as most users use LDK’s router. Discovered in the course of adding a new send_to_route API.
Now, we’ll properly generate events and remove the outbound from storage.