Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix outbound payments memory leak on buggy router #3531

Merged

Conversation

valentinewallace
Copy link
Contributor

Prior to this patch, if we attempted to send a payment or probe to a buggy route, we would error but continue storing the pending outbound payment forever. Attempts to retry would result in a “duplicate payment” error.

In the case of ChannelManager::send_payment, we would also fail to generate a PaymentFailed event, even if the user manually called abandon_payment.

This bug is unlikely to have ever been hit in the wild as most users use LDK’s router. Discovered in the course of adding a new send_to_route API.

Now, we’ll properly generate events and remove the outbound from storage.

Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

but continue storing the pending outbound payment forever.

Hmm, I wonder if there is any way to detect these cases in retrospect? I.e., could we prune them in remove_stale_payments so that they are dropped after upgrade and not linger forever in the storage of users that happened to hit the bug?

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

We need to stop passing this Vec by value for the next commit so we can pass it
to a different method.
@valentinewallace valentinewallace force-pushed the 2025-01-fix-buggy-route-err branch from b2bbb9a to 35122f1 Compare January 15, 2025 21:48
@valentinewallace
Copy link
Contributor Author

valentinewallace commented Jan 15, 2025

Hmm, I wonder if there is any way to detect these cases in retrospect? I.e., could we prune them in remove_stale_payments so that they are dropped after upgrade and not linger forever in the storage of users that happened to hit the bug?

@tnull I did consider this, it didn't seem worth it at the time because it means introducing extra code for a weird/unlikely case 🤔 One option would be to remove long-expired Retryables based on their PaymentParameters, which is an optional field that was added in 0.0.114. Block height is another option. Let me know if you think this is worth it.

@tnull
Copy link
Contributor

tnull commented Jan 16, 2025

@tnull I did consider this, it didn't seem worth it at the time because it means introducing extra code for a weird/unlikely case 🤔 One option would be to remove long-expired Retryables based on their PaymentParameters, which is an optional field that was added in 0.0.114. Block height is another option. Let me know if you think this is worth it.

If we're positive this likely really never happened in production, it's probably not worth it I guess.

@TheBlueMatt
Copy link
Collaborator

It looks like a user could have tried to pay an invoice with the MPP feature but without a payment secret and we could have hit this leak? I'm not sure that means we need to run the expiry backfill, though...if we do it probably means we'll want to run it on deserialization against the actual pending HTLCs-in-flight list, rather than just looking at the outbound payments in isolation.

@valentinewallace
Copy link
Contributor Author

Definitely fine by me to fix it if y'all prefer, though I'm preeetty hesitant to add more complexity to ChannelManager deserialization. I see why doing it that way is more robust but I guess I'm not really following why doing it in remove_stale_payments isn't sufficient.

Prior to this patch, if we attempted to send a payment or probe to a buggy
route, we would error but continue storing the pending outbound payment
forever. Attempts to retry would result in a “duplicate payment” error.

In the case of ChannelManager::send_payment, we would also fail to generate a
PaymentFailed event, even if the user manually called abandon_payment.

This bug is unlikely to have ever been hit in the wild as most users use LDK’s
router. Discovered in the course of adding a new send_to_route API.

Now, we’ll properly generate events and remove the outbound from storage.
@valentinewallace valentinewallace force-pushed the 2025-01-fix-buggy-route-err branch from 35122f1 to b2269f4 Compare January 16, 2025 16:01
@valentinewallace
Copy link
Contributor Author

Squashed in a macro -> function and linting fix:

diff --git a/lightning/src/ln/outbound_payment.rs b/lightning/src/ln/outbound_payment.rs
index d7e6b1869..8517d599c 100644
--- a/lightning/src/ln/outbound_payment.rs
+++ b/lightning/src/ln/outbound_payment.rs
@@ -1458,19 +1458,6 @@ impl OutboundPayments {
                IH: Fn() -> InFlightHtlcs,
                SP: Fn(SendAlongPathArgs) -> Result<(), APIError>,
        {
-               macro_rules! remove_session_privs {
-                       () => {
-                               if let Some(payment) = self.pending_outbound_payments.lock().unwrap().get_mut(&payment_id) {
-                                       for (path, session_priv_bytes) in route.paths.iter().zip(onion_session_privs.into_iter()) {
-                                               let removed = payment.remove(&session_priv_bytes, Some(path));
-                                               debug_assert!(removed, "This can't happen as the payment has an entry for this path added by callers");
-                                       }
-                               } else {
-                                       debug_assert!(false, "This can't happen as the payment was added by callers");
-                               }
-                       }
-               }
-
                match err {
                        PaymentSendFailure::AllFailedResendSafe(errs) => {
                                Self::push_path_failed_evs_and_scids(payment_id, payment_hash, &mut route_params, route.paths, errs.into_iter().map(|e| Err(e)), logger, pending_events);
@@ -1490,13 +1477,13 @@ impl OutboundPayments {
                        },
                        PaymentSendFailure::PathParameterError(results) => {
                                log_error!(logger, "Failed to send to route due to parameter error in a single path. Your router is buggy");
-                               remove_session_privs!();
+                               self.remove_session_privs(payment_id, &route, onion_session_privs);
                                Self::push_path_failed_evs_and_scids(payment_id, payment_hash, &mut route_params, route.paths, results.into_iter(), logger, pending_events);
                                self.abandon_payment(payment_id, PaymentFailureReason::UnexpectedError, pending_events);
                        },
                        PaymentSendFailure::ParameterError(e) => {
                                log_error!(logger, "Failed to send to route due to parameter error: {:?}. Your router is buggy", e);
-                               remove_session_privs!();
+                               self.remove_session_privs(payment_id, &route, onion_session_privs);
                                self.abandon_payment(payment_id, PaymentFailureReason::UnexpectedError, pending_events);
                        },
                        PaymentSendFailure::DuplicatePayment => debug_assert!(false), // unreachable
@@ -1536,6 +1523,21 @@ impl OutboundPayments {
                }
        }

+       // If a payment fails after adding the pending payment but before any HTLCs are locked into
+       // channels, we need to clear the session_privs in order for abandoning the payment to succeed.
+       fn remove_session_privs(
+               &self, payment_id: PaymentId, route: &Route, onion_session_privs: Vec<[u8; 32]>
+       ) {
+               if let Some(payment) = self.pending_outbound_payments.lock().unwrap().get_mut(&payment_id) {
+                       for (path, session_priv_bytes) in route.paths.iter().zip(onion_session_privs.into_iter()) {
+                               let removed = payment.remove(&session_priv_bytes, Some(path));
+                               debug_assert!(removed, "This can't happen as the payment has an entry for this path added by callers");
+                       }
+               } else {
+                       debug_assert!(false, "This can't happen as the payment was added by callers");
+               }
+       }
+
        pub(super) fn send_probe<ES: Deref, NS: Deref, F>(
                &self, path: Path, probing_cookie_secret: [u8; 32], entropy_source: &ES, node_signer: &NS,
                best_block_height: u32, send_payment_along_path: F
@@ -1809,7 +1811,7 @@ impl OutboundPayments {
                let cur_height = best_block_height + 1;
                let mut results = Vec::new();
                debug_assert_eq!(route.paths.len(), onion_session_privs.len());
-               for (path, session_priv_bytes) in route.paths.iter().zip(onion_session_privs.into_iter()) {
+               for (path, session_priv_bytes) in route.paths.iter().zip(onion_session_privs.iter()) {
                        let mut path_res = send_payment_along_path(SendAlongPathArgs {
                                path: &path, payment_hash: &payment_hash, recipient_onion, total_value,
                                cur_height, payment_id, keysend_preimage: &keysend_preimage, invoice_request,

tnull
tnull previously approved these changes Jan 17, 2025
@valentinewallace valentinewallace force-pushed the 2025-01-fix-buggy-route-err branch from 9311fae to f1af495 Compare January 17, 2025 17:37
// While a MonitorUpdateInProgress is an Err(_), the payment is still
// considered "in flight" and we shouldn't remove it from the
// PendingOutboundPayment set.
Ok(_) | Err(APIError::MonitorUpdateInProgress) => None,
Copy link
Contributor Author

@valentinewallace valentinewallace Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like unfortunately all tests pass with this diff:

diff --git a/lightning/src/ln/outbound_payment.rs b/lightning/src/ln/outbound_payment.rs
index 68d4125a3..430fb166c 100644
--- a/lightning/src/ln/outbound_payment.rs
+++ b/lightning/src/ln/outbound_payment.rs
@@ -1454,7 +1454,8 @@ impl OutboundPayments {
                                                        // While a MonitorUpdateInProgress is an Err(_), the payment is still
                                                        // considered "in flight" and we shouldn't remove it from the
                                                        // PendingOutboundPayment set.
-                                                       Ok(_) | Err(APIError::MonitorUpdateInProgress) => None,
+                                                       Ok(_) => None,
                                                        _ => Some((path, session_priv))
                                                }
                                        });

Even in the follow-up #3534 when rebased on this PR. Seems pre-existing though.

@valentinewallace valentinewallace added weekly goal Someone wants to land this this week and removed weekly goal Someone wants to land this this week labels Jan 21, 2025
Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@valentinewallace valentinewallace force-pushed the 2025-01-fix-buggy-route-err branch from f1af495 to 3be8200 Compare January 23, 2025 22:08
@valentinewallace
Copy link
Contributor Author

Added a debug_assert and renamed a variable:

diff --git a/lightning/src/ln/outbound_payment.rs b/lightning/src/ln/outbound_payment.rs
index 98efd7b5d..8de47c4ae 100644
--- a/lightning/src/ln/outbound_payment.rs
+++ b/lightning/src/ln/outbound_payment.rs
@@ -1465,7 +1465,8 @@ impl OutboundPayments {
                                self.find_route_and_send_payment(payment_hash, payment_id, route_params, router, first_hops, inflight_htlcs, entropy_source, node_signer, best_block_height, logger, pending_events, send_payment_along_path);
                        },
                        PaymentSendFailure::PartialFailure { failed_paths_retry: Some(mut retry), results, .. } => {
-                               let filtered = results.iter().zip(route.paths.iter().zip(onion_session_privs.iter()))
+                               debug_assert_eq!(route.paths.len(), onion_session_privs.len());
+                               let failed_paths = results.iter().zip(route.paths.iter().zip(onion_session_privs.iter()))
                                        .filter_map(|(path_res, (path, session_priv))| {
                                                match path_res {
                                                        // While a MonitorUpdateInProgress is an Err(_), the payment is still
@@ -1475,7 +1476,7 @@ impl OutboundPayments {
                                                        _ => Some((path, session_priv))
                                                }
                                        });
-                               self.remove_session_privs(payment_id, filtered);
+                               self.remove_session_privs(payment_id, failed_paths);
                                Self::push_path_failed_evs_and_scids(payment_id, payment_hash, &mut retry, route.paths, results.into_iter(), logger, pending_events);
                                // Some paths were sent, even if we failed to send the full MPP value our recipient may
                                // misbehave and claim the funds, at which point we have to consider the payment sent, so

@valentinewallace
Copy link
Contributor Author

valentinewallace commented Jan 23, 2025

Ah wait, need one more debug_assert.

Edit: done

When an outbound payment fails while paying to a route, we need to remove the
session_privs for each failed path in the outbound payment.

Previously we were sometimes removing in pay_route_internal and sometimes in
handle_pay_route_err, so refactor this so we always remove in
handle_pay_route_err.
@valentinewallace valentinewallace force-pushed the 2025-01-fix-buggy-route-err branch from 3be8200 to e479317 Compare January 23, 2025 22:20
@tnull tnull merged commit 8307cc6 into lightningdevkit:main Jan 24, 2025
24 of 25 checks passed
@TheBlueMatt TheBlueMatt mentioned this pull request Jan 28, 2025
@TheBlueMatt
Copy link
Collaborator

Backported in #3567

TheBlueMatt added a commit to TheBlueMatt/rust-lightning that referenced this pull request Jan 29, 2025
v0.1.1 - Jan 28, 2025 - "Onchain Matters"

API Updates
===========

 * A `ChannelManager::send_payment_with_route` was (re-)added, with semantics
   similar to `ChannelManager::send_payment` (rather than like the pre-0.1
   `send_payent_with_route`, lightningdevkit#3534).
 * `RawBolt11Invoice::{to,from}_raw` were added (lightningdevkit#3549).

Bug Fixes
=========

 * HTLCs which were forwarded where the inbound edge times out within the next
   three blocks will have the inbound HTLC failed backwards irrespective of the
   status of the outbound HTLC. This avoids the peer force-closing the channel
   (and claiming the inbound edge HTLC on-chain) even if we have not yet managed
   to claim the outbound edge on chain (lightningdevkit#3556).
 * On restart, replay of `Event::SpendableOutput`s could have caused
   `OutputSweeper` to generate double-spending transactions, making it unable to
   claim any delayed claims. This was resolved by retaining old claims for more
   than four weeks after they are claimed on-chain to detect replays (lightningdevkit#3559).
 * Fixed the additional feerate we will pay each time we RBF on-chain claims to
   match the Bitcoin Core policy (1 sat/vB) instead of 16 sats/vB (lightningdevkit#3457).
 * Fixed a cased where a custom `Router` which returns an invalid `Route`,
   provided to `ChannelManager`, can result in an outbound payment remaining
   pending forever despite no HTLCs being pending (lightningdevkit#3531).

Security
========

0.1.1 fixes a denial-of-service vulnerability allowing channel counterparties to
cause force-closure of unrelated channels.
 * If a malicious channel counterparty force-closes a channel, broadcasting a
   revoked commitment transaction while the channel at closure time included
   multiple non-dust forwarded outbound HTLCs with identical payment hashes and
   amounts, failure to fail the HTLCs backwards could cause the channels on
   which we recieved the corresponding inbound HTLCs to be force-closed. Note
   that we'll receive, at a minimum, the malicious counterparty's reserve value
   when they broadcast the stale commitment (lightningdevkit#3556). Thanks to Matt Morehouse for
   reporting this issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants