-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Payment Retries #1059
Payment Retries #1059
Conversation
ec4d0c2
to
910814f
Compare
Codecov Report
@@ Coverage Diff @@
## main #1059 +/- ##
==========================================
+ Coverage 90.47% 90.66% +0.18%
==========================================
Files 68 67 -1
Lines 35273 35261 -12
==========================================
+ Hits 31914 31969 +55
+ Misses 3359 3292 -67
Continue to review full report at Codecov.
|
lightning-invoice/src/payment.rs
Outdated
pub fn new<L: Deref, E: EventHandler>( | ||
payer: P, router: R, network_graph: G, logger: L, event_handler: E | ||
) -> (Arc<Self>, PaymentRetryHandler<P, R, G, L, E>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that it would be better to separate the construction of InvoicePayer
and PaymentRetryHandler
, as it wouldn't require using an Arc
while still allowing it.
910814f
to
ae0cadd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, simple, clean API. I like it.
lightning-invoice/src/payment.rs
Outdated
} | ||
|
||
/// An [`EventHandler`] decorator for retrying failed payments. | ||
pub struct PaymentRetryHandler<I, P: Deref, R, G, L: Deref, E> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand why this is separate from the Payer
- can't we/shouldn't we just implement EventHandler
on the Payer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate from InvoicePayer
, right? Payer
is the trait parameter.
As of now, the ownership model is such that any decorating EventHandler
(or BackgroundProcessor
for that matter) would take ownership of this, which means InvoicePayer
could then only be used as an event handler and no longer for initiating payments. For instance, we'll need to decorate this with NetworkUpdateHandler
so that updates are applied before retries.
I suppose we could implement EventHandler
for T: Deref<Target = InvoicePayer<...>>
and users could use an Arc<InvoicePayer<...>>
(haven't actually tried this). But it seems cleaner to separate paying from the retry behavior, which also lends to intuitive naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate from InvoicePayer, right? Payer is the trait parameter.
Err, yes.
I suppose we could implement EventHandler for T: Deref<Target = InvoicePayer<...>> and users could use an Arc<InvoicePayer<...>> (haven't actually tried this)
Yes, along this line is what I was suggesting.
But it seems cleaner to separate paying from the retry behavior, which also lends to intuitive naming.
Hmm, I guess I don't see it, like, I have a thing that handles payment, I don't care about how it handles payment, or the fact that there's retries going on, I just want it to handle payment. Sure, I may need to wire it up, but having multiple things to wire up to get a payment to fire just seems like extra wires when its not adding flexibility for the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it seems cleaner to separate paying from the retry behavior, which also lends to intuitive naming.
Hmm, I guess I don't see it, like, I have a thing that handles payment, I don't care about how it handles payment, or the fact that there's retries going on, I just want it to handle payment. Sure, I may need to wire it up, but having multiple things to wire up to get a payment to fire just seems like extra wires when its not adding flexibility for the user.
I see your point, though my remaining qualm is that you must pass it an EventHandler
that way, whereas you have the option of not doing so otherwise. Further, you do need to know that it is handling retries since that dictates how you compose different event handlers (e.g., decorating it with NetworkUpdateHandler
and not the other way around). So maybe it's better to be explicit about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, though my remaining qualm is that you must pass it an EventHandler that way, whereas you have the option of not doing so otherwise.
I mean my usual reply applies - do users ever have a reasonable use case where they'd want to ignore most events?
Further, you do need to know that it is handling retries since that dictates how you compose different event handlers (e.g., decorating it with NetworkUpdateHandler and not the other way around). So maybe it's better to be explicit about it?
I'm not entirely convinced that it's clear in the current api that you have to hook these things up in a given order, whether it says retry or not. I wonder if we should make it more clear? Maybe have this object pass the update to the network graph itself?
} | ||
} | ||
/// Logs a byte slice in hex format. | ||
#[macro_export] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually #[doc(hidden)]
this and all the other macros in this file that really shoulnd't be used by downstream crates but only LDK crates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see log_internal
needing that. But which other ones? Presumably users may want to log within their trait implementations, possibly using log_bytes
, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess I was mostly figuring the logging stuff is a one-way street - the lightning crates log stuff, and users consume it and put it somewhere. I suppose there's no real reason to require that, though, so, ok, fair enough. Please do add it to the internal one, though.
lightning-invoice/src/payment.rs
Outdated
fn pay_cached_invoice(&self, payment_hash: &PaymentHash) -> Result<(), PaymentError> { | ||
let invoice_cache = self.invoice_cache.lock().unwrap(); | ||
match invoice_cache.get(payment_hash) { | ||
Some(invoice) => self.pay_invoice_internal(invoice), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we somehow check that the route used on a retry is different from the original? I could see users using this API and just hooking it up to a remote route-getter which doesn't care about our failed payments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping that for simplicity this could be route-agnostic. Any changes to the network or scoring could affect the route calculation, so we should let those modules (e.g., higher-level event handlers) take care of making adjustments for the next time a route is fetched, if necessary, even if it means the same route is tried.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I guess it doesn't matter a ton, just feels like this may be a common thing for users to do, and we'd retry the same route four times and then fail 10 seconds later when we could have told the user that we failed right away. Maybe we should expose the retry count instead to make it explicit for users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I guess it doesn't matter a ton, just feels like this may be a common thing for users to do, and we'd retry the same route four times and then fail 10 seconds later when we could have told the user that we failed right away.
Well, three times... first attempt counts as one. :)
Curious about the remote route-getter use case. If it typically will return the same route, it seems retries would not be valuable, which is an argument for keeping InvoicePayer
and PaymentRetryHandle
separate.
Maybe we should expose the retry count instead to make it explicit for users?
On the other hand, that would be a good argument for merging InvoicePayer
and PaymentRetryHandle
given the number of attempts is stored on the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about the remote route-getter use case. If it typically will return the same route, it seems retries would not be valuable, which is an argument for keeping InvoicePayer and PaymentRetryHandle separate.
Hmm, true. I guess you wouldn't generally use the invoice utilities in this case? I wonder if we merge the two and make the NetworkGraph an optional part if we can then just do retries if we have one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... may work, though now we start to affect the Router
interface as it still needs a NetworkGraph
for the first attempt. We could make it an Option
in find_route
. Alternatively, we could remove it entirely, so DefaultRouter
would just hold its own NetworkGraph
reference.
If we go with the latter option, then we could parameterize InvoicePayer
with the number of attempts instead of using a constant, which would make the retry behavior explicit during its construction. This would of course require merging the two structs. How does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could remove it entirely, so DefaultRouter would just hold its own NetworkGraph reference.
If we go with the latter option, then we could parameterize InvoicePayer with the number of attempts instead of using a constant, which would make the retry behavior explicit during its construction. This would of course require merging the two structs. How does that sound?
I think I'm a fan of this, yea, it reduces extra wires to hook up, better captures the set of likely user use-cases and Does The Right Thing, or at least makes the right thing more obvious, in common cases.
3061394
to
97bd88c
Compare
52da654
to
aa6b426
Compare
Needs rebase now 🎉. |
4c64c33
to
9c09290
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, but I like this API. Note that I don't think this can go in a release without #1053, so should either land after that, or at least in the same release.
lightning-invoice/src/payment.rs
Outdated
|
||
// Either the payment was rejected, exceeded the maximum attempts, or failed retry. | ||
self.remove_cached_invoice(payment_hash); | ||
attempts_by_payment_hash.remove(payment_hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it seems somewhat duplicative to get the entry above and then remove it here. You could just store the entry and remove the entry here, doing the and_modify.or_insert in the retry condition before the return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the number of attempts using and_modify
and or_insert
for the comparison, which consumes the entry. But if no retry is needed we, couldn't then remove the entry directly since it has been consumed.
Probably could just do and_modify
then match to determine whether or not there's a value, but that doesn't seem as clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not quite as clean, but it does feel incredibly wasteful to hash into the hashmap twice for the same element 5 lines apart :/
--- a/lightning-invoice/src/payment.rs
+++ b/lightning-invoice/src/payment.rs
@@ -202,11 +202,13 @@ where
let mut attempts_by_payment_hash = self.payment_attempts.lock().unwrap();
- let attempts = attempts_by_payment_hash
- .entry(*payment_hash)
- .and_modify(|attempts| *attempts += 1)
- .or_insert(1);
+ let attempts_entry = attempts_by_payment_hash
+ .entry(*payment_hash);
+ let attempts = if let hash_map::Entry::Occupied(e) = &attempts_entry { e.get() + 1 } else { 1 };
if !rejected_by_dest {
let max_payment_attempts = self.retry_attempts + 1;
- if *attempts < max_payment_attempts {
+ if attempts < max_payment_attempts {
if self.pay_cached_invoice(payment_hash).is_ok() {
log_trace!(self.logger, "Payment {} failed; retrying (attempts: {})", log_bytes!(payment_hash.0), attempts);
+ attempts_entry
+ .and_modify(|attempts| *attempts += 1)
+ .or_insert(1);
return;
@@ -224,3 +226,2 @@ where
self.remove_cached_invoice(payment_hash);
- attempts_by_payment_hash.remove(payment_hash);
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you still need to remove the entry later, but that does work. However, there's still an additional look-up when remove_cached_invoice
is called. Given no tests failed when the attempts entry was not removed -- and that when called independently remove_cached_invoice
won't remove the attempts entry -- it probably makes sense to combine these two hash maps to ensure consistency. Along with your improvement, that takes us to one look-up instead of three.
a308452
to
41d2b07
Compare
c6e3c06
to
4caedcf
Compare
Once #1084 is merged, I can rebase and fix the failing checks. |
Can be rebased now :) |
4caedcf
to
35ad581
Compare
Note we should consider the invoice expiry when/before retrying. |
c7d2086
to
14f7d04
Compare
lightning-invoice/src/payment.rs
Outdated
log_trace!(self.logger, "Payment {} failed; retrying (attempts: {})", log_bytes!(payment_hash.0), attempts); | ||
return; | ||
} else { | ||
log_trace!(self.logger, "Error retrying payment {}; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the log could be a bit confusing here, because instead of a payment failed error, you immediately get an error retrying payment error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let me know if you have a suggestion for rewording.
2aca5b1
to
e7c1481
Compare
Ok, this is now using the parameters from Otherwise, the main changes are:
|
lightning/src/util/ser.rs
Outdated
@@ -911,3 +912,16 @@ impl Readable for String { | |||
Ok(ret) | |||
} | |||
} | |||
|
|||
impl Writeable for Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, can we just write out a u64 instead? It feels a bit funky serializing a duration and losing a bunch of precision as a general thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed as per other discussion. Could also keep Duration
and serialized as u128
nanoseconds. That said, for our use case, BOLT 11 defines the expiry time in seconds, so no need for more precision.
lightning/src/routing/router.rs
Outdated
@@ -179,12 +181,16 @@ pub struct Payee { | |||
|
|||
/// Hints for routing to the payee, containing channels connecting the payee to public nodes. | |||
pub route_hints: Vec<RouteHint>, | |||
|
|||
/// Expiration of a payment to the payee, relative to a user-defined epoch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "user-defined epoch"? How is this expected to work with InvoicePayer
- doesn't it always check against 1970? Can we just make this seconds-since-1970 and a u64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thought was to let the user set this relative to whatever they wanted because ultimately they would be setting it and interpreting it. So even if we say it's seconds since the UNIX epoch, there's really nothing enforcing that.
Anyhow, updated to u64
as discussed offline.
lightning-invoice/src/payment.rs
Outdated
// error occurred when attempting to retry. | ||
entry.remove(); | ||
} else { | ||
log_trace!(self.logger, "Unknown payment {}; cannot retry)", log_bytes!(payment_hash.0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to auto-insert here (and maybe a test for retry on reload, but I'm happy to add that as a followup for you).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to handle this and added a unit test.
e7c1481
to
c241093
Compare
/// The id returned by [`ChannelManager::send_payment`] and used with | ||
/// [`ChannelManager::retry_payment`]. | ||
/// | ||
/// [`ChannelManager::send_payment`]: crate::ln::channelmanager::ChannelManager::send_payment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I just noticed that documentation for payment_id
seems to be missing from ChannelMannager::send_payment
. It would be nice to (a) document that the id should be stored for a retry, and (b) that it's needed because technically PaymentHash
s can be non-unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving unresolved pending discussions around changes to send_payment
.
2u8.write(writer)?; | ||
write_tlv_fields!(writer, { | ||
(0, payment_preimage, required), | ||
(1, payment_hash, required), | ||
(3, payment_id, option), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be a little nicer public API-wise if payment_id
were a non-Option
, and set to 0 if serialized before 0.0.104 (or wtv). Similar for the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we leave it an Option
for now and then just do a hard-break on our serialization backwards compat in like 2-3 versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me! 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, maybe that is why we decided to key by payment_hash
for now? I guess we could just not retry if PaymentId
is None
, though.
} | ||
|
||
/// Pays the given [`Invoice`], caching it for later use in case a retry is needed. | ||
pub fn pay_invoice(&self, invoice: &Invoice) -> Result<PaymentId, PaymentError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I'd feel better about this if we had a plan for this, it does seem like a lot of rewrites would have to happen to add spontaneous support (and feels odd to merge code that'll be definitely rewritten)? From Slack chats, it looks like the relevant user has implemented their own retries for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to properly interpret PaymentSendFailure errors but otherwise looking good, I think.
let payment_hash = PaymentHash(invoice.payment_hash().clone().into_inner()); | ||
let payment_secret = Some(invoice.payment_secret().clone()); | ||
let payment_id = self.payer.send_payment(&route, payment_hash, &payment_secret) | ||
.map_err(|e| PaymentError::Sending(e))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to figure out what's up - if we get a PaymentSendFailure::ParameterError
or PaymentSendFailure::PathParameterError
then we can return an error here, I think. AllFailedRetrySafe
we should retry the requisite numbers right here, PartialFailure
we should insert and retry the relevant paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. This may require a bit more work since we lose the PaymentId
if send_payment
returns an error. Ideally, we'd get back the proper RouteParameters
for retry.
} else if has_expired(retry.as_ref().unwrap()) { | ||
log_trace!(self.logger, "Invoice expired for payment {}; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts); | ||
} else if self.retry_payment(*payment_id.as_ref().unwrap(), retry.as_ref().unwrap()).is_err() { | ||
log_trace!(self.logger, "Error retrying payment {}; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not always imply failure to send at all - see PaymentSendFailure
and the notes at PaymentSendFailure::PartialFailure
about when retry is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these cases (both returned by send_payment
and retry_payment
), I take it we must retry here as there won't be a PaymentPathFailed
event. Though one idea would be to generate them. It would make the retry attempt logic cleaner at very least.
c241093
to
2864f61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An upcoming Router interface will be used for finding a Route both when initially sending a payment and also when retrying failed payment paths. Unify the three varieties of get_route so the interface can consist of a single method implemented by the new `find_route` method. Give get_route pub(crate) visibility so it can still be used in tests.
2864f61
to
2042241
Compare
When a payment fails, it's useful to retry the payment once the network graph and channel scores are updated. InvoicePayer is a utility for making payments which will retry any failed payment paths for a payment up to a configured number of total attempts. It is parameterized by a Payer and Router for ease of customization and testing. Implement EventHandler for InvoicePayer as a decorator that intercepts PaymentPathFailed events and retries that payment using the parameters from the event. It delegates to the decorated EventHandler after retries have been exhausted and for other events.
According to BOLT 11: - after the `timestamp` plus `expiry` has passed - SHOULD NOT attempt a payment Add a convenience method for checking if an Invoice has expired, and use it to short-circuit payment retries.
Implements Payer for ChannelManager and Rotuer for find_route, which can be used to parameterize InvoicePayer when needing payment retries.
2042241
to
010436d
Compare
Diff from previous ACKs is trivial test changes, taking:
|
When a payment fails, it's useful to retry the payment once the network graph and channel scores are updated.
InvoicePayer
is a utility for making payments which caches the paymentInvoice
so that it can be used when retrying the payment. It is parameterized by aPayer
andRouter
for ease of customization and testing.PaymentRetryHandler
is used in connection withInvoicePayer
. It is anEventHandler
decorator that interceptsPaymentFailed
events and retries that payment using the cachedInvoice
.