add timeout retry strategy to outbound payment #1418

kafaichoi · 2022-04-12T09:24:35Z

This PR allow us to set timeout(time-based) as an alternative to existing count-based approach in outbound payment retry.

I also have some naming suggestion. I can make another PR(to avoid too much changes in this PR) to fix this if you agree that's improvement

InvoicePayment.payment_cache was actually used to store "retry attempts"(so it can be zero) but now it's used to store payment attempt. Should we rename it to payment_attempts_cache

codecov-commenter · 2022-04-12T09:52:15Z

Codecov Report

Merging #1418 (8d99732) into main (d166044) will increase coverage by 0.02%.
The diff coverage is 88.96%.

❗ Current head 8d99732 differs from pull request most recent head ca4099f. Consider uploading reports for the commit ca4099f to get more accurate results

@@            Coverage Diff             @@
##             main    #1418      +/-   ##
==========================================
+ Coverage   90.88%   90.90%   +0.02%     
==========================================
  Files          76       77       +1     
  Lines       42067    42116      +49     
  Branches    42067    42116      +49     
==========================================
+ Hits        38232    38286      +54     
+ Misses       3835     3830       -5

Impacted Files	Coverage Δ
lightning-invoice/src/lib.rs	`87.39% <ø> (ø)`
lightning/src/routing/scoring.rs	`96.19% <ø> (+1.53%)`	⬆️
lightning/src/util/time.rs	`81.25% <81.25%> (ø)`
lightning-invoice/src/payment.rs	`92.90% <94.38%> (+0.14%)`	⬆️
lightning-background-processor/src/lib.rs	`94.37% <100.00%> (ø)`
lightning-invoice/src/utils.rs	`96.77% <0.00%> (+0.14%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d166044...ca4099f. Read the comment docs.

jkczyz

Thanks for the contribution! Generally speaking, I'm in favor of renaming wherever it makes sense. Left some comments and will put a little more thought into naming as the review proceeds.

jkczyz · 2022-04-12T17:48:43Z

lightning-invoice/src/payment.rs

+    /// This should be >= 1 as we only insert PaymentAttempts for a PaymentHash if there is at least
+    /// one attempt.
+    counts: usize,
+    latest_attempted_at: SystemTime,


Is this needed? Looks like it is set but never read.

Good catch. It's not used anywhere now. But it will be useful for improving logging? Should I remove it or use it for improving logging msg(shown both count and total time spent)?

https://github.com/lightningdevkit/rust-lightning/pull/1418/files#diff-5165a75ba263a549c7fb9f4b555e449e44b0b680a80a9715a99d11157802a0d7R452

I believe we can simply store SystemTime::now() plus the duration from MaxDuration (i.e., when timeout occurs) and do any necessary math when logging.

IMO, I would remove it if it's not being used.

jkczyz · 2022-04-12T18:01:10Z

lightning-invoice/src/payment.rs

+      .or_insert(PaymentAttempts::new());
+
+    if ! self.retry_attempts.is_retryable_now(&PaymentAttempts{
+        counts: attempts.counts -1,


Not quite sure I follow why the count is incremented above and then decremented here.

Because the original logic in the code is always add one count(no matter if it's retryable here). But since now the retryable logic is encapsulated into a function that takes a PaymentAttempts and if we simply passed the updated one, then that will be incorrect(as we want to test if it can be tested before adding 1 count).

Couldn't this be accounted for in is_retryable_now? It's adding 1 back as it is.

So the logic in retry_payment is

always add 1 to counts in PaymentAttempts

check if it's retryable

the rest of logic

Retry::is_retryable_now take PaymentAttempts struct and use the counts field to determine if it's retryable if retry strategy is count-based. So in step2, if we don't decrement it by 1, it will be incorrect.

Retry::Attempts(max_retry_count) => max_retry_count + 1 > attempts.counts, the + 1 should be correct. For example,
if max_retry_count is 1 and attempts.count is 1, this means current retry count is 0(the first attempt is not retry attempt) so the + 1 here is required to make this function return true in this case.

jkczyz · 2022-04-12T18:02:14Z

lightning-invoice/src/payment.rs

+    counts: usize,
+    latest_attempted_at: SystemTime,
+    first_attempted_at: SystemTime


Do we need to store both count and time if only one is used?

It's not necessary but the advantage is we can have more information in error log if retry fail. related comment #1418 (comment)

And if really want to optimize this, I guess it makes sense to merge payment_cache and retry_attempts fields in InvoicePayer into one enum field to avoid impossible state(where retry strategy is count base but we store time vice versa).

Hmm... not sure how we'd do that because you need to know the maximum attempts when inserting into the map. Let me know if I misunderstood or if you have a more concrete idea.

We could also use a type parameter implementing a common trait (in a similar manner to the Time trait) such that there is never an invalid state.

I tried but unable to do the optimization of storing only either count and time(based on Retry). and storing both has a pro for displaying user more info in retry error. So I will put this optimization on pause but please let me know if this is high-priority I can give it another try.

jkczyz · 2022-04-12T18:13:37Z

lightning-invoice/src/payment.rs

+    /// one attempt.
+    counts: usize,
+    latest_attempted_at: SystemTime,
+    first_attempted_at: SystemTime


Note that SystemTime is only available with std feature, so this will need to be conditional. See places in this file where #[cfg(feature = "std")] is used.

With that in mind, RetryAttempts::MaxDuration won't be applicable without the feature, too.

jkczyz · 2022-04-12T18:19:06Z

lightning-invoice/src/payment.rs

-/// Note that this is the number of *path* failures, not full payment retries. For multi-path
-/// payments, if this is less than the total number of paths, we will never even retry all of the
-/// payment's paths.
+/// Retry strategy of retrying payment path failures for an [`Invoice`].


nit: s/Retry strategy of retrying/Strategy for retrying

jkczyz · 2022-04-12T18:20:31Z

lightning-invoice/src/payment.rs

+pub enum RetryAttempts {
+    /// Count Out. Max number of attempts to retry payment.
+    ///
+    /// Note that this is the number of *path* failures, not full payment retries. For multi-path
+    /// payments, if this is less than the total number of paths, we will never even retry all of the
+    /// payment's paths.
+    MaxCount(usize),
+	  /// Time Out. Max time spent to retry payment
+    MaxDuration(Duration),
+}


How about:

enum Retry { Attempts(usize), Timeout(Duration), }

much more concise! Thanks

jkczyz · 2022-04-12T18:43:40Z

lightning-invoice/src/payment.rs

+		assert_eq!(*event_handled.borrow(), false);
+		assert_eq!(*payer.attempts.borrow(), 2);
+
+    std::thread::sleep(Duration::from_millis(101));


Sleeps in testing tend to be a bit flaky and slow down test execution. Ideally, we can inject the time into the code such that in testing it can be manually advanced. We do that in the scorer code:

rust-lightning/lightning/src/routing/scoring.rs

Line 1164 in a3832b0

struct SinceEpoch(Duration);

We wouldn't need to expose the Time trait in InvoicePayer's public interface since it is an implementation detail and can be conditioned for tests to use SinceEpoch instead of SystemTime. It would be great if we can refactor the time code into a util::time module such that it could be reused here. We could symlink it into the lightning-invoice crate to re-use across crates.

jkczyz · 2022-04-13T16:59:40Z

lightning-invoice/src/payment.rs

+    /// This should be >= 1 as we only insert PaymentAttempts for a PaymentHash if there is at least
+    /// one attempt.
+    counts: usize,
+    latest_attempted_at: SystemTime,


I believe we can simply store SystemTime::now() plus the duration from MaxDuration (i.e., when timeout occurs) and do any necessary math when logging.

jkczyz · 2022-04-13T17:03:04Z

lightning-invoice/src/payment.rs

+    counts: usize,
+    latest_attempted_at: SystemTime,
+    first_attempted_at: SystemTime


Hmm... not sure how we'd do that because you need to know the maximum attempts when inserting into the map. Let me know if I misunderstood or if you have a more concrete idea.

We could also use a type parameter implementing a common trait (in a similar manner to the Time trait) such that there is never an invalid state.

jkczyz · 2022-04-13T17:04:00Z

lightning-invoice/src/payment.rs

+      .or_insert(PaymentAttempts::new());
+
+    if ! self.retry_attempts.is_retryable_now(&PaymentAttempts{
+        counts: attempts.counts -1,


Couldn't this be accounted for in is_retryable_now? It's adding 1 back as it is.

jkczyz · 2022-04-13T17:04:38Z

lightning-invoice/src/payment.rs

-					} else {
-						*retry_count += 1;
+					let payment_attempts = payment_cache.get_mut(&payment_hash).unwrap();
+          if self.retry_attempts.is_retryable_now(payment_attempts) {


Please indent with tabs instead of spaces here and throughout.

kafaichoi · 2022-04-14T03:32:30Z

Thank you for the thorough review. @jkczyz I have changed the PR status to draft and will change it back once it's ready to review again later.

maxgiraldo · 2022-04-14T11:37:34Z

lightning-invoice/src/payment.rs

+	fn is_retryable_now(&self, attempts: &PaymentAttempts) -> bool {
+		match self {
+			Retry::Attempts(0) => false,
+			Retry::Attempts(max_retry_count) => max_retry_count + 1 > attempts.counts,
+			Retry::Timeout(max_duration) => *max_duration >= SystemTime::now().duration_since(attempts.first_attempted_at).unwrap()
+		}
+	}


This is more of a curiosity, but do we only ever care about the current time, or could this be used to ask if something is retry-able at other times?

Suggested change

fn is_retryable_now(&self, attempts: &PaymentAttempts) -> bool {

match self {

Retry::Attempts(0) => false,

Retry::Attempts(max_retry_count) => max_retry_count + 1 > attempts.counts,

Retry::Timeout(max_duration) => *max_duration >= SystemTime::now().duration_since(attempts.first_attempted_at).unwrap()

}

}

fn is_retryable_at(&self, attempts: &PaymentAttempts, at: SystemTime) -> bool {

match self {

Retry::Attempts(0) => false,

Retry::Attempts(max_retry_count) => max_retry_count + 1 > attempts.counts,

Retry::Timeout(max_duration) => *max_duration >= at.duration_since(attempts.first_attempted_at).unwrap()

}

}

yea that was my initial reason adding the latest_attempted_at but since this is not needed anywhere yet so I removed it according to other feedback.

TheBlueMatt · 2022-04-16T18:10:57Z

lightning/src/util/time.rs

+
+/// A state in which time has no meaning.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+pub struct Eternity;


I'd really rather not make this pub. One (kinda dirty) hack to avoid that while re-using the module in both crates is to symlink the file file in both crates.

How about moving it lightning::util::test_utils?(see last commit 6b241ed)

Hmm, I'd kinda still prefer we keep the trait sealed (which I think requires the symlink hack, but maybe there's a way to use a foreign sealed trait?).

I'm not sure that we have an immediate use-case for users implementing their own Time, and even if we wanted users to be able to cancel payments on no-std we'd probably want to do it via some kind of "abandon payment now" API, not with reentrancy calling user code on a Time trait. Additionally, we can't capture traits with associated types in bindings, so we'd have to make it non-exported there anyway.

right. There is only one utils.rs instead of utils/mod.rs under lightning-invoice, So to avoid too much code change. I simply symlink it as time_utils.rs. Please let me know if it's reasonable?

TheBlueMatt · 2022-04-16T18:11:33Z

lightning/src/util/time.rs

+	/// Returns an instance corresponding to the current moment.
+	fn now() -> Self;
+


nit: EOL whitespace here (and a number of similar places in this file).

Sorry. I will fix this after other issues resolved(I need to find a way to config my editor to avoid these)

TheBlueMatt · 2022-04-17T16:56:03Z

lightning-invoice/src/payment.rs

+	counts: usize,
+	#[cfg(feature = "std")]
+	/// This field is only used when retry is [`Retry::Timeout`] which is only build with feature std
+	first_attempted_at: T


Hmm, I believe this leaves an unused parameter T in no-std builds. We either need to always include the field and let the ConfiguredTime "turn it off" in no-std or we need have two completely separate types for no-std vs std (which is gonna be a lot of code duplication).

Then it's probably better to just always include this field(remove the configuration std condition check)?
Btw how can I replicate the error/warning of unused parameter T in no-std builds in local?(everything look fine when I do cargo build for no-std)

TheBlueMatt

Nice! I'm a fan of the high-level structure now, I think.

TheBlueMatt · 2022-04-18T14:48:53Z

lightning-invoice/src/payment.rs

+pub type InvoicePayer<P, R, S, L, E> = InvoicePayerUsingTime::<P, R, S, L, E, ConfiguredTime>;
+
+#[cfg(not(feature = "no-std"))]
+type ConfiguredTime = std::time::SystemTime;


This should use Instant not SystemTime - we need a counter not the current wall clock time.

TheBlueMatt · 2022-04-18T14:52:33Z

lightning-invoice/src/payment.rs

+impl Retry {
+	fn is_retryable_now<T: Time>(&self, attempts: &PaymentAttempts<T>) -> bool {
+		match self {
+			Retry::Attempts(0) => false,


Why is this a separate branch? If max_retry_count is 0, we'll always return false when counts is 2, which it will be when we first check for a retry.

I removed this branch after switching back to store retry_count instead of total attempt count

TheBlueMatt · 2022-04-18T14:52:36Z

lightning-invoice/src/payment.rs

+	fn is_retryable_now<T: Time>(&self, attempts: &PaymentAttempts<T>) -> bool {
+		match self {
+			Retry::Attempts(0) => false,
+			Retry::Attempts(max_retry_count) => max_retry_count + 1 > attempts.counts,


Maybe just make this >=?

TheBlueMatt · 2022-04-18T14:57:49Z

lightning-invoice/src/payment.rs

+#[derive(Debug, Clone, Copy)]
+struct PaymentAttempts<T: Time> {
+	/// This should be >= 1 as we only insert PaymentAttempts for a PaymentHash if there is at least
+	/// /// one attempt.


You have a double /// here.

TheBlueMatt · 2022-04-18T15:03:31Z

lightning-invoice/src/payment.rs

+impl <T: Time>PaymentAttempts<T> {
+	fn new() -> Self {
+		PaymentAttempts{
+			counts: 1,


I'm not sure why we insert at 1 here, that's a behavior change from the original code, which always inserted at 0.

Yeah, also see: #1418 (comment)

I'm not sure I follow why the count logic needs to change if you are just adding another strategy.

My understanding of previous code is the count we store is the number of retry.

For example, in pay_pubkey, it's set as 0 when there is no entry found in payment_cache.
https://github.com/lightningdevkit/rust-lightning/pull/1418/files#diff-5165a75ba263a549c7fb9f4b555e449e44b0b680a80a9715a99d11157802a0d7L330

However, in this PR, I changed it to store number of attempt(which should be number of retry + 1). And we will only want to create and store PaymentAttempts when there is at least one attempt. That's the reason why count is set as 1 in PaymentAttempts::new.

I have actually tried to change storing total attempt to storing retry count(like current behaviour). 145bede

It does look cleaner :). Please let me know if you still find it weird that I have to do - 1 here b08f0ea @jkczyz

The biggest reason is I think Retry::is_retryable_now shouldn't be aware that PaymentAttempts passed has just been added 1 retry count. It should simply take PaymentAttempts and determine if it's retryable right now based on the current field. (My apology if my explanation is still confusing)

Nice! Looks much better modulo Matt's comment about incrementing after calling is_retryable_now.

TheBlueMatt · 2022-04-18T15:04:13Z

lightning-invoice/src/payment.rs

@@ -217,7 +268,28 @@ pub trait Router<S: Score> {
 /// payments, if this is less than the total number of paths, we will never even retry all of the
 /// payment's paths.
 #[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
-pub struct RetryAttempts(pub usize);
+pub enum Retry {


The enum-level docs probably need updating to no longer only talk about counts.

TheBlueMatt · 2022-04-18T15:04:42Z

lightning-invoice/src/payment.rs

+					let payment_attempts = payment_cache.get_mut(&payment_hash).unwrap();
+					if self.retry.is_retryable_now(payment_attempts) {
+						payment_attempts.counts += 1;
+						Ok(self.pay_internal(params, payment_hash, send_payment)?)


Don't we need to drop the lock in the retry branch? Looks like you swapped the branch order but left the lock-drop in the other branch now.

Sorry totally my mistake

jkczyz · 2022-04-18T16:58:12Z

lightning/src/routing/scoring.rs

+use util::time::Eternity;
+#[cfg(feature = "no-std")]
+type ConfiguredTime = Eternity;



Could you make this a single statement so we don't import Eternity when not configured?

emm in line 255. there is also #[cfg(feature = "no-std")] so we will only import Eternity and ConfiguredTime set as Eternity together when no-std feature is set

I am not sure how to do the import and type declaration in one single statement. and it looks like there is cfg-if crate for providing a helpful macro to avoid duplicate cfg here? Is that what u meant?

Ah, never mind. Forgot that you still would need to import the util module even if you fully-qualified as util::time::Eternity on line 258.

jkczyz · 2022-04-18T16:59:18Z

lightning/src/routing/scoring.rs

 	#[test]
 	fn time_passes_when_advanced() {
 		let now = SinceEpoch::now();


You can remove these two tests since they were moved.

jkczyz · 2022-04-18T18:36:11Z

lightning/src/util/time.rs

+//! A very simple serialization framework which is used to serialize/deserialize messages as well
+//! as ChannelsManagers and ChannelMonitors.


Need to update the docs.

jkczyz · 2022-04-18T18:38:22Z

lightning-invoice/src/payment.rs

+//! let invoice_payer = InvoicePayer::new(&payer, router, &scorer, &logger, event_handler,
+//! Retry::Attempts(2));


No need to wrap the doc example. Or at very least would need to indent accordingly.

jkczyz · 2022-04-18T18:38:58Z

lightning-invoice/src/payment.rs

+use core::fmt;
+use core::fmt::{Display, Formatter};


Could you move these before core::ops::Deref?

jkczyz · 2022-04-18T19:00:15Z

lightning-invoice/src/payment.rs

+	/// payment's paths.
+	Attempts(usize),
+	#[cfg(feature = "std")]
+	/// Time Out. Max time spent to retry payment


The phrasing is a little awkward here. How about "Time elapsed before abandoning retries for a payment."?

yours is much better. Thanks!

jkczyz · 2022-04-18T19:09:14Z

lightning-invoice/src/payment.rs

+impl <T: Time>PaymentAttempts<T> {
+	fn new() -> Self {
+		PaymentAttempts{
+			counts: 1,


Yeah, also see: #1418 (comment)

I'm not sure I follow why the count logic needs to change if you are just adding another strategy.

jkczyz · 2022-04-18T19:12:18Z

lightning-invoice/src/payment.rs

+	first_attempted_at: T
+}
+
+impl <T: Time>PaymentAttempts<T> {


nit: no space after impl but add a space before PaymentAttempts .

jkczyz · 2022-04-18T19:12:41Z

lightning-invoice/src/payment.rs

+
+impl <T: Time>PaymentAttempts<T> {
+	fn new() -> Self {
+		PaymentAttempts{


nit: space after PaymentAttempts

jkczyz · 2022-04-18T19:15:32Z

lightning-invoice/src/payment.rs

+		type InvoicePayerUsingSinceEpoch <P, R, S, L, E> = InvoicePayerUsingTime::<P, R, S, L, E, SinceEpoch>;
+
+		let invoice_payer =
+			InvoicePayerUsingSinceEpoch::new(&payer, router, &scorer, &logger, event_handler, Retry::Timeout(Duration::from_millis(100)));


Might as well use something more realistic looking like seconds since no actual sleeps occur.

jkczyz · 2022-05-02T19:47:06Z

lightning/src/util/time.rs

-#[cfg(not(feature = "no-std"))]
-impl Time for std::time::SystemTime {
-	fn now() -> Self {
-		std::time::SystemTime::now()
-	}
-
-	fn duration_since(&self, earlier: Self) -> Duration {
-		self.duration_since(earlier).unwrap()
-	}
-
-	fn duration_since_epoch() -> Duration {
-		use std::time::SystemTime;
-		SystemTime::now().duration_since(SystemTime::UNIX_EPOCH).unwrap()
-	}
-
-	fn elapsed(&self) -> Duration {
-		std::time::SystemTime::elapsed(self).unwrap()
-	}
-}
-


Looks like this should have been removed from the first commit since the code was not needed.

jkczyz · 2022-05-02T19:47:39Z

lightning/src/util/time.rs

-	
+


Similarly, this should be in the first commit.

jkczyz · 2022-05-02T19:53:29Z

lightning/src/util/time.rs

+// You may not use this file except in accordance with one or both of these
+// licenses.
+
+//! Time Trait and different implementations. Currently, it's mainly used in tests so we can


Here and below:

s/Time Trait/[Time] trait

jkczyz · 2022-05-02T19:55:11Z

lightning/src/util/time.rs

+//! This file is symlinked to lightning_invoice/src/time_utils.rs so we can keep Time Trait sealed
+//! while allowing other crates(lightning_invoice) to use it.


The symlink is the other way around, no? Just say other crates may symlink this file instead of referencing which files, as otherwise the list may go out of date.

jkczyz · 2022-05-02T19:57:37Z

lightning/src/util/time.rs

+//! while allowing other crates(lightning_invoice) to use it.
+
+use core::ops::Sub;
+use core::time::Duration;


nit: add blank line between imports and Time docs

jkczyz · 2022-05-02T20:16:00Z

lightning-invoice/src/payment.rs

 		#[cfg(feature = "std")] {
 			if has_expired(params) {
-				log_trace!(self.logger, "Invoice expired for payment {}; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts);
+				log_trace!(self.logger, "Invoice expired for payment {}; not retrying ({:?})", log_bytes!(payment_hash.0), attempts);


Hmm... The attempts won't be accurate here since you've incremented earlier (and in Debug) but no retry occurs here. Similarly below.

Would it simplify things if we made PaymentAttempts::retry_count simply a count representing the number of attempts up until the retry? I believe that is what was done previously and as is still documented on the hash map:

/// Caches the overall attempts at making a payment, which is updated prior to retrying.

Essentially, the attempt is counted only once we get an event back on whether a path was successful or not. So renaming to count, unconditionally incrementing in this function, and not incrementing in Debug should give the correct number. Would need to modify is_retryable_now, too, IIUC.

Then you don't need to worry about the lock since you can increment in the above look-up, clone, and drop the lock implicitly in one statement similarly to the code prior to the change.

ah u are right that the log in this branch(has_expired true) is not accurate. But can we simply move this has_expire check before is_retryable_now? I ran test in local after this change and all test case passed as well.

As per your suggestion, I believed that was my initial approach(got forced-pushed) but I decided to change it to only store retry_count because this part( #1418 (comment)) is confusing to others.

If we want to avoid the confusing part in my previous forced-pushed commit, does it mean we need to change the logic in Retry::is_retryable_now to be aware of the fact count always get incremented by 1 before calling retry_payment. I found that it's actually more confusing as now Retry::is_retryable_now is not a context-free function especially now is_retryable_now is also reused inside PaymentSendFailure::AllFailedRetrySafe

ah u are right that the log in this branch(has_expired true) is not accurate. But can we simply move this has_expire check before is_retryable_now? I ran test in local after this change and all test case passed as well.

Either order should be fine, but arguably you wouldn't attempt to retry if the count had been exceeded already.

As per your suggestion, I believed that was my initial approach(got forced-pushed) but I decided to change it to only store retry_count because this part( #1418 (comment)) is confusing to others.

Ah, I see. Seems like the confusion could have been resolved by maintaining the original behavior, no?

If we want to avoid the confusing part in my previous forced-pushed commit, does it mean, does it mean we need to change the logic in Retry::is_retryable_now to be aware of the fact count always get incremented by 1 before calling retry_payment. I found that it's actually more confusing as now Retry::is_retryable_now is not a context-free function especially now is_retryable_now is also reused inside PaymentSendFailure::AllFailedRetrySafe

Hmm... so the invariant was that the attempt count was incremented only after the result of the attempt is known. For PaymentSendFailure::AllFailedRetrySafe, we receive this result immediately rather than needing to wait for an event, so we never hit the retry_payment path.

I can see your argument as to why a retry count may make more sense. However, such a change should really be a separate commit with the what and why clearly described in the commit message. Admittedly, I've been lax about asking for the rationale in commit messages, which is okay when the commit is obvious. But the atomicity of commits is important, and this one makes two separate behavioral changes. I'd prefer if they were separate so that they can be considered independently.

FWIW I believe the original behavior was inconsistent - AFAICT we'll retry automatically once even with the max-retries count set to 0 if the payment fails immediately when we try to send instead of when an HTLC fails. Not that that behavior is really broken, its just confusing, so its all a bit of a mess.

Hmm... I think it should work as expected. For AllFailedRetrySafe the number of attempts before increment (i.e., retry_count) and max retries are both 0, so an error is returned here:

rust-lightning/lightning-invoice/src/payment.rs

Lines 370 to 372 in 9fbafd4

let retry_count = payment_cache.get_mut(&payment_hash).unwrap();

if *retry_count >= self.retry_attempts.0 {

Err(e)

And for PartialFailure we hit the retry_payment case where the number of attempts is incremented and compared against maximum number of attempts (i.e. max retries plus one).

rust-lightning/lightning-invoice/src/payment.rs

Lines 402 to 411 in 9fbafd4

let max_payment_attempts = self.retry_attempts.0 + 1;

let attempts = *self.payment_cache.lock().unwrap()

.entry(payment_hash)

.and_modify(|attempts| *attempts += 1)

.or_insert(1);

if attempts >= max_payment_attempts {

log_trace!(self.logger, "Payment {} exceeded maximum attempts; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts);

return Err(());

}

Ah, maybe I'm wrong.....its more than a bit confusing, though...

@jkczyz Thank you so much for breaking it down. I will break the last commit into 2 as per your suggestion so it's easier for us to see if the change make sense or not(And I will also look into if the original approach result in some incorrect log msg) . However, I will probably only have time this weekend🙏🏽

@jkczyz I have made changes to restore the previous behavior and store attempts instead of retry attempts. After spending more time understanding the code, I actually think it's cleaner to keep in the original way :)

And I changed it back to impl Display for PaymentAttempts(somehow I can't find your related comment now)

TheBlueMatt · 2022-05-04T18:50:43Z

Oops looks like this needs a rebase, sorry about that.

TheBlueMatt · 2022-05-13T17:47:47Z

Ugh needs rebase again. Really sorry somehow I missed you'd pushed the commits above and didn't manage to review before it got stale again :(

…thin crate

kafaichoi · 2022-05-14T04:15:33Z

Ugh needs rebase again. Really sorry somehow I missed you'd pushed the commits above and didn't manage to review before it got stale again :(

no worry! I actually felt so bad this relative simple PR causing core developers lots of time reviewing on this 😭

TheBlueMatt · 2022-05-14T18:24:30Z

no worry! I actually felt so bad this relative simple PR causing core developers lots of time reviewing on this 😭

Nah! Final round review takes a bit of time, but passes before then take 5-10 minutes, don't sweat it :).

jkczyz · 2022-05-16T14:27:52Z

lightning-invoice/src/payment.rs

+impl Retry {
+	fn is_retryable_now<T: Time>(&self, attempts: &PaymentAttempts<T>) -> bool {
+		match (self, attempts) {
+			(Retry::Attempts(max_retry_count), PaymentAttempts { count, .. }) => max_retry_count >= &count,


nit: could you wrap at 100 characters?

sure. (wish cargo fmt is integrated here 😬)

jkczyz · 2022-05-16T14:46:55Z

lightning-invoice/src/payment.rs

-					} else {
-						*retry_count += 1;
+					let payment_attempts = payment_cache.get_mut(&payment_hash).unwrap();
+					if self.retry.is_retryable_now(payment_attempts) {


Doesn't this break for Retry::Attempts(0)? The cache will have a value of 0 which would cause is_retryable_now to return true. But in this case we wouldn't want to the retry since InvoicePayer is configured for no retries. Perhaps we need a test for this case?

I was gonna let this one slide, IIUC it's how upstream works now, and not super surprising that we'd not instantly fail and at least try to find a route that we can send over. Though really this should only happen in race conditions and pretty exceedingly rare, if everything is working right.

Dunno if it's worth trying to build a test for but can just move the increment up.

Sure, let's just move the increment up. Upstream, the check has the operands flipped with the opposite meaning (error), so works as intended.

ah good catch. I have moved the increment up.

jkczyz self-requested a review April 12, 2022 15:36

jkczyz reviewed Apr 12, 2022

View reviewed changes

jkczyz reviewed Apr 13, 2022

View reviewed changes

kafaichoi marked this pull request as draft April 14, 2022 02:59

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch from f47a834 to 7faa2e3 Compare April 14, 2022 03:28

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch 2 times, most recently from 4187576 to d23b0c2 Compare April 14, 2022 04:18

maxgiraldo reviewed Apr 14, 2022

View reviewed changes

TheBlueMatt reviewed Apr 16, 2022

View reviewed changes

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch 5 times, most recently from 99d3203 to 6b241ed Compare April 17, 2022 07:06

TheBlueMatt reviewed Apr 17, 2022

View reviewed changes

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch 2 times, most recently from 22f4f76 to 29a381c Compare April 18, 2022 08:03

kafaichoi marked this pull request as ready for review April 18, 2022 08:08

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch from 29a381c to 5204952 Compare April 18, 2022 10:12

TheBlueMatt reviewed Apr 18, 2022

View reviewed changes

jkczyz reviewed Apr 18, 2022

View reviewed changes

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch 6 times, most recently from a7384af to b08f0ea Compare April 23, 2022 06:38

vincenzopalazzo previously approved these changes May 2, 2022

View reviewed changes

TheBlueMatt previously approved these changes May 2, 2022

View reviewed changes

TheBlueMatt assigned jkczyz May 2, 2022

jkczyz reviewed May 2, 2022

View reviewed changes

kafaichoi dismissed stale reviews from TheBlueMatt and vincenzopalazzo via 48329c0 May 4, 2022 13:20

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch from c04b14b to 48329c0 Compare May 4, 2022 13:20

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch from 48329c0 to 9245565 Compare May 4, 2022 23:44

jkczyz mentioned this pull request May 10, 2022

lightning-invoice/utils: Actually add expiry to invoices #1474

Merged

move Time trait from scoring mod to util::time and set it visibile wi…

10f9795

…thin crate

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch from 9245565 to 14248ee Compare May 14, 2022 04:14

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch 6 times, most recently from ca4099f to 00ee895 Compare May 14, 2022 07:06

TheBlueMatt previously approved these changes May 14, 2022

View reviewed changes

jkczyz reviewed May 16, 2022

View reviewed changes

add timeout retry strategy to outbound payment

b7d6d0d

kafaichoi dismissed TheBlueMatt’s stale review via b7d6d0d May 17, 2022 06:27

kafaichoi force-pushed the timeout-outbound-paymnet-retires-instead-of-count-out branch from 00ee895 to b7d6d0d Compare May 17, 2022 06:27

jkczyz approved these changes May 17, 2022

View reviewed changes

TheBlueMatt approved these changes May 17, 2022

View reviewed changes

TheBlueMatt merged commit 998cdc0 into lightningdevkit:main May 17, 2022

TheBlueMatt mentioned this pull request May 17, 2022

Allow specifying retry strategy per-payment instead of globally #1487

Closed

		/// Returns an instance corresponding to the current moment.
		fn now() -> Self;

		//! A very simple serialization framework which is used to serialize/deserialize messages as well
		//! as ChannelsManagers and ChannelMonitors.

		//! let invoice_payer = InvoicePayer::new(&payer, router, &scorer, &logger, event_handler,
		//! Retry::Attempts(2));

		//! This file is symlinked to lightning_invoice/src/time_utils.rs so we can keep Time Trait sealed
		//! while allowing other crates(lightning_invoice) to use it.

	let retry_count = payment_cache.get_mut(&payment_hash).unwrap();
	if *retry_count >= self.retry_attempts.0 {
	Err(e)

	let max_payment_attempts = self.retry_attempts.0 + 1;
	let attempts = *self.payment_cache.lock().unwrap()
	.entry(payment_hash)
	.and_modify(\|attempts\| *attempts += 1)
	.or_insert(1);

	if attempts >= max_payment_attempts {
	log_trace!(self.logger, "Payment {} exceeded maximum attempts; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts);
	return Err(());
	}

add timeout retry strategy to outbound payment #1418

add timeout retry strategy to outbound payment #1418

Conversation

kafaichoi commented Apr 12, 2022 • edited Loading

codecov-commenter commented Apr 12, 2022 • edited Loading

Codecov Report

jkczyz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kafaichoi Apr 13, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kafaichoi Apr 13, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kafaichoi Apr 18, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kafaichoi commented Apr 14, 2022

maxgiraldo Apr 14, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kafaichoi Apr 17, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheBlueMatt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kafaichoi Apr 23, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kafaichoi Apr 23, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kafaichoi commented Apr 12, 2022 •

edited

Loading

codecov-commenter commented Apr 12, 2022 •

edited

Loading

kafaichoi Apr 13, 2022 •

edited

Loading

kafaichoi Apr 13, 2022 •

edited

Loading

kafaichoi Apr 18, 2022 •

edited

Loading

maxgiraldo Apr 14, 2022 •

edited

Loading

kafaichoi Apr 17, 2022 •

edited

Loading

kafaichoi Apr 23, 2022 •

edited

Loading

kafaichoi Apr 23, 2022 •

edited

Loading

kafaichoi May 4, 2022 •

edited

Loading