-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deduplicate PaymentSent events for MPP payments #1053
Deduplicate PaymentSent events for MPP payments #1053
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1053 +/- ##
==========================================
+ Coverage 91.00% 91.64% +0.64%
==========================================
Files 65 65
Lines 33695 36410 +2715
==========================================
+ Hits 30663 33367 +2704
- Misses 3032 3043 +11
Continue to review full report at Codecov.
|
778d2ae
to
efabf2d
Compare
c29ff9f
to
cbd2317
Compare
388a0e9
to
500f8e9
Compare
500f8e9
to
ae070e6
Compare
Should this get a new test for the new event? |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was quite distracted when I looked at this yesterday, sorry about that :(.
In any case, it looks good, one note just a rebase change and otherwise minor gripes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took largely a high-level pass. Looking pretty good.
lightning/src/util/events.rs
Outdated
/// The hash which was given to ChannelManager::send_payment. | ||
payment_hash: PaymentHash, | ||
/// The path that failed. | ||
payment_path: Vec<RouteHop>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on a PR that adds the path to PaymentFailed
for scoring purpose. It will also include the index of the failed hop. Given that, do you think we need a separate event? Could we add a marker for multi-path payments if needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion. Matt pointed out in this comment that there could be value in users knowing whether to retry a portion of a payment vs. retrying the entire payment. @TheBlueMatt I tend to agree that this might make MPPFragmentFailed
a bit redundant, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could break out the relevant commits into their own PR if it helps. I think from a retry perspective, though, we'd need to know the part amount unless InvoicePayer
tracks that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this one offline, but we just need to make sure that we expose to users somehow the distinction between a payment failing and an MPP fragment failing, as they have significant implication for what the user does in response. We really don't want users to think that a PaymentFailed
event implies the payment failed, when it was only one mpp fragment that failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drafted #1077
ae070e6
to
7c92daa
Compare
lightning/src/ln/channelmanager.rs
Outdated
if let Some(mut sessions) = outbounds.remove(&mpp_id) { | ||
if sessions.remove(&session_priv_bytes) { | ||
self.pending_events.lock().unwrap().push( | ||
events::Event::PaymentSent { payment_preimage } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, an MPP will only result in one PaymentSent
event -- let me know if I misunderstood. With that in mind, I was thinking of adding the successful path to PaymentSent
for scoring. That way a channel's score could be improved if successfully routing through it. But if there is only one PaymentSent
event for MPP, then such improvements will only be possible for one path. Would it be possible to include all the paths for PaymentSent
event? Or would that require waiting for all paths to succeed before sending such an event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, an MPP will only result in one PaymentSent event
Yup! We generate it as soon as one path of the MPP succeeds (i.e., we do not wait for all paths to succeed). Reasoning: as soon as we have the preimage, the rest of the paths are irrevocably "solved."
Would it be possible to include all the paths for PaymentSent event?
Hmm so we could iiuc, but I don't think it makes sense to feed paths to the scorer until they actually succeed. Not sure the solution here, will have to think on it a bit. Ideally I guess we'd generate additional events after the first PaymentSent
, that contain the path that succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as long as the receipient is "honest" (or at least wants their money and isn't buggy), we know all the paths succeeded when we get the first preimage here, so I think it would be "correct", but code-wise its probably quite tricky - the actual HTLCSource
is stored away in the HTLC set of some other channel so not readily available here. Maybe we'll need a separate event here to indicate further paths that succeeded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... yeah, that may be a good argument for separating path succeeding into its own event. Alternatively, ChannelManager
could be parameterized by the scorer, which it would call as needed instead of using events. Though that could introduce the possibility of reentrancy. Maybe not the best idea and would be inconsistent with network updates.
Related to the other discussion, I guess for multi-path payments the user (or InvoicePayer
) ultimately decides if a payment has failed when it no longer wants to retry failed paths (and hasn't received the preimage). Likewise for single-path payments. Either the user has the preimage (success) or doesn't (possible failure) and needs to decided whether to give up (i.e., no longer retry or wait).
With that in mind, maybe having a failed or successful payment at the event level isn't the right abstraction? Instead, there's PreimageReceived
, SuccessfulPaymentPath
, and FailedPaymentPath
. Maybe the last two could be named in terms of HTLCs instead? Sorta thinking out loud on this, so let me know if I'm off on something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... yeah, that may be a good argument for separating path succeeding into its own event.
I think that makes sense. It's a bit awkward to do something different for success and failure but the semantics are pretty different and Val just removed the separate failure event from path event. For path success the probability increases can happen later, we're not trying to make sure we get them in before retrying a new path so I think it's fine.
Alternatively, ChannelManager could be parameterized by the scorer, which it would call as needed instead of using events. Though that could introduce the possibility of reentrancy. Maybe not the best idea and would be inconsistent with network updates.
Yea, I don't think the complexity of the reentrancy is worth it there.
With that in mind, maybe having a failed or successful payment at the event level isn't the right abstraction? Instead, there's PreimageReceived, SuccessfulPaymentPath, and FailedPaymentPath. Maybe the last two could be named in terms of HTLCs instead? Sorta thinking out loud on this, so let me know if I'm off on something.
I think that kinda makes sense. I just suggested renaming PaymentFailed to HTLC/Path/PartFailed. For preimage receipt it does mean you need to treat the payment as sent, though, so I'm not sure as really need to swap that one.
7c92daa
to
366a29c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, A few nits and minor things here and there but I'm basically happy.
lightning/src/util/events.rs
Outdated
read_tlv_fields!(reader, { | ||
(0, payment_hash, required), | ||
(1, network_update, ignorable), | ||
(2, rejected_by_dest, required), | ||
(3, all_paths_failed, required), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
required
implies that we'll fail to read old versions that are missing the field. I think we should just initialize it to Some(true)
and unwrap it below, keeping the broken behavior for events that were pending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I thought if all_paths_failed
was an option
fieldty
in read_tlv_fields
, it had to be an option
in write_tlv_fields
also. Updating mental model since they def can be asymmetric.
Sorry, I finally drew out the 2x2 tlv ser matrix (type
odd/even is for old clients reading new, fieldty
option/required is for new clients reading old)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, everything is written the same whether its an option or not, option just declares whether we require it to be present or not, and odd/even declares whether old versions that don't understand it will fail or ignore.
/// For both single-path and multi-path payments, this is set if all paths of the payment have | ||
/// failed. This will be set to false if (1) this is an MPP payment and (2) other parts of the | ||
/// larger MPP payment were still in flight when this event was generated. | ||
all_paths_failed: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really need to rename PaymentFailed
after this IMO - maybe PaymentPartFailed
or HTLCFailed
, because it definitely no longer means the whole payment failed. I'm ok with pushing this to a followup if you just wanna get this merged without blowing up the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with renaming PaymentFailed
. I do like using HTLC in the name as mentioned in another comment. Presumably these won't be deduplicated?
However, I'm still uncertain about adding the all_paths_failed
field. I feel like I might be missing something, though. How would the retrier make use of it? If false, it needs to retry for the corresponding part unless it has received a PaymentSent
. If true, doesn't it need to do the same thing? If not, is there any overlap with rejected_by_dest
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I'm still uncertain about adding the all_paths_failed field. I feel like I might be missing something, though. How would the retrier make use of it? If false, it needs to retry for the corresponding part unless it has received a PaymentSent. If true, doesn't it need to do the same thing?
I believe we'll need a different call in ChannelManager to retry a failed MPP path, as we won't be able to just blindly call send_payment again to keep state consistent. Thus, we'll need to decide which function we call based on that bool.
More generally, the bool isn't just for the retrier - its also to tell the user if a payment has failed, or if a path has failed. Until you either see all_paths_failed
or PaymentSent
, you cannot consider a payment resolved.
If not, is there any overlap with rejected_by_dest?
They're two very different things - whether a final node rejected the payment or not, we can't consider a payment failed unless we get all the paths failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we'll need a different call in ChannelManager to retry a failed MPP path, as we won't be able to just blindly call send_payment again to keep state consistent. Thus, we'll need to decide which function we call based on that bool.
More generally, the bool isn't just for the retrier - its also to tell the user if a payment has failed, or if a path has failed. Until you either see
all_paths_failed
orPaymentSent
, you cannot consider a payment resolved.
Ah, that's right. For the situation where we retry some paths of an MPP but they all ultimately fail, would the retrier then retry the full payment? Seems it may complicate the retrier as either there are two levels of retries (paths and full payment) or payments that are MPP need to be treated differently from non-MPP. But if we want to avoid the two-levels of retries, we'd also need to know if the payment was MPP. But I don't think the bool will tell us that?
I had hoped this could be generalized in some way where we are always just retrying paths regardless of whether it's MPP or not. But I guess that would require complicating the state tracking in ChannelManager
, possibly such that the user would need to say when they have given up (and thus considered the payment failed). 🙁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, come to think of it the retrier will know how many paths it gets back from the router, so this should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's right. For the situation where we retry some paths of an MPP but they all ultimately fail, would the retrier then retry the full payment?
Yes, I believe you generally would, as long as you haven't run out of time/retries.
Seems it may complicate the retrier as either there are two levels of retries (paths and full payment) or payments that are MPP need to be treated differently from non-MPP. But if we want to avoid the two-levels of retries, we'd also need to know if the payment was MPP. But I don't think the bool will tell us that?
Yes, the complexity is somewhat unavoidable, I think. Ultimately the channelmanager needs to know, when the retry HTLC(s) are sent, whether its an MPP retry or not. If the full MPP has failed, the channelmanager will (have to) forget about the full payment, so the retrier needs to inform the channelmanager of this. The way I was thinking about it was that the retrier would:
- if
all_paths_failed
is not set, callchannelmanager.retry_mpp_part(new_route)
and the channelmanager would look up the pending mpp data, generate the appropriate retry for only the given path, and send the new part(s), - if the above call fails with an
Err(NoPendingMPPFound)
orall_paths_failed
is set, the payment retrier will generate a route for the full payment value and callsend_payment
.
I'm not really sure how we could handle this better on the channelmanager end, sadly.
Actually, come to think of it the retrier will know how many paths it gets back from the router, so this should be sufficient.
Does the retrier need to store the pending route? I figured it would not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure how we could handle this better on the channelmanager end, sadly.
The only alternative, really, would be to change the API to allow retry_mpp_part
to reconstruct the state if you provide the full payment value, but I'm not sure that's ideal - if we get a PaymentFailed
that raced the first PaymentFailed
and is pending in the queue, with the last node having failed the payment, we shouldn't retry, and this API change would result in us retrying, but only part of the MPP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer a quick follow-up for renaming PaymentFailed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the retrier need to store the pending route? I figured it would not.
Correct. I was thinking it would see that get_route
returned multiple paths and note that (as a bool) for later. But I guess that isn't necessary if we allow retrying the full payment as well.
For counting retries, were you thinking the number of attempts across all paths and full payments would be counted together? Or would each path get its own attempts? Was thinking the former for simplicity.
As another related concern, what if get_route
returns multiple paths when attempting to retry a part? Seems like this might be desirable, but retry_mpp_part
would need to be able to handle this. Maybe that is what you meant by "part(s)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For counting retries, were you thinking the number of attempts across all paths and full payments would be counted together? Or would each path get its own attempts? Was thinking the former for simplicity.
Dunno, should count something, I'll leave it up to the author :p.
As another related concern, what if get_route returns multiple paths when attempting to retry a part? Seems like this might be desirable, but retry_mpp_part would need to be able to handle this. Maybe that is what you meant by "part(s)".
Yes, ChannelManager should be able to handle this seamlessly.
366a29c
to
7f70496
Compare
Think I've addressed outstanding feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarification, are we still sending PaymentFailed
events for each path? If so, could you update the PR title and description?
To be used in upcoming commits for MPP ID storage
7f70496
to
3555fd2
Compare
We'll use this to correlate MPP shards in upcoming commits
by removing all pending outbound payments associated with the same MPP payment after the preimage is received
see field docs for details
3555fd2
to
c828ff4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK c828ff4. LGTM!
TODOs
- [x] dedupPaymentFailed
events- [x] test thatPaymentFailed
events are deduplicatedAlso, track whether all paths have failed in
PaymentFailed