-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always pick the best paths, rather than (poorly) attempting to randomize #1610
Always pick the best paths, rather than (poorly) attempting to randomize #1610
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1610 +/- ##
========================================
Coverage 90.84% 90.84%
========================================
Files 80 80
Lines 44675 44841 +166
Branches 44675 44841 +166
========================================
+ Hits 40583 40735 +152
- Misses 4092 4106 +14
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, I agree that the path randomization was not that great to begin with (especially since we keep sorting the candidates before/after). However, as this reverts what little indeterminism was introduced with #1359, I wonder what generally the path forward would be? Do we for example want to introduce a 'privacy budget' that can be spent either on overpaying fees or on choosing non-optimal routes?
lightning/src/routing/router.rs
Outdated
let mut cur_route = Vec::<PaymentPath>::new(); | ||
let mut aggregate_route_value_msat = 0; | ||
// First, sort by the cost-per-value of the path, selecting only the paths which | ||
// contribute the most per cost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"cost-per-value" vs. "contribute the most per cost": Which one is it / should it be? cost-per-value or value-per-cost? I'd argue probably the latter, since we still want to drop paths that contribute less to improve success?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused - they're the same thing, at least as far as ordering goes? I tried to reword it some.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well mainly I wanted to point out the discrepancy between the comment and code. But yeah, they can of course be used somewhat analogous. That said, I assume you introduced the conversion to u128
and the shift by 64
to compare them without the need for floats? Since value >> cost
, could you get around that by simply sorting by value/cost
in ascending order, rather than cost/value
in descending order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we're now only selecting by cost efficiency, but not by overall amount as before. Why is this better than preferring a smaller number of larger value paths, which potentially reduced the failure probability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well mainly I wanted to point out the discrepancy between the comment and code. But yeah, they can of course be used somewhat analogous.
Right, I did update the comment.
That said, I assume you introduced the conversion to u128 and the shift by 64 to compare them without the need for floats? Since value >> cost, could you get around that by simply sorting by value/cost in ascending order, rather than cost/value in descending order?
Hmm, that's a good point. For the values our scorer uses that should work alright. That said, #1600 proposes using score values of a full bitcoin, so then we'd be liable to hit issues again. To avoid any scorer-specific logic in the router just shifting up into a u128
seemed the simplest, and while its gonna be really slow on most systems, it shouldn't matter much here, its at the end of the router and only across a handful of paths.
Also, we're now only selecting by cost efficiency, but not by overall amount as before. Why is this better than preferring a smaller number of larger value paths, which potentially reduced the failure probability?
Hmm, yea, that's a valid point - we may prefer to send a few large parts rather than a number of small parts. Still, that seems like something that should be up to the scorer - the router should seek to reduce the total cost in scorer+fee terms of the paths it selects, if the scorer wants to prefer larger parts, then great, if not, that's its problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's a good point. For the values our scorer uses that should work alright. That said, #1600 proposes using score values of a full bitcoin, so then we'd be liable to hit issues again. To avoid any scorer-specific logic in the router just shifting up into a
u128
seemed the simplest, and while its gonna be really slow on most systems, it shouldn't matter much here, its at the end of the router and only across a handful of paths.
Ah I see. Yeah, it works as it is.
Hmm, yea, that's a valid point - we may prefer to send a few large parts rather than a number of small parts. Still, that seems like something that should be up to the scorer - the router should seek to reduce the total cost in scorer+fee terms of the paths it selects, if the scorer wants to prefer larger parts, then great, if not, that's its problem.
Hum, I agree it's probably best to keep the routing logic as simple as possible and let the Scorer handle it going forward. And hopefully it works as intended and optimizes for failure probability.
Yea, I'm not super happy with it, but at the same time I'm relatively confident our "privacy gain" there was so close to zero its not really worth discussing at length. I'm somewhat thinking our current routing approach is just incompatible with a privacy design - we should consider doing some kind of "private routing", but I'm not sure it can be trivially accomplished with dijkstras. There's maybe a world where we do something like a "privacy budget" but I'd think we'd want to implement that by taking all channel fees below X and replace them with random(0-X) while doing the graph walking, thus making any path of the same length equally likely, or something like that (I believe this is kinda-sorta what CLN does, not sure). In any case, anything I can think of that would be a reasonably private approach simply doesn't start with "dijkstras strictly over the fee + cost, without randomization on the fee/cost function". |
b455d49
to
aad65c8
Compare
lightning/src/routing/router.rs
Outdated
let mut cur_route = Vec::<PaymentPath>::new(); | ||
let mut aggregate_route_value_msat = 0; | ||
// First, sort by the cost-per-value of the path, selecting only the paths which | ||
// contribute the most per cost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well mainly I wanted to point out the discrepancy between the comment and code. But yeah, they can of course be used somewhat analogous. That said, I assume you introduced the conversion to u128
and the shift by 64
to compare them without the need for floats? Since value >> cost
, could you get around that by simply sorting by value/cost
in ascending order, rather than cost/value
in descending order?
lightning/src/routing/router.rs
Outdated
let mut cur_route = Vec::<PaymentPath>::new(); | ||
let mut aggregate_route_value_msat = 0; | ||
// First, sort by the cost-per-value of the path, selecting only the paths which | ||
// contribute the most per cost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we're now only selecting by cost efficiency, but not by overall amount as before. Why is this better than preferring a smaller number of larger value paths, which potentially reduced the failure probability?
I don't think privacy is a binary space. Since there are always a lot of assumptions to be made on the adversary's end, introducing even mediocre amounts of uncertainty might go a long way in practice. Even randomly choosing to forgo the most optimal route and recomputing another one from time to time could break the 'shortest path' assumption any MITM needs to make when trying to break end-to-end privacy.
Absolutely, there are a lot of things that could be done to enable more private routing, what you sketched might be a good start. That said, I'll have a look what the state-of-the-art in CLN currently is. |
aad65c8
to
9687183
Compare
Yea, I think that's a good point. There's possibly more of a privacy cost here than I suggested in the description. Still, I'd much rather address that by changing the scorer to give it more control over the ultimate cost (ie not always adding the fee on the router side and have the scorer return that) so that the scorer can do things like what I suggested above and we don't have to touch the router for it. |
Rebased after merge of the dependent PR. |
Alright, so far looks good. I'll let someone else have a go and then will do another round of review. |
9687183
to
fa1ba8a
Compare
Rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read through the discussion on the rationales. All seems reasonable to me.
payment_path.update_value_and_recompute_fees(cmp::min(value_contribution_msat, final_value_msat)); | ||
value_contribution_msat = cmp::min(value_contribution_msat, final_value_msat); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to formulate a test that catches the incorrect behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so with the current code we end up relying on it being correct a good bit more, so reverting the first patch here causes existing tests to fail. As for building a fresh test for it...its pretty convoluted and we'd be testing side-effects - we can construct a path that requires up to 2x overpayment, but we're trying to gather 3x total paths, so if we try to build a test that fails to find a path it won't work. We could maybe get away with something where it finds a path but not the optimal one because an optimal one was found later, but that's...complicated and kinda a strange test? I dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be alright if other tests are catching it now.
fa1ba8a
to
2b57097
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but one comment you may consider optional. Feel free to squash.
@@ -754,7 +754,7 @@ where L::Target: Logger, GL::Target: Logger { | |||
pub(crate) fn get_route<L: Deref, S: Score>( | |||
our_node_pubkey: &PublicKey, payment_params: &PaymentParameters, network_graph: &ReadOnlyNetworkGraph, | |||
first_hops: Option<&[&ChannelDetails]>, final_value_msat: u64, final_cltv_expiry_delta: u32, | |||
logger: L, scorer: &S, random_seed_bytes: &[u8; 32] | |||
logger: L, scorer: &S, _random_seed_bytes: &[u8; 32] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, good question. The first version of this patch did, and while it didn't change the public interface, it touched a ton of test code. I dropped it cause I wasn't sure if we would end up using it again and it doesn't change the public interface so I don't care too much about an unused bit of code. I don't feel super strongly but until we're in a place we're confident in with the router I kinda feel like leaving it. One thing, at least, that we may do, is add the random seed as a "per route" input to the scorer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, but I assume if we needed randomization in the Scorer, we'd pass it directly to it before giving it to find_route
? That said, I also don't think it's very pressing to remove it right now. Especially since I still like the idea of eventually introducing a proper Rand
interface, which may require a larger refactoring anyways. So feel free to leave as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? I dunno, I haven't really thought about it, its just not clear to me how we'd do it, so I figured not worry too much about removing the field right now.
If we end up "paying" for an `htlc_minimum_msat` with fees, we increment `already_collected_value_msat` by more than the amount of the path that we collected (who's `value_contribution_msat` is higher than the total payment amount, despite having been reduced down to the payment amount). This throws off our total value collection target, though in the coming commit(s) it would also throw off our path selection calculations.
Currently, after we've selected a number of candidate paths, we construct a route from a random set of paths repeatedly, and then select the route with the lowest total cost. In the vast majority of cases this ends up doing a bunch of additional work in order to select the path(s) with the total lowest cost, with some vague attempt at randomization that doesn't actually work. Instead, here, we simply sort available paths by `cost / amount` and select the top paths. This ends up in practice having the same end result with substantially less complexity. In some rare cases it gets a better result, which also would have been achieved through more random trials. This implies there may in such cases be a potential privacy loss, but not a substantial one, given our path selection is ultimately mostly deterministic in many cases (or, if it is not, then privacy is achieved through randomization at the scorer level).
Saturating a channel beyond 1/4 of its capacity seems like a more reasonable threshold for avoiding a path than 1/2, especially given we should still be willing to send a payment with a lower saturation limit if it comes to that. This requires an (obvious) change to some router tests, but also requires a change to the `fake_network_test`, opting to simply remove some over-limit test code there - `fake_network_test` was our first ever functional test, and while it worked great to ensure LDK worked at all on day one, we now have a rather large breadth of functional tests, and a broad "does it work at all" test is no longer all that useful.
2b57097
to
ff8d3f7
Compare
Squashed without further changes. |
Based on #1605, this removes the half-assed attempt at randomization in the router, instead simply picking the paths which have the lowest score-per-value-transferred. This ends up being basically the same thing anyway, but more directly, and simply. Further, it fixes some edge-case issues around not selecing enough balance, and, notable, spuriously failing to route if we ended up needing >= 50 paths to route (which is now a configurable number, not that users should be using 50 paths, really).
This then switches us to a saturation limit of 1/4, which I think is much cleaner, but the router cleanups are needed to make tests pass with that.
Tagging 0.0.110 cause I want it the 1/4 switch for it.