-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Penalize failed channels #1144
Penalize failed channels #1144
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1144 +/- ##
==========================================
- Coverage 90.51% 90.42% -0.10%
==========================================
Files 69 70 +1
Lines 35865 39419 +3554
==========================================
+ Hits 32462 35643 +3181
- Misses 3403 3776 +373
Continue to review full report at Codecov.
|
371b971
to
0edc550
Compare
f538e01
to
a90a9e5
Compare
lightning/src/routing/scorer.rs
Outdated
#[cfg(not(feature = "no-std"))] | ||
fn decay_from(penalty_msat: u64, last_failure: &SystemTime, decay_interval: Duration) -> u64 { | ||
let decays = last_failure.elapsed().ok().map_or(0, |elapsed| { | ||
elapsed.as_secs() / decay_interval.as_secs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: handle divide by zero 😛
lightning-invoice/src/payment.rs
Outdated
where | ||
P::Target: Payer, | ||
R: Router, | ||
S: routing::Score, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an alternative to an owned Score
, should we consider a "two-layer" trait with a Score
and LockedScore
where router takes the second?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda prefer this way as it just uses plain references. Could be convinced otherwise, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm looking at this from two directions:
- in the case of bindings, we cannot serialize from a reference to a trait - all traits have to be concretized into a
dyn Trait
and from there you can't (type-safe) go back to the original type to serialize it. We'd have to add support for that, which would be highly language specific and checked at runtime or makeScore
requireWriteable
, which is somewhat awkward, though not insane. - in the general case, its a bit awkward for users to have to get their
InvoicePayer
to serialize theirScore
data - it ends up dictating a chunk of user layout, instead of taking a reference which gives the user more flexibility.
Obviously its somewhat awkward in that we end up forcing users into a two-layer wrapper thinggy, but luckily:
- Our implementation does works against it without the user having to write any additional characters,
- we can implement the parent trait for
Thing<Deref<Target=Mutex<ThingB: subtrait>>>
(or implement it forThing: Deref<Target=subtrait>
directly with no-std), making it somewhat transparent at least, - bindings users can use the API, at least kinda, though GC waiting to release the lock kinda sucks too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm looking at this from two directions:
- in the case of bindings, we cannot serialize from a reference to a trait - all traits have to be concretized into a
dyn Trait
and from there you can't (type-safe) go back to the original type to serialize it. We'd have to add support for that, which would be highly language specific and checked at runtime or makeScore
requireWriteable
, which is somewhat awkward, though not insane.
Just a thought, we could have S: routing::Score + Clone
and have the accessor return a copy. Seems like it would save a lot of trouble using multiple traits for a small cost. Would this solve the bindings issue at least an interim solution?
- in the general case, its a bit awkward for users to have to get their
InvoicePayer
to serialize theirScore
data - it ends up dictating a chunk of user layout, instead of taking a reference which gives the user more flexibility.
FWIW, they will already need to use Arc<InvoicePayer>
as it needs to be passed to the BackgroundProcessor
for event handling in addition to being used to make payments. So having, say, some persister utility wrapping Arc<InvoicePayer>
wouldn't be horrible. And, if written in Rust, wouldn't it just be a simple method call to persist not involving any references at the bindings level?
Obviously its somewhat awkward in that we end up forcing users into a two-layer wrapper thinggy, but luckily:
- Our implementation does works against it without the user having to write any additional characters,
- we can implement the parent trait for
Thing<Deref<Target=Mutex<ThingB: subtrait>>>
(or implement it forThing: Deref<Target=subtrait>
directly with no-std), making it somewhat transparent at least,- bindings users can use the API, at least kinda, though GC waiting to release the lock kinda sucks too.
I understand the problem and am trying to grok the trait-subtrait solution your proposing. I guess the point of it is so we can pass find_route
a LockedScore
. But the relationship between LockedScore
and Score
(which is the supertrait of which?) and what methods each has isn't so clear. Could you elaborate?
Wonder if using an associate type in some manner would make this simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought, we could have S: routing::Score + Clone and have the accessor return a copy.
Yea, that's something I'd been thinking about for a different reason (cause java does that a ton anyway), and I guess its okay? I'm not really a huge fan of clone'ing a hashmap that may get an entry for every channel in the graph (which may be the case for users doing probing), though, that could be a lot of data.
FWIW, they will already need to use Arc as it needs to be passed to the BackgroundProcessor for event handling in addition to being used to make payments. So having, say, some persister utility wrapping Arc wouldn't be horrible.
Ah, that's a good point re: user code complexity.
And, if written in Rust, wouldn't it just be a simple method call to persist not involving any references at the bindings level?
So we'd make Score : Writeable
and add a utility method to InvoicePayer
to just write the scores out directly? I'm not 100% sure where you're going with this.
I understand the problem and am trying to grok the trait-subtrait solution your proposing. I guess the point of it is so we can pass find_route a LockedScore. But the relationship between LockedScore and Score (which is the supertrait of which?) and what methods each has isn't so clear. Could you elaborate?
Sorry, I was using "supertrait" liberally (and incorrectly). What I was thinking of was (but with better naming):
trait Score {
type Scorer: LockedScore;
fn get_scorer(&self) -> Scorer;
}
trait LockedScore {
fn get_score(&self, scid: u64, ..) -> u64;
}
#[cfg(not(no_std))]
impl<LS: LockedScore, T: Deref<Target=Mutex<LockedScore>>> Score for T {
...
}
#[cfg(no_std)]
impl<LS: LockedScore, T: Deref<Target=LockedScore>> Score for T {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmmm, that could work. I guess it does feel a lot less explicit (in the sense that users lose the flexibility of selecting how locking works) and fairly different from our existing APIs which are built around Deref
s.
If we go this route, to make the bindings sane, we'd probably want to create a WriteableScore
trait that is just pub trait WriteableScore : Score + Writeable {}
and use that, then create a constructor for ScorePersister
that mirrors the InvoicePayer
constructor and just takes a WriteableScore
instead of a Score
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried your solution with some renaming based on our offline discussion. Ran into some lifetime hell while doing so... 😕 Pushed a commit that almost works. Any chance you could see where I've gone wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the MutexGuard
is making this tricky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grrrrrrrrr, right, so the direct thing you want to do here requires GATs, which may land in, like, the next version of rust or something. However, in trying to fix this I learned of a new rust syntax which seems to be exactly the subset of GATs that we want here - HRTB. I pushed a cargo check
'ing branch at https://git.bitcoin.ninja/index.cgi?p=rust-lightning;a=log;h=refs/heads/1144-magic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Got it working now. Only annoying thing about the blanket impl
is that I get a conflicting implementation error if I try to implement it with T: Deref<Target=RefCell<S>>
, which I was hoping to do for tests instead of using a Mutex
. Which I guess means any Deref
must reference a Mutex
? Might be some way to work around it by introducing another type parameter? Don't need to worry about this now, though.
lightning/src/routing/scorer.rs
Outdated
base_penalty_msat: u64, | ||
params: ScoringParameters, | ||
#[cfg(not(feature = "no-std"))] | ||
channel_failures: HashMap<u64, (u64, SystemTime)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an Instant
instead, no? We want a monotonic clock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instant
is opaque, so I'd imagine it would be difficult to persist, whereas with SystemTime
we can get a Duration
since the unix epoch. IIUC, any decrease in time as used with elapsed
would be before the first decay, so it shouldn't have any effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can persist the SystemTime::now() - Instant.elapsed()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm... but we can't deserialize it back into Instant
since the only way to create one is Instant::now
or from another Instant
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, yuck. Ummmmmmmm Instant::now() - (SystemTime::now() - deserialized_systemtime)
? Its gross, but more technically correct...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I convinced myself this is possible by serializing in terms of a Duration
since the UNIX epoch, and then upon deserialization do a similar adjustment.
let time = std::time::Instant::now();
let duration_since_epoch = std::time::SystemTime::now().duration_since(std::time::SystemTime::UNIX_EPOCH).unwrap();
let serialized_time = duration_since_epoch - time.elapsed();
println!("time: {:?}", time);
println!("duration_since_epoch: {:?}", duration_since_epoch);
println!("serialized_time: {:?}", serialized_time);
std::thread::sleep(std::time::Duration::from_secs(1));
let duration_since_epoch = std::time::SystemTime::now().duration_since(std::time::SystemTime::UNIX_EPOCH).unwrap();
let duration_since_instant = duration_since_epoch - serialized_time;
let deserialized_time = std::time::Instant::now() - duration_since_instant;
println!("duration_since_epoch: {:?}", duration_since_epoch);
println!("duration_since_instant: {:?}", duration_since_instant);
println!("deserialized_time: {:?}", deserialized_time);
dbedf14
to
1281d49
Compare
lightning-invoice/src/payment.rs
Outdated
@@ -117,15 +131,17 @@ use std::sync::Mutex; | |||
use std::time::{Duration, SystemTime}; | |||
|
|||
/// A utility for paying [`Invoice]`s. | |||
pub struct InvoicePayer<P: Deref, R, L: Deref, E> | |||
pub struct InvoicePayer<P: Deref, R, S, L: Deref, E> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should S
be a Deref
to a LockableScore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We implement LockableScore
for T: Deref<Target=Mutex<S>>
, so I don't think we do unless there is another reason you're thinking of? See last commit for use with BackgroundProcessor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, right, does it work if we implement LockableScore
for Mutex<S>
instead? It seems like a Deref
here is more consistent with our API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done. Also, implemented for RefCell
now that it is possible.
27e7947
to
21cbd03
Compare
lightning-invoice/src/payment.rs
Outdated
@@ -152,8 +168,9 @@ pub trait Payer { | |||
/// A trait defining behavior for routing an [`Invoice`] payment. | |||
pub trait Router { | |||
/// Finds a [`Route`] between `payer` and `payee` for a payment with the given values. | |||
fn find_route( | |||
&self, payer: &PublicKey, params: &RouteParameters, first_hops: Option<&[&ChannelDetails]> | |||
fn find_route<S: routing::Score>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The S
bound should probably be on the trait itself, no? ie if a user always constructs a InvoicePayer
with CustomLocalUserRouter
then find_route
should take a CustomLocalUserRouter
and not a S
. If the complexity of the type annotations blows up as a result of this, feel free to ignore :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I quite follow. Why would find_route
take itself (in addition to self
) instead of a Score
?
I think you're trying to say that the type S: routing::Score
parameter used in Router
should be the same Score
as required by LockableScore<'a>::Locked
. The compiler happily leads me to use the following ugly syntax and to use PhantomData
in DefaultRouter
.
R: for <'a> Router<<<S as Deref>::Target as routing::LockableScore<'a>>::Locked>,
But it seems I'm just doing what the compiler is already inferring, no? Did I misunderstand? I suppose internally, we could accidentally call find_route
with an entirely different Score
than what the user parameterized InvoicePayer
with. But I don't think a user could do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the current code, it'd be
pub trait Router<S: routing::Score> {
fn find_route(&self, ...&S)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we're on the same page as discussed offline. Also, turns out I didn't need to use PhantomData
. I was being overzealous in where I was adding type parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the above two comments, LGTM.
21cbd03
to
269a85a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that some of the fixups on Notify scorer of failing payment path
probably should be on Parameterize InvoicePayer by routing::Score
. I'd be totally fine if those both just get squashed into one commit, though.
lightning/src/routing/mod.rs
Outdated
} | ||
} | ||
|
||
impl<S: Score, T: Deref<Target=S> + DerefMut<Target=S>> Score for T { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DerefMut
extends Deref
so you should be able to drop the Deref
bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shaping up IMO! Mostly docs requests
//! let scorer = Scorer::default(); | ||
//! | ||
//! // Or use a custom channel penalty. | ||
//! let scorer = Scorer::new(1_000); | ||
//! // Or use custom channel penalties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this says custom channel "penalties," could we change it to either say "penalty" or have multiple penalties be custom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another custom penalty.
} | ||
} | ||
|
||
/// Creates a new scorer using `penalty_msat` as a fixed channel penalty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could specify that this will only have a fixed base channel penalty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, missed this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the other penalty is zero so the overall penalty will be fixed.
@@ -830,6 +873,39 @@ mod tests { | |||
} | |||
} | |||
|
|||
#[test] | |||
fn scores_failed_channel() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a high-level comment? I'm a bit confused what this tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment. There is an expectation set on TestScorer
upon creation that it is called with a specific short_channel_id
. It will fail if it is not called at all or called with a different short_channel_id
.
} | ||
|
||
#[cfg(not(feature = "no-std"))] | ||
fn decay_from(penalty_msat: u64, last_failure: &Instant, half_life: Duration) -> u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level doc for the return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is refactored in #1146 into a method called decayed_penalty
, so will leave as is to avoid a lengthy rebase process. Happy to address any concerns in that PR. Probably should include units in the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, just noticed a lot of this is rewritten in #1146. Thanks!
|
||
#[cfg(not(feature = "no-std"))] | ||
fn decay_from(penalty_msat: u64, last_failure: &Instant, half_life: Duration) -> u64 { | ||
let decays = last_failure.elapsed().as_secs().checked_div(half_life.as_secs()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we test that it won't decay if less than half_life
has passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing is tested at the moment... 🙂 Will do in a follow-up. Confirmed this results in no shifts because checked_div
returns Some(0)
.
269a85a
to
5c8466b
Compare
Upon receiving a PaymentPathFailed event, the failing payment may be retried on a different path. To avoid using the channel responsible for the failure, a scorer should be notified of the failure before being used to find a new route. Add a payment_path_failed method to routing::Score and call it in InvoicePayer's event handler. Introduce a LockableScore parameterization to InvoicePayer so the scorer is locked only once before calling find_route.
As payments fail, the channel responsible for the failure may be penalized. Implement Scorer::payment_path_failed to penalize the failed channel using a configured penalty. As time passes, the penalty is reduced using exponential decay, though penalties will accumulate if the channel continues to fail. The decay interval is also configurable.
Proof of concept showing InvoicePayer can be used with an Arc<ChannelManager> passed to BackgroundProcessor. Likely do not need to merge this commit.
5c8466b
to
db05a14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Squashed without diff from Val's ACK, will land after CI:
$ git diff-tree -U1 5c8466b4 db05a14a
$
Adds a new
payment_path_failed
method torouting::Score
for penalizing failed channels, which is called byInvoicePayer
before retrying failed payments in the course of handlingPaymentPathFailed
events.Implements the new method in
Scorer
by applying a configurable penalty on top of the base penalty. The new penalty decays over time but may be further increased if the channel continues to fail.TODO: Test new
Scorer
behavior.