
Use log approximation in ProbabilisticScorer #1347

Merged — 3 commits merged into lightningdevkit:main on Mar 9, 2022

Conversation

jkczyz
Contributor

@jkczyz jkczyz commented Mar 4, 2022

Since f64::log10 exists in std but not core, unconditionally use log approximation so --features=no-std will compile when the crate is used as a dependency. Add a crate depending on no-std lightning crates in order to catch any regressions in CI.

Fixes #1340.

@codecov-commenter

codecov-commenter commented Mar 4, 2022

Codecov Report

Merging #1347 (af99a94) into main (5e86bbf) will increase coverage by 0.14%.
The diff coverage is 100.00%.

❗ Current head af99a94 differs from pull request most recent head f041a64. Consider uploading reports for the commit f041a64 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1347      +/-   ##
==========================================
+ Coverage   90.60%   90.75%   +0.14%     
==========================================
  Files          72       72              
  Lines       40075    42254    +2179     
==========================================
+ Hits        36310    38347    +2037     
- Misses       3765     3907     +142     
Impacted Files Coverage Δ
lightning-invoice/src/utils.rs 88.14% <ø> (ø)
lightning/src/routing/router.rs 92.10% <ø> (ø)
lightning/src/util/test_utils.rs 84.52% <ø> (+2.08%) ⬆️
lightning/src/routing/scoring.rs 95.41% <100.00%> (+0.38%) ⬆️
lightning/src/ln/functional_tests.rs 96.76% <0.00%> (-0.37%) ⬇️
lightning/src/ln/channelmanager.rs 86.13% <0.00%> (+1.43%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e86bbf...f041a64. Read the comment docs.

@jkczyz jkczyz force-pushed the 2022-03-log-approximation branch from 915dcc9 to 865d1de on March 4, 2022 17:23
const LOG2_10: f64 = 3.322;

fn log10_approx(numerator: u64, denominator: u64) -> f64 {
(log2_approx(numerator) - log2_approx(denominator)) / LOG2_10
Collaborator

Can we remove the division here by changing the constants in the lookup table?

Contributor Author

The constants are only for the fractional parts, so the division would still be needed for the integer parts. Suppose I could make up a lookup table for that, too, given there would only be 64 entries. Would you prefer that?

Collaborator

I guess I'm confused why we have a lookup table for log2 and then convert to log10 instead of just having a lookup table in log10, but I haven't dug too deep into how this works to begin with. In any case, a 64-entry lookup table seems more than fine too.

Contributor Author

The shift is used to get the exact integer part of the log2 (i.e., most-significant bit position) and the lookup table is used to approximate the fractional part. I'm not aware of a similarly efficient approximation for log10 though I haven't looked extensively into it.
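That scheme can be sketched as follows (an illustrative rewrite, not the PR's actual code; the table entries are log2(1 + i/16) rounded to three decimal places):

```rust
/// Fractional part of log2 for the four bits following the most
/// significant bit: entry `i` approximates log2(1 + i/16).
const LOG2_FRAC: [f64; 16] = [
    0.000, 0.087, 0.170, 0.248, 0.322, 0.392, 0.459, 0.524,
    0.585, 0.644, 0.700, 0.755, 0.807, 0.858, 0.907, 0.954,
];

fn log2_approx(x: u64) -> f64 {
    assert!(x > 0, "log2 undefined for zero");
    // Exact integer part: the position of the most significant set bit.
    let msb = 63 - x.leading_zeros();
    // Approximate fractional part: index the table with the next four
    // bits (shifting left for small x so the index stays well defined).
    let frac_idx = if msb >= 4 {
        ((x >> (msb - 4)) & 0xf) as usize
    } else {
        ((x << (4 - msb)) & 0xf) as usize
    };
    msb as f64 + LOG2_FRAC[frac_idx]
}
```

For example, `log2_approx(10)` yields 3.322: the MSB of 10 is at bit 3, and the next bits select log2(1.25) ≈ 0.322. The result is exact (modulo the table's rounding) for any x expressible in five bits, and a slight underestimate otherwise since bits below the window are dropped.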

Contributor Author

@jkczyz jkczyz Mar 4, 2022

Actually, note that by shift I meant finding the most significant bit set. I thought that's how it was done under the hood.

Edit: Looks like it might be a machine instruction. Anyhow, read my earlier comment as s/shift/most-significant bit position.

Collaborator

@TheBlueMatt TheBlueMatt Mar 4, 2022

Ah, we'd probably need two integer divisions by 10 to do something similar in log10, I guess? Still probably faster than a float divide, but not by a lot then.

Collaborator

TIL this is apparently not true! Float divides can be a bit faster than int.

#[inline]
fn log2_approx(x: u64) -> f64 {
let leading_zeros = x.leading_zeros();
let integer_part = (63 - leading_zeros) as f64;
Collaborator

Can you add a comment explaining this?

@TheBlueMatt
Collaborator

Maybe it's not worth it, but I wonder if we should do this all in int ops, given it's basically in int ops * 1 million already 'cause the constants only have three digits after the decimal point. May be worth benchmarking - it could be faster because of the int ops, or could be slower because the FPU parts of the CPU were previously entirely unused and are now unlocked; most probably it's not even visible.

@jkczyz
Contributor Author

jkczyz commented Mar 7, 2022

Maybe it's not worth it, but I wonder if we should do this all in int ops, given it's basically in int ops * 1 million already 'cause the constants only have three digits after the decimal point. May be worth benchmarking - it could be faster because of the int ops, or could be slower because the FPU parts of the CPU were previously entirely unused and are now unlocked; most probably it's not even visible.

I was playing with the log10 variation of the look-up table for the most-significant bit, which results in a single floating-point addition for each of the numerator and the denominator, plus taking the difference of the two. But then I realized the additions can be avoided using a single look-up table twice the size.

If we make those integers as you suggest, we'd simply have an integer subtraction for the numerator and denominator look-ups followed by multiplication by liquidity_penalty_multiplier_msat and integer division to get back to the correct magnitude. We'd have to upper bound liquidity_penalty_multiplier_msat to prevent overflow, though.
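A rough sketch of that integer-only computation (hypothetical names, not the scorer's actual code; `log10_times_1024` stands in for the table lookup and is implemented with floating point here purely for illustration):

```rust
// Scaled integer log10: the real code would read this from a
// precomputed table instead of calling f64::log10.
fn log10_times_1024(x: u64) -> u64 {
    ((x as f64).log10() * 1024.0).round() as u64
}

/// Roughly `multiplier_msat * log10(denominator / numerator)`,
/// with no floating point at the call site.
fn penalty_msat(numerator: u64, denominator: u64, multiplier_msat: u64) -> u64 {
    assert!(0 < numerator && numerator <= denominator);
    // The difference of two lookups approximates
    // 1024 * log10(denominator / numerator).
    let log_ratio = log10_times_1024(denominator) - log10_times_1024(numerator);
    // Upper-bound the multiplier so the product cannot overflow:
    // log_ratio is at most about 20 * 1024 < 2^15.
    let multiplier = multiplier_msat.min(u64::MAX >> 15);
    // Shift right by 10 to undo the 1024x scaling.
    (multiplier * log_ratio) >> 10
}
```

For instance, a 10:1 ratio with a multiplier of 1000 msat gives a penalty of 1000 msat, since log10(10) = 1.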

@jkczyz
Contributor Author

jkczyz commented Mar 7, 2022

Ah, actually not twice the size but the product of the two table sizes. So 64*2^K where K is the number of bits used following the most-significant bit.

@TheBlueMatt
Collaborator

That still seems reasonable - there's not really much harm in having a pretty reasonably sized table here.

@jkczyz jkczyz force-pushed the 2022-03-log-approximation branch from 865d1de to ac22d43 on March 8, 2022 05:28
@jkczyz
Contributor Author

jkczyz commented Mar 8, 2022

Alright, I pushed the change that removed floating-point operations by using a single look-up table of log10(x) * 1000. It uses the most significant bit plus the next four bits for approximation.

One way to think of it is as a window of five bits over x, where the window starts at the most significant set bit. So anything in front of the window is by definition zero. Anything behind the window is zeroed out (i.e., the approximation). Then the log is taken. For anything 5 bits or less, the log is exact modulo rounding to three decimal places.

I added a test that generates the table if it helps understand how the table was generated or if it needs to be generated again using a different size.
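A sketch of how such a generator might look (illustrative names; the PR's actual test and constants may differ). Using std's f64::log10 is fine at table-generation time, since the table itself ships as a constant and nothing floating-point runs in no-std code:

```rust
/// Build a 64x16 table of `round(log10(x) * 1024)`, where row `msb` and
/// column `lower` correspond to an x whose most significant bit is at
/// position `msb` and whose next four bits equal `lower`, with all
/// remaining lower bits zeroed -- the five-bit window approximation.
fn build_log10_table() -> Vec<[u16; 16]> {
    let mut table = vec![[0u16; 16]; 64];
    for msb in 0..64u32 {
        for lower in 0..16u64 {
            let x = if msb >= 4 {
                (1u64 << msb) | (lower << (msb - 4))
            } else {
                // For small x the window extends past bit 0; the extra
                // bits vanish, so the early rows repeat values.
                (1u64 << msb) | (lower >> (4 - msb))
            };
            table[msb as usize][lower as usize] =
                ((x as f64).log10() * 1024.0).round() as u16;
        }
    }
    table
}

/// Look up the scaled-log approximation for an arbitrary x > 0.
fn log10_times_1024(x: u64, table: &[[u16; 16]]) -> u16 {
    let msb = 63 - x.leading_zeros();
    let lower = if msb >= 4 {
        (x >> (msb - 4)) & 0xf
    } else {
        (x << (4 - msb)) & 0xf
    };
    table[msb as usize][lower as usize]
}
```

Row 0 comes out all zeros (every x there collapses to 1), and log10(u64::MAX) * 1024 ≈ 19,728, so u16 entries suffice.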

devrandom
devrandom previously approved these changes Mar 8, 2022
default = ["lightning/no-std", "lightning-invoice/no-std"]

[dependencies]
lightning = { version = "0.0.105", path = "../lightning", default-features = false }
Member

Note that the version is optional when specifying the path. It might be more convenient to not specify it, so you don't have to update it when releasing a new version. This applies elsewhere too.

Member

I also wonder why the version is specified in other crates (e.g. lightning-invoice when referring to lightning), maybe worth removing in a followup PR.

@@ -646,20 +646,23 @@ impl<T: Time> ChannelLiquidity<T> {
}

impl<L: Deref<Target = u64>, T: Time, U: Deref<Target = T>> DirectedChannelLiquidity<L, T, U> {
/// Returns the success probability of routing the given HTLC `amount_msat` through the channel
/// in this direction.
fn success_probability(&self, amount_msat: u64) -> f64 {
Member

Good, this was the only floating-point use in the repo, and it's nice to be able to target platforms that don't have FP.

@@ -122,6 +122,10 @@ jobs:
cargo test --verbose --color always --no-default-features --features no-std
# check if there is a conflict between no-std and the default std feature
cargo test --verbose --color always --features no-std
# check no-std compatibility across dependencies
Member

just curious, did you figure out why compiling in-crate doesn't uncover the issue?

Contributor Author

I wasn't able to, unfortunately. In-crate compiling does catch anything using std but apparently not methods unavailable on primitive types. Maybe there is some configuration that would trigger it? I was using cargo check -p lightning --no-default-features --features=no-std.

Collaborator

This feels like something we should maybe report as a rustc bug?

Contributor Author

Interestingly, I see the compilation fail with a minimal example in Rust playground:

#![no_std]

fn main() {
    1.0_f64.log10();
}
   Compiling playground v0.0.1 (/playground)
error: language item required, but not found: `eh_personality`
  |
  = note: this can occur when a binary crate with `#![no_std]` is compiled for a target where `eh_personality` is defined in the standard library
  = help: you may be able to compile for a target that doesn't need `eh_personality`, specify a target with `--target` or in `.cargo/config`

error: `#[panic_handler]` function required, but not found

error[E0599]: no method named `log10` found for type `f64` in the current scope
 --> src/main.rs:4:13
  |
4 |     1.0_f64.log10();
  |             ^^^^^ method not found in `f64`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `playground` due to 3 previous errors

If I change the empty lib.rs file in no-std-check to use a minimal example, I only see the last error.

#![no_std]

pub fn log(x: f64) -> f64 {
    x.log10()
}      

So I wonder if there is something about our configuration that is causing this to not be caught.

Member

perhaps the issue is with code that is not reachable from main(), but only from tests?

Member

ignore my last comment, sounds like you did try it without a call from main

Collaborator

Right, so it sounds like the bug is that rustc doesn't detect this as an issue if we're building for test/check instead of building for run?

Contributor Author

To clarify, rustc does catch it when using cd no-std-check && cargo check with the modified lib.rs:

    Checking no-std-check v0.1.0 (/Users/jkczyz/src/rust-lightning/no-std-check)
error[E0599]: no method named `log10` found for type `f64` in the current scope
 --> src/lib.rs:4:7
  |
4 |     x.log10()
  |       ^^^^^ method not found in `f64`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0599`.
error: could not compile `no-std-check`

To learn more, run the command again with --verbose.

Collaborator

Right, so the issue is that some crates in the workspace are not no-std so everything gets built with std?

Contributor Author

Hmm... moving the other crates to exclude doesn't seem to make a difference when modifying some code to use log10 and running cargo check --no-default-features --features=no-std. However, it does catch an added use of println.

Collaborator

@TheBlueMatt TheBlueMatt left a comment

Looks basically good to me, except for the obvious trivial optimization of shifting instead of dividing.

@TheBlueMatt
Collaborator

GitHub seems to think this is substantially slower than upstream. That doesn't feel too surprising given we're basically taking stuff that was running (probably mostly in parallel with everything else) on the FPU and replacing it with some int ops and memory loads. I'm hoping compacting the table to get it more in cache and dropping the divide improves things, but either way I think we have to eat the loss.

@jkczyz jkczyz force-pushed the 2022-03-log-approximation branch from 6a8f107 to af99a94 on March 9, 2022 00:10
@jkczyz
Contributor Author

jkczyz commented Mar 9, 2022

GitHub seems to think this is substantially slower than upstream. That doesn't feel too surprising given we're basically taking stuff that was running (probably mostly in parallel with everything else) on the FPU and replacing it with some int ops and memory loads. I'm hoping compacting the table to get it more in cache and dropping the divide improves things, but either way I think we have to eat the loss.

Comparing this run from another PR:

https://github.com/lightningdevkit/rust-lightning/runs/5469854141?check_suite_focus=true

test routing::router::benches::generate_mpp_routes_with_default_scorer       ... bench: 125,549,779 ns/iter (+/- 27,155,586)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 118,418,792 ns/iter (+/- 45,428,377)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench:  71,001,913 ns/iter (+/- 29,592,953)
test routing::router::benches::generate_routes_with_default_scorer           ... bench:  45,200,079 ns/iter (+/- 11,678,323)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  49,446,698 ns/iter (+/- 18,776,663)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  35,418,974 ns/iter (+/- 17,261,870)

vs a run on this PR from earlier:

https://github.com/lightningdevkit/rust-lightning/runs/5467595213?check_suite_focus=true

test routing::router::benches::generate_mpp_routes_with_default_scorer       ... bench: 136,333,985 ns/iter (+/- 32,316,346)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 131,460,387 ns/iter (+/- 43,776,862)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench: 106,201,639 ns/iter (+/- 41,524,806)
test routing::router::benches::generate_routes_with_default_scorer           ... bench:  65,041,238 ns/iter (+/- 16,034,790)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  65,458,760 ns/iter (+/- 24,604,211)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  50,211,564 ns/iter (+/- 23,330,715)

Comparing the two, I see that this PR is slower, but there is a similar slowdown in the other scorers as well.

@TheBlueMatt
Collaborator

In local tests it looks like the current code is maybe a slight improvement over upstream, if anything. I didn't bother trying to run the version with the divide in it.

In any case, code LGTM, just needs commits squashed.

/// Look-up table for `log10(x) * 1024` where row `i` is used for each `x` having `i` as the
/// most significant bit. The next 4 bits of `x`, if applicable, are used for the second index.
const LOG10_TIMES_1024: [[u16; LOWER_BITS_BOUND as usize]; BITS as usize] = [
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
Collaborator

Technically we could get rid of the first 3 rows, right? Cause the first row is just representing 1, the second row 2 and 3, the third row... I don't care if you bother with it, though, up to you.

Contributor Author

Yeah, though I wasn't sure if it was worth adding branching for those cases. Keeping it as it is also simplifies the indexing.

jkczyz added 3 commits March 8, 2022 23:22
Since f64::log10 exists in std but not core, unconditionally use log
approximation so --features=no-std will compile.
To ensure no-std is honored across dependencies, add a crate depending
on lightning crates supporting no-std. This should ensure any
regressions are caught. Otherwise, cargo doesn't seem to catch some
incompatibilities (e.g., f64::log10 unavailable in core), seemingly
even across other dependencies, as described here:

https://blog.dbrgn.ch/2019/12/24/testing-for-no-std-compatibility/
@jkczyz jkczyz force-pushed the 2022-03-log-approximation branch from af99a94 to f041a64 on March 9, 2022 05:24
@jkczyz jkczyz merged commit 1a73449 into lightningdevkit:main Mar 9, 2022
Successfully merging this pull request may close these issues.

no_std regression in scorer