
Introduce graph sync crate #1155

Merged · 1 commit merged into main on May 25, 2022

Conversation

@arik-so (Contributor) commented Nov 3, 2021

It is used for fast-forwarding through gossip data downloaded from a server.

Currently still in draft status, pending better code organization and tests.

@TheBlueMatt (Collaborator)

Cool. I assume you're still working on compression for the objects?

@arik-so (Contributor Author) commented Nov 8, 2021

Yes, I'm trying to build an encoder that is distinct from gzip: just RLE, preferably with Huffman on top.

@arik-so arik-so marked this pull request as ready for review February 16, 2022 09:32
@codecov-commenter commented Feb 16, 2022

Codecov Report

Merging #1155 (de7139d) into main (0c6974b) will increase coverage by 0.03%.
The diff coverage is 87.53%.

@@            Coverage Diff             @@
##             main    #1155      +/-   ##
==========================================
+ Coverage   90.93%   90.96%   +0.03%     
==========================================
  Files          77       80       +3     
  Lines       42394    42978     +584     
==========================================
+ Hits        38552    39096     +544     
- Misses       3842     3882      +40     
Impacted Files Coverage Δ
lightning/src/lib.rs 100.00% <ø> (ø)
lightning/src/util/ser.rs 91.28% <ø> (ø)
lightning-rapid-gossip-sync/src/error.rs 62.50% <62.50%> (ø)
lightning/src/routing/network_graph.rs 90.98% <82.14%> (+0.94%) ⬆️
lightning-rapid-gossip-sync/src/processing.rs 89.90% <89.90%> (ø)
lightning-rapid-gossip-sync/src/lib.rs 90.32% <90.32%> (ø)
lightning/src/ln/msgs.rs 86.81% <100.00%> (ø)
lightning/src/ln/functional_tests.rs 97.11% <0.00%> (ø)
lightning-invoice/src/lib.rs 88.98% <0.00%> (+1.58%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0c6974b...de7139d.

@jkczyz jkczyz self-requested a review February 16, 2022 15:18
@TheBlueMatt (Collaborator) left a comment

Left some initial thoughts. Is the server-side somewhere that you want review on or do you just want high-level stuff here first?

{
// make sure there is precisely one leading slash
let canonical_path = format!("/{}", sync_path.trim_start_matches("/"));
println!(
Collaborator

Let's add a logger parameter and re-use our logging stuff?

Contributor Author

This doesn't even need a print, I had just used it for debugging, so I removed it instead.

sync_host: &str,
sync_port: u16,
sync_path: &str,
chain_source: &Option<CS>,
Collaborator

Let's just drop the chain_source; we don't have signatures, so it's not like we can validate anything.

Contributor Author

Done

let synthetic_update = UnsignedChannelUpdate {
chain_hash: Default::default(),
short_channel_id,
timestamp: current_timestamp,
Collaborator

Hmmmm, I think we need to use the update's real timestamp, as otherwise we may get an update from an HTLC failure or otherwise and reject it because the timestamp is "old".

Contributor Author

I will get back to this one in a separate commit.

Contributor Author

Isn't the timestamp only tangentially related to the time of the actual update? I thought nodes had some logic to update the timestamp of a particular update when gossiping anyway – or is the timestamp part of the signed data when a signature is present?

Contributor Author

Additionally, if the timestamp inaccuracy has implications for HTLC handling, wouldn't that mean that gossip data is slightly more than just a DDoS risk vector?


Specifically, our custom channel flags break down like this:

| 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
Contributor

Is there some significance in the spacing difference in the columns for 8 and 4?

Contributor Author

no, my IDE just decided to add it. I'll remove.

@arik-so arik-so force-pushed the graph_sync_crate branch 3 times, most recently from c429f9c to 81616c2 Compare February 18, 2022 01:08
}
}
};
self.update_channel_unsigned(&channel_update)
Contributor

This may need to be atomic with the earlier read. What would happen in the following scenario?

  • Form an UnsignedChannelUpdate from incremental update using the read-locked NetworkGraph above.
  • Apply ChannelUpdate from failed payment
  • Apply earlier formed UnsignedChannelUpdate

Seems the incremental update would override one we just received from a failed payment.

Contributor Author

It shouldn't be a problem, because all the rapid sync updates are backdated a week, so any "true" update from direct gossip should result in this update being discarded.
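To illustrate the precedence argument, a minimal sketch (names are illustrative, not the crate's actual API): rapid-sync timestamps are set a week in the past, and the graph only applies a strictly newer update, so a genuine update from direct gossip always wins:

```rust
/// Rapid-sync updates are back-dated by one week (illustrative constant).
const BACKDATE_SECS: u32 = 7 * 24 * 3600;

/// The graph keeps the newest update per channel direction, so a backdated
/// rapid-sync update can never clobber a fresher one learned from direct
/// gossip (e.g. via an HTLC failure).
fn should_apply_update(existing_timestamp: u32, incoming_timestamp: u32) -> bool {
	incoming_timestamp > existing_timestamp
}

fn backdate(current_timestamp: u32) -> u32 {
	current_timestamp.saturating_sub(BACKDATE_SECS)
}

fn main() {
	let now = 1_650_000_000u32;
	let rapid_sync_ts = backdate(now);
	// A direct-gossip update stamped "now" beats the backdated rapid-sync one:
	assert!(should_apply_update(rapid_sync_ts, now));
	assert!(!should_apply_update(now, rapid_sync_ts));
}
```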

};

let announcement_result = network_graph
.update_channel_from_unsigned_announcement(&synthetic_announcement, &chain_source);
Collaborator

I'm not really a fan of this approach of generating a synthetic update and then feeding it in, at least not if the update has some fields set to bogus values. It seems like quite the layer violation for the lightning-graph-sync crate to have code that relies on certain very specific behavior in the NetworkGraph (namely, that it ignores certain fields). I'd much prefer to have a method on the NetworkGraph that says "add channel with minimal field set of X" and use that.

Contributor

I'd much prefer to have a method on the NetworkGraph that says "add channel with minimal field set of X" and use that.

Isn't that a bit of a layer violation, too? Taking a concept from a higher layer (i.e., the incremental update) and defining it in a lower layer?

Collaborator

Hmm, doesn't feel like one? It's not crazy for NetworkGraph to have a method that just says "manually add a new channel to the network graph, with these parameters". It's not really specific to this kind of sync mechanism, and could be used for several different sync mechanisms.

Contributor

True. I had been thinking more about gossip as the layer.

Collaborator

Right, we kinda have two layers: the gossip message handler, and the underlying NetworkGraph. I agree it'd be strange to put it in the gossip message handler, but we can put some of these update methods on the NetworkGraph itself.

Contributor Author

I'll think about possible approaches to this in the next few commits.

Contributor Author

I think I would like to continue using synthetic channel updates considering their fields are either entirely specified, or inferred from the directional info, but point definitely taken wrt the announcements. I tried an approach in the latest commit, let me know what you think.

@TheBlueMatt (Collaborator) left a comment

In whatever release we include this in, we'll need to fix the network graph data timeout logic to somehow not run until after graph sync completes on startup. I'm not sure how to do this, but it has to be in the same release as this (but can be a separate PR). Maybe open an issue to track it?

};

let chain_source: Option<Box<dyn lightning::chain::Access>> = None;
let current_timestamp = SystemTime::now()
Collaborator

We're gonna get this from the update file itself, right?

Contributor Author

Yeah, I was actually debating as to whether to include the timestamp in the update data. I'll modify the serialization accordingly.

Contributor Author

Will update and reinitialize the server to have received timestamp fields tonight, and then regenerate test vectors.

Contributor

Has this been addressed?

Collaborator

No.

Contributor Author

Hopefully it is now, though I wanna refrain from pushing until my fuzz test's working.


.duration_since(UNIX_EPOCH)
.unwrap()
.as_secs() as u32;
let regressed_timestamp = current_timestamp - 7 * 24 * 3600; // back-date updates by one week
Contributor

Could you make this a constant and document why backdating is used?

Contributor Author

done

@arik-so arik-so force-pushed the graph_sync_crate branch 2 times, most recently from 1416dbf to 4580139 Compare February 19, 2022 00:24
@TheBlueMatt TheBlueMatt added this to the 0.0.106 milestone Feb 22, 2022
/// `timestamp: 64`: Timestamp the announcement was originally received.
///
/// All other parameters as used in `UnsignedChannelAnnouncement` fields.
pub fn update_channel_from_announcement_fragments(&self, short_channel_id: u64, timestamp: u64, features: ChannelFeatures, node_id_1: PublicKey, node_id_2: PublicKey) -> Result<(), LightningError>{
Collaborator

Let's just call this add_channel or something? This name is much more specific than it needs to be.

Contributor Author

I came up with a different name, but add_channel is definitely not clear given the context of when this function should be called.

let node_1 = NodeId::from_pubkey(&node_id_1);
let node_2 = NodeId::from_pubkey(&node_id_2);
let chan_info = ChannelInfo {
features: features.clone(),
Collaborator

Why is this cloned?

Contributor Author

decloned

};

let mut add_channel_to_node = | node_id: NodeId | {
match nodes.entry(node_id) {
Collaborator

Can you DRY this with the below function?

Contributor Author

done, I think

let announcement_count: u32 = Readable::read(read_cursor)?;
for _ in 0..announcement_count {
let features = Readable::read(read_cursor)?;
let short_channel_id = Readable::read(read_cursor)?;
Collaborator

Can we sort by SCID and just include a variable-length-int difference from the previous one? That should cut the size down a lot, no?

Contributor Author

done, see #1155 (comment)
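For reference, a sketch of the delta-encoding idea under discussion (helper names are hypothetical; the crate writes the resulting deltas as BigSize varints, so small differences take one or two bytes instead of a full u64):

```rust
/// Sort SCIDs, then store each as the difference from its predecessor.
fn delta_encode(mut scids: Vec<u64>) -> Vec<u64> {
	scids.sort_unstable();
	let mut previous = 0u64;
	scids.iter().map(|&scid| {
		let delta = scid - previous;
		previous = scid;
		delta
	}).collect()
}

/// Recover the absolute SCIDs by accumulating the deltas.
fn delta_decode(deltas: &[u64]) -> Vec<u64> {
	let mut previous = 0u64;
	deltas.iter().map(|&delta| {
		previous += delta;
		previous
	}).collect()
}

fn main() {
	let scids = vec![783241506229452801, 783241506229452805, 783241506229452900];
	let deltas = delta_encode(scids.clone());
	assert_eq!(delta_decode(&deltas), scids); // round-trips losslessly
}
```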

for _ in 0..announcement_count {
let features = Readable::read(read_cursor)?;
let short_channel_id = Readable::read(read_cursor)?;
let node_id_1 = Readable::read(read_cursor)?;
Collaborator

We should probably do something to de-duplicate node_ids, no? Like maybe just do a list of node_ids to start and then put the index in that list here instead of the full node_id?

Contributor Author

done, passing index using BigSize. I also explored sorting the pubkeys, but the differences are too big to be useful.
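A sketch of that de-duplication scheme (a stub byte-array type stands in for the real PublicKey; on the wire the indices would be BigSize varints):

```rust
use std::collections::HashMap;

type NodeIdBytes = [u8; 33]; // compressed pubkey stand-in

/// Serialize each unique pubkey once in a lookup table, then reference each
/// announcement's nodes by table index, turning repeated 33-byte keys into
/// one- or two-byte varints.
fn build_node_id_table(
	pairs: &[(NodeIdBytes, NodeIdBytes)],
) -> (Vec<NodeIdBytes>, Vec<(u64, u64)>) {
	let mut table: Vec<NodeIdBytes> = Vec::new();
	let mut index_of: HashMap<NodeIdBytes, u64> = HashMap::new();
	let mut indexed = Vec::with_capacity(pairs.len());
	for &(node_1, node_2) in pairs {
		let mut idx = |key: NodeIdBytes| -> u64 {
			*index_of.entry(key).or_insert_with(|| {
				table.push(key);
				(table.len() - 1) as u64
			})
		};
		let pair = (idx(node_1), idx(node_2));
		indexed.push(pair);
	}
	(table, indexed)
}

fn main() {
	let a = [2u8; 33];
	let b = [3u8; 33];
	let (table, indexed) = build_node_id_table(&[(a, b), (b, a)]);
	assert_eq!(table.len(), 2); // each pubkey serialized once
	assert_eq!(indexed, vec![(0, 1), (1, 0)]);
}
```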

for _current_update_index in 0..update_count {
// println!("update index: {}", _current_update_index);

let short_channel_id: u64 = Readable::read(read_cursor)?;
Collaborator

Same here, maybe sort them and put a difference instead of the full one?

Contributor Author

done, using BigSize

let fee_base_msat: u32 = Readable::read(&mut read_cursor)?;
let fee_proportional_millionths: u32 = Readable::read(&mut read_cursor)?;

let has_htlc_maximum_msat: u8 = Readable::read(read_cursor)?;
Collaborator

Maybe set a flag value instead? I.e., MAX_UINT64 means "no htlc-max", and the server-side can cap it down to 21 million BTC without loss.

Contributor Author

I'm not sure this saves space, because if there is no HTLC maximum, sending a u8 takes up less space than a u64.

Contributor Author

we could do something interesting with BigSize?

@arik-so (Contributor Author) commented Mar 5, 2022

The sizes are equivalent when the number of present HTLCs == 7 * absent HTLCs. If there are more present HTLCs than absent ones, using the u64::MAX-flag saves space, and otherwise, using the u8-prefix saves space. We could add a global u8 flag at the beginning specifying which encoding is used in this transmission, and set it depending on the relation of the set/unset HTLC maxima.

Collaborator

My point was based on the assumption that the vast majority of full updates would have a htlc max, but, more importantly, see #1155 (comment)

Contributor Author

Based on testing, it does indeed appear that the 7:1 ratio of set/updated HTLC maxima vs removed HTLC maxima is preserved, so for now we'll just always use u64::MAX
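The resulting encoding, sketched: since updates with a maximum outnumber those without roughly 7:1, unconditionally writing a u64 with u64::MAX as the "no maximum" sentinel beats a one-byte presence flag on average, and real values are capped server-side below 21 million BTC so the sentinel can't collide:

```rust
/// u64::MAX is the sentinel for "no htlc_maximum_msat".
fn encode_htlc_maximum(htlc_maximum_msat: Option<u64>) -> u64 {
	htlc_maximum_msat.unwrap_or(u64::MAX)
}

fn decode_htlc_maximum(encoded: u64) -> Option<u64> {
	if encoded == u64::MAX { None } else { Some(encoded) }
}

fn main() {
	assert_eq!(decode_htlc_maximum(encode_htlc_maximum(Some(5_000_000))), Some(5_000_000));
	assert_eq!(decode_htlc_maximum(encode_htlc_maximum(None)), None);
}
```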

@arik-so arik-so force-pushed the graph_sync_crate branch 2 times, most recently from 2f1a7f8 to 625a8c5 Compare March 11, 2022 05:09
Comment on lines 190 to 199
let channel = read_only_network_graph.channels().get(&short_channel_id);
let channel = if let Some(channel) = channel {
channel
} else {
return Err(LightningError {
err: "Couldn't find channel for update".to_owned(),
action: ErrorAction::IgnoreError,
}
.into());
};
Contributor

FWIW, wherever you have an Option and need to return an error for None you can simplify with ok_or_else and ?, e.g.:

			let channel = read_only_network_graph
				.channels()
				.get(&short_channel_id)
				.ok_or_else(|| LightningError {
					err: "Couldn't find channel for update".to_owned(),
					action: ErrorAction::IgnoreError,
				})?;

return Ok(());
}
let non_incremental_update_presence_flag: u8 = Readable::read(read_cursor)?;
let has_non_incremental_updates = (non_incremental_update_presence_flag == 1);
Contributor

Seeing a build warning here from usage of parentheses.

Contributor Author

I did fix it, but I'm low-key annoyed by this. I find that with = and ==, parentheses really improve legibility.

@jkczyz (Contributor) left a comment

Feel free to squash and rebase on main, along with removing commented code, etc.

Comment on lines 192 to 173
channel
.get_directional_info(direction)
.ok_or(LightningError {
err: "Couldn't find previous directional data for update".to_owned(),
action: ErrorAction::IgnoreError,
})?;
Contributor

nit: move channel to previous line and reduce indent on other lines.

///
/// In the future, we will probably start delaying the purging of unsigned updates to their
/// four-week-marks, or perhaps simply trust a provided server-timestamp that marks
/// data receipt, as opposed to originally signed value. TODO: TBD
Contributor

Remove or complete TODO?

Collaborator

Oof, yea, that's a tough one, too - in the current design the server has to send every SCID of channels in the current graph in each update or users will just have channels that get more and more out of date and then eventually remove them. I think the only realistic solution is to track where each update came from and only time it out based on that info, sadly, with some ability for users to "convert" a graph from "server-sync" to "p2p-sync"

Collaborator

I'm not sure that needs to be solved here, but we absolutely cannot ship this PR as-is without that followup, so would be nice to see it at least in a draft PR IMO.

Contributor Author

Yes, I'll try to have one up with the fuzzing for this one shortly.

Collaborator

I think we can remove this TODO now?


let retrogressed_timestamp = current_timestamp - TIMESTAMP_RETROGRESSION_DELTA; // back-date updates by one week

// shadow variable for ownership release after length-restricted reads
let mut read_cursor = read_cursor;
Contributor

Not sure I understand why this is needed.

Contributor Author

oh good point, addressed

Contributor

Was this addressed in an un-pushed commit?

let read_only_network_graph = network_graph.read_only();
let channel = read_only_network_graph
.channels()
.get(&short_channel_id)
Contributor

If we have performance concerns with the BTreeMap, one possibility would be to traverse the (sorted) channel updates in parallel with the tree rather than performing look-ups for each update. That would be quite an involved change though.
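For the record, a sketch of that parallel traversal (u32 stands in for the real per-channel data): both the tree iterator and the sorted updates advance monotonically, so each side is visited once instead of doing a log-time lookup per update:

```rust
use std::collections::BTreeMap;

/// Merge-walk sorted updates against the SCID-keyed BTreeMap in lockstep.
fn apply_sorted_updates(channels: &mut BTreeMap<u64, u32>, updates: &[(u64, u32)]) {
	let mut updates_iter = updates.iter().peekable();
	for (&scid, value) in channels.iter_mut() {
		// skip updates for channels not present in the graph
		while matches!(updates_iter.peek(), Some(&&(update_scid, _)) if update_scid < scid) {
			updates_iter.next();
		}
		// apply all updates matching this channel; the last one wins
		while let Some(&&(update_scid, new_value)) = updates_iter.peek() {
			if update_scid == scid {
				*value = new_value;
				updates_iter.next();
			} else {
				break;
			}
		}
	}
}

fn main() {
	let mut channels: BTreeMap<u64, u32> = [(1, 10), (5, 50), (9, 90)].into_iter().collect();
	apply_sorted_updates(&mut channels, &[(1, 11), (7, 70), (9, 91)]);
	assert_eq!(channels.get(&1), Some(&11));
	assert_eq!(channels.get(&5), Some(&50)); // untouched
	assert_eq!(channels.get(&9), Some(&91));
}
```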

@arik-so arik-so force-pushed the graph_sync_crate branch 4 times, most recently from 2a2c0d4 to be1b49d Compare March 18, 2022 18:45
Comment on lines 184 to 173
let directional_info =
channel
.get_directional_info(channel_flags)
.ok_or(LightningError {
err: "Couldn't find previous directional data for update".to_owned(),
action: ErrorAction::IgnoreError,
})?;
Contributor

Repeating previous lost comment: move channel up one line and reduce indent of following lines.

Contributor Author

rustfmt seems to strongly disagree with that FYI

Contributor

lol... rustfmt may disagree with itself given it did the opposite a few lines up

Contributor

rustfmt at it again lol

Contributor Author

oh ffs, this is gonna keep happening

@arik-so (Contributor Author) commented May 16, 2022

Sure, no problem.

@arik-so arik-so force-pushed the graph_sync_crate branch 2 times, most recently from 9ff6fe9 to cf14708 Compare May 16, 2022 21:51
@jkczyz (Contributor) left a comment

I'm happy with this now. Could you squash?

@arik-so arik-so force-pushed the graph_sync_crate branch from cf14708 to 2debcde Compare May 19, 2022 03:55
@arik-so (Contributor Author) commented May 19, 2022

Certainly!

@arik-so arik-so force-pushed the graph_sync_crate branch from 2debcde to 4d84970 Compare May 19, 2022 16:33
@TheBlueMatt (Collaborator)

nit: Please restrict git commit messages to 80-90 chars long. The title should be one line of max 80-90 chars. I generally keep mine even shorter but others let it go a bit longer.

@jkczyz (Contributor) commented May 19, 2022

nit: Please restrict git commit messages to 80-90 chars long. The title should be one line of max 80-90 chars. I generally keep mine even shorter but others let it go a bit longer.

Same. I use 50 characters and vim looks angry if I go over.

@TheBlueMatt (Collaborator) left a comment

Some minor comments here and there on tests, and a few doc comments, but after this I'm good to go here.

repository = "https://github.com/lightningdevkit/rust-lightning"
edition = "2018"
description = """
Utility to process gossip routing data from LNSync-like server (protocol name TBD)
Collaborator

Need to settle on the name before we land this. Maybe we just link to the server repo (which we should migrate to lightningdevkit org IMO)

Contributor Author

I expect there are going to be multiple iterations here. My proposal is lightning-rapid-gossip-sync or lightning-rapid-gossip-sync-client, though I think client is somewhat implied given what repository it's in.


## Mechanism

The (presumed) server sends a compressed gossip response containing gossip data. The gossip data is
Collaborator

We should link to the server repo (which I think we should go ahead and move into the lightningdevkit org)

Contributor Author

good point

sync formats arise in the future
- The fourth byte is the protocol version in case our format gets updated
2. Chain hash (32 bytes)
3. Latest seen timestamp (`u32`)
Collaborator

nit: Let's move to generation time? That way we have a bit more determinism for the timestamps in the requests (i.e. we could fix them to be round numbers). Not a big deal either way, though, as they're already fixed.

Contributor Author

Different data sets generated at the same time can have different latest seen timestamps. I think keeping it as is does not unnecessarily expand the attack surface, so I'd rather keep it.
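To make the header layout quoted above concrete, a hedged parsing sketch; the three "LDK" magic bytes and the big-endian integer reads are assumptions for illustration, and the crate itself reads these fields through its Readable machinery:

```rust
use std::io::{Cursor, Read};

/// Parse the prefix: 3 magic bytes plus a version byte, then the 32-byte
/// chain hash and the u32 "latest seen" timestamp.
fn parse_header(data: &[u8]) -> Result<([u8; 32], u32), &'static str> {
	let mut cursor = Cursor::new(data);
	let mut magic = [0u8; 3];
	cursor.read_exact(&mut magic).map_err(|_| "short read")?;
	if &magic != b"LDK" {
		return Err("unknown sync format");
	}
	let mut version = [0u8; 1];
	cursor.read_exact(&mut version).map_err(|_| "short read")?;
	if version[0] != 1 {
		return Err("unsupported protocol version");
	}
	let mut chain_hash = [0u8; 32];
	cursor.read_exact(&mut chain_hash).map_err(|_| "short read")?;
	let mut timestamp_bytes = [0u8; 4];
	cursor.read_exact(&mut timestamp_bytes).map_err(|_| "short read")?;
	Ok((chain_hash, u32::from_be_bytes(timestamp_bytes)))
}

fn main() {
	let mut data = b"LDK\x01".to_vec();
	data.extend_from_slice(&[0u8; 32]); // chain hash placeholder
	data.extend_from_slice(&1_650_000_000u32.to_be_bytes());
	let (_hash, latest_seen) = parse_header(&data).unwrap();
	assert_eq!(latest_seen, 1_650_000_000);
}
```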

## Delta Calculation

The way a server is meant to calculate this rapid gossip sync data is by using two data points as a
reference that are meant to be provided by the client:
Collaborator

We only have one reference now, right? In general, does this section need to be rewritten to talk about things from a snapshotting perspective?


## Performance

Given the primary purpose of this utility is a faster graph sync, we thought it might be helpful to
Collaborator

This section needs an "as of date X with network graph of size Y".

assert!(after.contains("channels: [783241506229452801]"));
}

#[ignore]
Collaborator

Let's not wholesale #[ignore] but rather only ignore failures to find the specific file (and then reuse the --cfg=require_route_graph_test configure flag we use to force the test to fail in CI if the file is missing).

);
if let Err(crate::error::GraphSyncError::IOError(io_error)) = &sync_result {
let error_string = io_error.to_string();
if error_string.to_lowercase().contains("no such file or directory") {
Collaborator

nit: Why not just always panic with the error message and then include the io error in the message?
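In other words, something like this (a sketch; the file path and helper name are hypothetical):

```rust
use std::path::Path;

/// Panic unconditionally with the underlying io::Error baked into the message,
/// instead of string-matching on "no such file or directory".
fn read_test_vector(path: &Path) -> Vec<u8> {
	std::fs::read(path)
		.unwrap_or_else(|io_error| panic!("couldn't read {:?}: {:?}", path, io_error))
}

fn main() {
	let bytes = read_test_vector(Path::new("./res/full_graph.lngossip"));
	println!("read {} bytes", bytes.len());
}
```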

network_graph.update_channel_unsigned(&synthetic_update)?;
}

let input_termination_check: Result<u8, DecodeError> = Readable::read(read_cursor);
Collaborator

Do we want to allow for basic extensibility by just not checking for the end and letting there be excess stuff we ignore?

Contributor Author

Not in this version, no. I think for now it's safer to make sure the entire response can be parsed correctly end to end.

let network_graph = NetworkGraph::new(block_hash);

let before = network_graph.to_string();
assert_eq!(before.len(), 31);
Collaborator

Let's not assert that the print encoding is a specific format here (and in the other tests in this file). Asserting that it includes certain pubkeys seems fine, but not a specific character length or that pubkeys are printed with a : afterwards.

@@ -1132,6 +1142,81 @@ impl NetworkGraph {
self.update_channel_from_unsigned_announcement_intern(msg, None, chain_access)
}

/// Update channel from partial announcement data received via rapid gossip
Collaborator

rapid gossip seems like a fine name, but let's make sure we're consistent everywhere in what we call this.

@arik-so arik-so force-pushed the graph_sync_crate branch 5 times, most recently from bbb7e5a to ef712e3 Compare May 20, 2022 20:10

[dependencies]
lightning = { version = "0.0.106", path = "../lightning" }
bitcoin = { version = "0.28.1", default-features = false, features = ["secp-recovery"] }
Collaborator

nit: I don't think we (directly) need secp-recovery?

implicitly extracted from the channel announcements and updates.

The data is then applied to the current network graph, artificially dated to the timestamp of the
latest seen message, be it an announcement or an update, from the server's perspective. The network
Collaborator

minus a week, right?

## Delta Calculation

The way a server is meant to calculate this rapid gossip sync data is by using the latest timestamp
an change, be it either an announcement or an update, was seen. That timestamp is included in each
Collaborator

an change?


## Delta Calculation

The way a server is meant to calculate this rapid gossip sync data is by using the latest timestamp
Collaborator

It's really confusing to just say "latest timestamp" here - we mean the time we saw the latest update, not the timestamp in the update, which is what I'd expect this to refer to.


The way a server is meant to calculate this rapid gossip sync data is by using the latest timestamp
an change, be it either an announcement or an update, was seen. That timestamp is included in each
rapid sync message, so all the client needs to do is cache one variable.
Collaborator

no idea what you mean? Clients have to store the full graph, not just one variable?

//! The reason the rapid sync server requires a modicum of trust is that it could provide bogus
//! data, though at worst, all that would result in is a fake network topology, which wouldn't
//! enable the server to steal or siphon off funds. It could, however, reduce the client's privacy
//! by increasing the likelihood payments will are routed via channels the server controls.
Collaborator

s/increasing the likelihood payments will are/forcing all payments to be/

//! privately calculate routes for payments, and do so much faster and earlier than requiring a full
//! peer-to-peer gossip sync to complete.
//!
//! The reason the rapid sync server requires a modicum of trust is that it could provide bogus
Collaborator

s/a modicum of//

//! (not least due to announcements not including any timestamps at all, but only a block height)
//! but rather, it's a timestamp of when the server saw a particular message.
//!
//! If a particular channel update had never occurred before, the full update is sent. If a channel
Collaborator

This paragraph seems to be missing a lot of context to me, and I think we can just drop it and the next one; we don't need that much detail in the crate docs - just add the word "differential" in the second paragraph and I think we're good.

///
/// `sync_path`: Path to the file where the gossip update data is located
///
pub fn sync_network_graph_with_file_path(
Collaborator

Here and in processing.rs - we need to return the timestamp number that the user needs to store so they can use that to query for the next version.

//!
//! In practice, snapshots of server data will be calculated at regular intervals, such that deltas
//! are cached and needn't be calculated on the fly. All the client needs to provide to the server
//! is the timestamp included in the previously received (and hence snapshotted) delta.
Collaborator

There needs to be docs for how to use this, too - right now it's just a bag of functions that says "pass in a file or something you get from a server, but you have to query the server with some number that's not exposed to you". We should define a URI scheme (I'd suggest, like, https://graphsync.lightningdevkit.org/v1/$LAST_SYNC_TIMESTAMP.bin) and exact semantics for how to query.

Contributor Author

There are docs in the server.

Collaborator

Then the client needs to directly link to the server. Also, if we intend to run a public server that anyone can use, we should have docs in the client that tells people how to use it - just having a "how to run your own server" guide doesn't help clients who shouldn't ever need to look at the server code (if they dont want to) :)

Contributor Author

Well, the client does link to the server, which does have comments in the readme showing the request paths. I'm happy to copy the request paths section here, but given that every server instance is gonna use a different API prefix, not to mention domain, I don't know what else to say here.

Collaborator

I think, given it's only about 5 LoC, it'd be nice to have a full ``` comment here describing how you can sync the graph from graphsync.lightningdevkit.org. Given this is actually really simple to use, we should show people reading the docs here just how simple it is to use :).

Contributor Author

Once we have the example service up, sure. Should we just say that that's gonna be an example URL in the future?

Collaborator

I'd hope we can get an example service up pretty quick - we should be able to get one up in a matter of minutes, no? Let's just commit to doing it :)

@arik-so (Contributor Author) commented May 20, 2022

No, I can't commit to responding to all the server comments within minutes; some are a bit painful to address.

Collaborator

lol nah, I meant we could take the server as-is and run it within five minutes to get an example Service up, and improving it a bit over time is just gravy :)

@arik-so arik-so force-pushed the graph_sync_crate branch 2 times, most recently from 44190ed to efed7d3 Compare May 20, 2022 21:45
pub fn sync_network_graph_with_file_path(
network_graph: &network_graph::NetworkGraph,
sync_path: &str,
) -> Result<(u32), GraphSyncError> {
Collaborator

Why is the return value in a single-element tuple?

@@ -57,7 +75,7 @@ pub mod processing;
pub fn sync_network_graph_with_file_path(
network_graph: &network_graph::NetworkGraph,
sync_path: &str,
) -> Result<(u32), GraphSyncError> {
) -> Result<u32, GraphSyncError> {
Contributor

Document return type here, too.

//! An example server request could look as simple as the following:
//!
//! ```shell
//! curl -o rapid_sync.lngossip https://rapidsync.lightningdevkit.org/snapshot/<last_sync_timestamp>
Collaborator

I assume this should be 0 the first time? We should say that.

//! use lightning::routing::network_graph::NetworkGraph;
//!
//! let block_hash = genesis_block(Network::Bitcoin).header.block_hash();
//! let network_graph = NetworkGraph::new(block_hash);
Collaborator

Can we say something about reading from disk?
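One way the docs could address that, sketched below; persisting and reloading the graph goes through lightning's serialization traits, and the argument-free Readable read shown here is an assumption about this era's API, as is the file path:

```rust
use bitcoin::blockdata::constants::genesis_block;
use bitcoin::network::constants::Network;
use lightning::routing::network_graph::NetworkGraph;
use lightning::util::ser::Readable;
use std::fs::File;
use std::io::BufReader;

fn load_or_create_graph(path: &str) -> NetworkGraph {
	// Reuse a previously persisted graph if one exists on disk; otherwise start
	// empty from the genesis hash and let rapid sync populate it.
	if let Ok(file) = File::open(path) {
		if let Ok(graph) = NetworkGraph::read(&mut BufReader::new(file)) {
			return graph;
		}
	}
	NetworkGraph::new(genesis_block(Network::Bitcoin).header.block_hash())
}

fn main() {
	let network_graph = load_or_create_graph("./network_graph.bin"); // hypothetical path
	// The rapid-sync call then applies the downloaded delta and returns the
	// latest-seen timestamp to persist for the next request, e.g.:
	// let timestamp = lightning_rapid_gossip_sync::sync_network_graph_with_file_path(
	//     &network_graph, "./rapid_sync.lngossip")?;
	let _ = network_graph;
}
```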

@TheBlueMatt (Collaborator) left a comment

LGTM! A few rather trivial notes which can just be addressed when squashing IMO.

let read_only_network_graph = network_graph.read_only();
let channel = read_only_network_graph
.channels()
.get(&short_channel_id)
Collaborator

Not worth doing it here, but as a followup we should look into some kind of mutator accessor for the network graph, then we'd avoid the double-lookup here.

@TheBlueMatt (Collaborator)

LGTM. I'm ready for squash, @jkczyz ?

@arik-so arik-so force-pushed the graph_sync_crate branch from de7139d to a58ae4c Compare May 25, 2022 08:21
@jkczyz jkczyz merged commit aac3907 into lightningdevkit:main May 25, 2022