
Allow indication to BackgroundProcessor that graph sync is pending #1433

Merged

Conversation

arik-so
Contributor

@arik-so arik-so commented Apr 19, 2022

When running a graph sync, the network graph should not be modified from other sources until graph sync completes to avoid inconsistencies or developer/user confusion. Similarly, the network graph should not be prematurely pruned, as graph sync allows very large timestamp deltas in graph snapshot comparisons.

To that end, I add an argument to the BackgroundProcessor to indicate that network-graph-related processing should be avoided pending the graph sync's completion, which can be signaled by calling a new method.

This PR depends on #1155. Thanks to @jkczyz for the suggestions on logic and idiomatic style!
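For readers unfamiliar with the pattern, the gating boils down to a shared atomic flag that the sync code flips when it finishes and that the background loop checks before touching the graph. A minimal sketch, assuming nothing beyond std (names here are illustrative, not the PR's actual API):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

struct GossipSyncState {
    initial_sync_complete: AtomicBool,
}

impl GossipSyncState {
    fn new() -> Arc<Self> {
        Arc::new(Self { initial_sync_complete: AtomicBool::new(false) })
    }

    /// Called once the rapid gossip snapshot has been fully applied.
    fn signal_sync_complete(&self) {
        self.initial_sync_complete.store(true, Ordering::Release);
    }

    /// Checked by the background loop before pruning or persisting the graph.
    fn may_touch_graph(&self) -> bool {
        self.initial_sync_complete.load(Ordering::Acquire)
    }
}
```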

@TheBlueMatt TheBlueMatt added this to the 0.0.107 milestone Apr 19, 2022
@arik-so arik-so marked this pull request as ready for review April 20, 2022 02:57
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Needs docs, but, yea, I guess this is probably where we should go. Also needs docs in NetworkGraph::remove_stale_channels.

@codecov-commenter

codecov-commenter commented Apr 28, 2022

Codecov Report

Merging #1433 (0d6284c) into main (0b77008) will increase coverage by 0.07%.
The diff coverage is 93.91%.

❗ Current head 0d6284c differs from pull request most recent head 784addc. Consider uploading reports for the commit 784addc to get more accurate results

@@            Coverage Diff             @@
##             main    #1433      +/-   ##
==========================================
+ Coverage   91.01%   91.09%   +0.07%     
==========================================
  Files          80       80              
  Lines       43409    43981     +572     
  Branches    43409    43981     +572     
==========================================
+ Hits        39509    40063     +554     
- Misses       3900     3918      +18     
| Impacted Files | Coverage Δ |
|---|---|
| lightning/src/routing/network_graph.rs | 91.05% <84.61%> (+0.05%) ⬆️ |
| lightning-rapid-gossip-sync/src/processing.rs | 90.17% <92.68%> (+0.27%) ⬆️ |
| lightning-rapid-gossip-sync/src/lib.rs | 90.66% <95.00%> (+0.34%) ⬆️ |
| lightning-background-processor/src/lib.rs | 94.98% <97.29%> (+0.61%) ⬆️ |
| lightning/src/chain/onchaintx.rs | 93.98% <0.00%> (-0.93%) ⬇️ |
| lightning/src/util/events.rs | 41.66% <0.00%> (-0.33%) ⬇️ |
| lightning/src/ln/functional_tests.rs | 97.07% <0.00%> (-0.12%) ⬇️ |
| lightning/src/lib.rs | 100.00% <0.00%> (ø) |

... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b77008...784addc. Read the comment docs.

let mut seconds_slept = 0;
loop {
assert!(seconds_slept < 3);
std::thread::sleep(Duration::from_secs(1));
Contributor

Likewise here.


// Set up a background event handler for FundingGenerationReady events.
let (sender, receiver) = std::sync::mpsc::sync_channel(1);
let event_handler = move |event: &Event| {
sender.send(handle_funding_generation_ready!(event, channel_value)).unwrap();
Contributor

You can use an empty event handler since you don't need to test that logic here.
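For instance, a no-op closure does the trick (and is what the updated test below ends up using):

```rust
// Empty event handler: this test doesn't exercise event-handling logic.
let event_handler = |_: &_| {};
```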

Contributor Author

Oops, good point.

fn with_manager_error(self, error: std::io::ErrorKind, message: &'static str) -> Self {
Self { manager_error: Some((error, message)), ..self }
}

Contributor

nit: remove blank line

@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch from 88b6287 to 19d8365 Compare May 6, 2022 05:11
@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch from 19d8365 to f698cf9 Compare May 25, 2022 21:52
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Ah, one really general concern: users need to be able to write the latest-synced-timestamp value after the updated network graph has been written, which is currently not exposed in any way in the API. I think the simplest approach here is to actually require the user to pass the NetworkGraphPersister into the RapidGossipSync object and have it do the persisting itself. Interestingly, if we do that we probably don't want to ever bother persisting the network graph in the BP at all; it's just duplicative, and there should be only very rare graph updates in between syncs anyway. I guess we can just turn off the persisting via the BP if the RapidGossipSync is there?

@@ -56,7 +58,7 @@ use std::ops::Deref;
#[must_use = "BackgroundProcessor will immediately stop on drop. It should be stored until shutdown."]
pub struct BackgroundProcessor {
stop_thread: Arc<AtomicBool>,
thread_handle: Option<JoinHandle<Result<(), std::io::Error>>>,
thread_handle: Option<JoinHandle<Result<(), std::io::Error>>>
Collaborator

please leave the excess trailing commas.

Contributor Author

grrr

Contributor Author

seems like my IDE removed it in an auto-format

///
/// # Rapid Gossip Sync
///
/// If a rapid gossip sync is meant to run at startup, set `await_rapid_gossip_sync_completion`
Collaborator

s/If a rapid gossip sync is meant to run at startup/If you're using rapid gossip sync/? Also, the variable name has changed and it's not a bool anymore.

@@ -195,6 +204,11 @@ impl BackgroundProcessor {
{
let stop_thread = Arc::new(AtomicBool::new(false));
let stop_thread_clone = stop_thread.clone();
let is_initial_sync_complete = if let Some(rapid_sync) = rapid_gossip_sync {
Collaborator

Why clone this here? Just hold the rapid_gossip_sync and access it in the loop.

Contributor Author

honestly, I might actually instead change it to taking a reference to a gossip sync. Absolutely no point in holding the entire sync struct imo.

Collaborator

Before you jump to it, let's discuss my top-level comment at #1433 (review).

let is_currently_awaiting_graph_sync = !is_initial_sync_complete.load(Ordering::Acquire);
if is_currently_awaiting_graph_sync {
log_trace!(logger, "Not pruning network graph due to pending gossip sync");
continue;
Collaborator

I think we should still persist the scorer, so let's instead move the persist call into an if block rather than using continue.
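A rough sketch of the suggested shape, with placeholder closures standing in for the real prune/persist calls (these are not the BackgroundProcessor's actual signatures):

```rust
fn prune_and_persist_step(
    initial_sync_complete: bool,
    prune_and_persist_graph: impl Fn(),
    persist_scorer: impl Fn(),
) {
    if initial_sync_complete {
        // Only prune (and persist) the graph once the rapid sync has landed.
        prune_and_persist_graph();
    }
    // The scorer is persisted either way, rather than being skipped by a `continue`.
    persist_scorer();
}
```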

@arik-so
Contributor Author

arik-so commented May 26, 2022

Which latest-synced timestamp value are you referring to – the timestamp included in the rapid sync, the local timestamp when the rapid sync was applied, or the timestamp when the latest regular sync update was applied?

@TheBlueMatt
Collaborator

> the timestamp included in the rapid sync

This one, which the user presumably needs to persist... somewhere? I'm starting (again) to lean towards just having a field in the NetworkGraph for this that users can read.
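A bare-bones sketch of that idea; the struct here is a stand-in, though the getter name matches what this PR ends up adding to NetworkGraph:

```rust
use std::sync::RwLock;

struct NetworkGraphSketch {
    // Set by the rapid sync process after every completed sync.
    last_rapid_gossip_sync_timestamp: RwLock<Option<u32>>,
}

impl NetworkGraphSketch {
    /// Read the timestamp of the most recently applied rapid gossip sync.
    fn get_last_rapid_gossip_sync_timestamp(&self) -> Option<u32> {
        *self.last_rapid_gossip_sync_timestamp.read().unwrap()
    }

    /// Record a newly applied sync's timestamp so it can be persisted and read back later.
    fn set_last_rapid_gossip_sync_timestamp(&self, timestamp: u32) {
        *self.last_rapid_gossip_sync_timestamp.write().unwrap() = Some(timestamp);
    }
}
```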

@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch 2 times, most recently from 2463863 to 8312dca Compare May 31, 2022 19:13
@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch 6 times, most recently from b45df06 to 30d4572 Compare May 31, 2022 21:26
network_graph.add_channel_from_partial_announcement(42, 53, features, nodes[0].node.get_our_node_id(), nodes[1].node.get_our_node_id())
.expect("Failed to update channel from partial announcement");
let original_graph_description = network_graph.to_string();
assert!(original_graph_description.contains("42: features: 0000, node_one:"));
Contributor

This assertion doesn't seem necessary.

Contributor Author

It helps assert that channel 42 is actually there.

@@ -175,9 +184,11 @@ impl BackgroundProcessor {
PM: 'static + Deref<Target = PeerManager<Descriptor, CMH, RMH, L, UMH>> + Send + Sync,
S: 'static + Deref<Target = SC> + Send + Sync,
SC: WriteableScore<'a>,
RGS: 'static + Deref<Target = RapidGossipSync<G>> + Send
Contributor

Will this need to be Sync, too?

Contributor Author

Doesn't appear to be.

let mut file = File::open(sync_path)?;
processing::update_network_graph_from_byte_stream(&network_graph, &mut file)
/// Rapid Gossip Sync holder struct
/// todo: ask reviewers for more interesting things to say here
Contributor

I would explain how this differs from P2P gossip syncing.

Contributor Author

This code's outdated.

Comment on lines 85 to 86
/// Instantiate a new RapidGossipSync holder
/// todo: same as above
Contributor

Wouldn't refer to this as a holder, FWIW.

Contributor Author

The code's outdated.

Collaborator

@TheBlueMatt TheBlueMatt left a comment

LGTM, some trivial nits. Needs to be squashed down into a few separate commits, though (not just one).

@@ -1,11 +1,15 @@
// Import that needs to be added manually
// Imports that need to be added manually
use std::sync::Arc;
Collaborator

unused import

let event_handler = |_: &_| {};
let bg_processor = BackgroundProcessor::start(persister, event_handler, nodes[0].chain_monitor.clone(), nodes[0].node.clone(), nodes[0].net_graph_msg_handler.clone(), nodes[0].peer_manager.clone(), nodes[0].logger.clone(), Some(nodes[0].scorer.clone()), nodes[0].rapid_gossip_sync.clone());

loop {
Collaborator

nit: you added a whole graph persist notifier api - why not just wait until it fires here and then check the log afterwards rather than busy-polling?

Contributor Author

Not sure I get the question. I can wait until is_initial_sync_complete is true, but there are no mpsc messages being sent or other events being fired. Whatever I do, it's gonna be busy polling. Are you thinking of the stuff I did on the server?

@@ -27,13 +27,16 @@
//! its contents from disk, which we do by calling the `sync_network_graph_with_file_path` method:
//!
//! ```
//! use std::sync::Arc;
Collaborator

unused import

@arik-so
Contributor Author

arik-so commented Jun 1, 2022

Happy to squash into separate commits, just lmk preferred grouping.

@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch 2 times, most recently from 6e41392 to 1f5c0e1 Compare June 1, 2022 18:28
use utils::test_logger;

/// Actual fuzz test, method signature and name are fixed
fn do_test(data: &[u8]) {
let block_hash = bitcoin::BlockHash::default();
let network_graph = lightning::routing::network_graph::NetworkGraph::new(block_hash);
lightning_rapid_gossip_sync::processing::update_network_graph(&network_graph, data);
let rapid_sync = RapidGossipSync::new(&network_graph);
Collaborator

The Arc snuck back in in the first commit and was then removed in the second.

@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch from 1f5c0e1 to 101c6e8 Compare June 1, 2022 19:03
@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch 3 times, most recently from 2cf6fc5 to 6c9af28 Compare June 1, 2022 21:01
@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch 3 times, most recently from 0f77ef5 to 2874198 Compare June 1, 2022 22:32
@TheBlueMatt
Collaborator

Please break lines in the commit description (i.e., after the first line). Lines just shouldn't be > ~80 chars long.

/// [module-level documentation]: crate
pub struct RapidGossipSync<NG: Deref<Target=NetworkGraph>> {
network_graph: NG,
is_initial_sync_complete: AtomicBool
Contributor

Is this redundant with NetworkGraph::last_rapid_gossip_sync_timestamp? Would be nice if last_rapid_gossip_sync_timestamp was in RapidGossipSync instead given it's not always needed with NetworkGraph.

Contributor Author

No, it's in NetworkGraph for easier serialization and persistence. Admittedly a bit of a hack, but one that users will appreciate.

Comment on lines 291 to 295
if let Some(ref handler) = net_graph_msg_handler {
log_trace!(logger, "Pruning network graph of stale entries");
handler.network_graph().remove_stale_channels();
if let Err(e) = persister.persist_graph(handler.network_graph()) {
log_error!(logger, "Error: Failed to persist network graph, check your disk and permissions {}", e)
log_trace!(logger, "Assessing prunability of network graph");

// The network graph must not be pruned while rapid sync completion is pending
let should_prune = match rapid_gossip_sync.as_ref(){
Contributor

Isn't it the case that a user of RapidGossipSync won't also use NetGraphMsgHandler? If so, there would need to be mutually exclusive persistence paths. Otherwise, the graph will never be persisted.

Contributor Author

No, rapid gossip sync is merely a tool for fast initial graph initialization; a user may still very much opt to use P2P graph message handling for follow-up updates.

Contributor

But if they don't opt to use the p2p updates (i.e., when net_graph_msg_handler is None), won't the NetworkGraph never be persisted?

Contributor Author

Yes, that is true. They do already have the downloaded snapshot, which actually takes less time to convert into a network graph than a conventionally serialized network graph, so I'm not sure that's really a problem.

Contributor

Should we not bother serializing the timestamp then? That way it can go inside RapidGossipSync instead of NetworkGraph. (cc: @TheBlueMatt)

Contributor Author

We should definitely bother serializing the timestamp. I think the real question is whether we should expose the NetworkGraph access from the RapidGossipSync struct for persistence purposes.

Collaborator

Right, net_graph_msg_handler may be None and we still very much definitely want to persist the network graph, whereas the current code won't get here if net_graph_msg_handler is None.

As for persisting the timestamp, I don't think we need to worry about persisting it or not, kinda who cares? And, indeed, someone may want to move from rapid to P2P or back, so might as well persist it.

Contributor

> As for persisting the timestamp, I don't think we need to worry about persisting it or not, kinda who cares? And, indeed, someone may want to move from rapid to P2P or back, so might as well persist it.

Hmm... if it doesn't matter if we persist it, but including it in NetworkGraph so we can persist it is a hack, why do it? Seems to just add complexity while exposing an API that only some users would care about but largely only exists for the lightning-rapid-gossip-sync crate's internal usage.

Not persisting it but keeping it internal to RapidGossipSync only affects the restart case. Or would fetching new rapid sync data be so infrequent that it would only happen on a restart (i.e., user app is mostly closed)? And thus persisting the timestamp would be important.

Contributor Author

The main use case for rapid sync is mobile devices, where the most common state of the app will be "launched within the previous 30 seconds." Persisting the timestamp allows them to fetch differential snapshots whose size will be one hundred or a couple hundred kB instead of 2 MB, which can shave a couple of seconds off each app start.
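To make the size argument concrete, a hypothetical sketch of keying the snapshot request off the persisted timestamp; the path format is invented for illustration and is not the actual server API:

```rust
// A timestamp of 0 requests the full snapshot; a recent timestamp requests a
// much smaller differential snapshot covering only the changes since then.
fn snapshot_request_path(last_sync_timestamp: Option<u32>) -> String {
    format!("/snapshot/{}", last_sync_timestamp.unwrap_or(0))
}
```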

@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch from 2874198 to 8cdac12 Compare June 1, 2022 23:33
TheBlueMatt
TheBlueMatt previously approved these changes Jun 2, 2022
handler.network_graph().remove_stale_channels();
if let Err(e) = persister.persist_graph(handler.network_graph()) {
log_error!(logger, "Error: Failed to persist network graph, check your disk and permissions {}", e)
let network_graph_optional = if let Some(ref handler) = net_graph_msg_handler {
Collaborator

nit: You could collapse this if into the should_prune match above to remove some code and simplify things a bit.
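A simplified sketch of the collapsed shape, with stand-in types in place of the real handlers, just to show how the fallback folds into the match:

```rust
struct NetworkGraph;
struct RapidGossipSync { graph: NetworkGraph, initial_sync_complete: bool }
struct NetGraphMsgHandler { graph: NetworkGraph }

// Returns the graph to prune/persist, or None if pruning should be skipped.
fn graph_to_prune<'a>(
    rapid_sync: Option<&'a RapidGossipSync>,
    msg_handler: Option<&'a NetGraphMsgHandler>,
) -> Option<&'a NetworkGraph> {
    match rapid_sync {
        // Rapid sync configured: only touch the graph once its initial sync is done.
        Some(rgs) if rgs.initial_sync_complete => Some(&rgs.graph),
        Some(_) => None,
        // No rapid sync: fall back to the P2P handler's graph, if one was given.
        None => msg_handler.map(|handler| &handler.graph),
    }
}
```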

}

impl<NG: Deref<Target=NetworkGraph>> RapidGossipSync<NG> {
/// Instantiate a new RapidGossipSync instance
Contributor

nit: ticks

Contributor Author

what do you mean?

Contributor

Put ticks around the identifiers in rust docs so they are formatted.
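For example, backticks around the type name so rustdoc renders it as code:

```rust
/// Instantiate a new `RapidGossipSync` instance.
```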

}

last_prune_call = Instant::now();
have_pruned = true;
} else {
log_trace!(logger, "Not pruning network graph due to pending gossip sync");
Contributor

Or because neither sync was given?

Contributor Author

Phrased it differently

@@ -1078,6 +1078,18 @@ impl NetworkGraph {
}
}

/// The unix timestamp in UTC provided by the most recent rapid gossip sync.
/// It will be set by the rapid sync process after every sync completion.
pub fn get_last_rapid_gossip_sync_timestamp(&self) -> Option<u32> {
Contributor

Do we describe anywhere how this should be used?

Contributor Author

Yeah, the rapid gossip sync crate docs talk about the timestamp

Comment on lines 294 to 301
Some(rapid_sync) => {
if rapid_sync.is_initial_sync_complete() {
Some(rapid_sync.network_graph())
} else {
None
}
},
None => net_graph_msg_handler.as_ref().map_or(None, |handler| Some(handler.network_graph()))
Contributor

FWIW, the interface is prone to misuse since the user could potentially pass in both syncs with two different NetworkGraphs, but only one would be persisted. Not sure if we really care but it would be a reason to prefer an enum over two Options.

Contributor Author

True, but with an enum, you cannot use both a rapid sync and a regular message handler. I can make an extra check verifying that the network graph references are the same somehow? Is there a way to do that that will work with Arc?

Contributor

> True, but with an enum, you cannot use both a rapid sync and a regular message handler.

How useful is this for a mobile? Wouldn't the app only be open infrequently? Just trying to understand the use case of having both.

> I can make an extra check verifying that the network graph references are the same somehow? Is there a way to do that that will work with Arc?

Probably some pointer comparison though not sure if we want to do so.
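For what it's worth, std's Arc does support an identity check; a sketch assuming both components exposed their graph handle (which they don't in this PR as written):

```rust
use std::sync::Arc;

struct NetworkGraph;

// True only if both handles point at the very same NetworkGraph allocation.
fn same_graph(a: &Arc<NetworkGraph>, b: &Arc<NetworkGraph>) -> bool {
    Arc::ptr_eq(a, b)
}
```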

Contributor Author

@arik-so arik-so Jun 2, 2022

> How useful is this for a mobile? Wouldn't the app only be open infrequently? Just trying to understand the use case of having both.

That depends on whether mobile implementations will choose to obtain gossip data in between rapid syncs. If some apps envision being open for up to a minute or two, it's a relatively cheap thing to do in the background, and it allows them to minimize trust after the initial fast-forward. But we're all just guessing here; I can survey some folks who'll be using it.

Contributor

Just occurred to me that we currently use NetGraphMsgHandler to handle the network_update field of Event::PaymentPathFailed. We'll have to do something similar in the rapid-sync-only case.

arik-so added 2 commits June 2, 2022 10:14
Create a wrapper struct for rapid gossip sync that can be passed to
BackgroundProcessor's start method, allowing it to only start pruning
the network graph upon rapid gossip sync's completion.
@arik-so arik-so force-pushed the 2022-04-rapid-sync-bg-processor branch from 0d6284c to 784addc Compare June 2, 2022 17:16
Contributor

@jkczyz jkczyz left a comment

As discussed offline, let's leave any refactors for a follow-up.

Collaborator

@TheBlueMatt TheBlueMatt left a comment

Just need to make sure, in a follow-up, that we process payment failure events even without a NetGraphMsgHandler.

@jkczyz jkczyz merged commit 9c50083 into lightningdevkit:main Jun 2, 2022