Allow indication to BackgroundProcessor that graph sync is pending #1433
Conversation
Needs docs, but, yea, I guess this is probably where we should go. Also needs docs in `NetworkGraph::remove_stale_channels`.
Codecov Report
@@            Coverage Diff             @@
##             main    #1433      +/-   ##
==========================================
+ Coverage   91.01%   91.09%   +0.07%
==========================================
  Files          80       80
  Lines       43409    43981     +572
==========================================
+ Hits        39509    40063     +554
- Misses       3900     3918      +18
==========================================
Continue to review full report at Codecov.
let mut seconds_slept = 0;
loop {
	assert!(seconds_slept < 3);
	std::thread::sleep(Duration::from_secs(1));
Likewise here.
// Set up a background event handler for FundingGenerationReady events.
let (sender, receiver) = std::sync::mpsc::sync_channel(1);
let event_handler = move |event: &Event| {
	sender.send(handle_funding_generation_ready!(event, channel_value)).unwrap();
You can use an empty event handler since you don't need to test that logic here.
Oops, good point.
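For reference, a minimal no-op handler (the same pattern used later in these tests) would be:

let event_handler = |_: &_| {};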
fn with_manager_error(self, error: std::io::ErrorKind, message: &'static str) -> Self {
	Self { manager_error: Some((error, message)), ..self }
}
nit: remove blank line
Ah, one really general concern - users need to be able to write the latest-timestamp-synced value after the updated network graph has been written, which is currently not exposed in any way in the API. I think the simplest approach here is to actually require the user to pass the `NetworkGraphPersister` in to the `RapidGossipSync` object and have it do the persisting itself. Interestingly, we probably don't actually want to ever bother persisting the network graph in the BP at all if we do that - it's just duplicative, and there should be only very, very rare graph updates in between syncs anyway. I guess we can just turn off the persisting via the BP if the rapid gossip sync is there?
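A rough sketch of what that proposed shape could look like (names hypothetical; `GraphPersister` is a stand-in for whatever persistence trait would be required, not an API from this PR):

// Hypothetical sketch only: RapidGossipSync owns a persister and writes the
// graph (and the latest-synced timestamp) itself after applying each snapshot.
pub struct RapidGossipSync<NG: Deref<Target = NetworkGraph>, GP: GraphPersister> {
	network_graph: NG,
	graph_persister: GP,
}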
@@ -56,7 +58,7 @@ use std::ops::Deref;
 #[must_use = "BackgroundProcessor will immediately stop on drop. It should be stored until shutdown."]
 pub struct BackgroundProcessor {
 	stop_thread: Arc<AtomicBool>,
-	thread_handle: Option<JoinHandle<Result<(), std::io::Error>>>,
+	thread_handle: Option<JoinHandle<Result<(), std::io::Error>>>
please leave the excess trailing commas.
grrr
seems like my IDE removed it in an auto-format
///
/// # Rapid Gossip Sync
///
/// If a rapid gossip sync is meant to run at startup, set `await_rapid_gossip_sync_completion`
s/If a rapid gossip sync is meant to run at startup/If you're using rapid gossip sync/? Also, the variable name has changed and it's not a bool anymore.
@@ -195,6 +204,11 @@ impl BackgroundProcessor {
 {
 	let stop_thread = Arc::new(AtomicBool::new(false));
 	let stop_thread_clone = stop_thread.clone();
+	let is_initial_sync_complete = if let Some(rapid_sync) = rapid_gossip_sync {
Why clone this here? Just hold the `rapid_gossip_sync` and access it in the loop.
Honestly, I might actually instead change it to taking a reference to a gossip sync. Absolutely no point in holding the entire sync struct, imo.
Before you jump to it, let's discuss my top-level comment at #1433 (review).
let is_currently_awaiting_graph_sync = !is_initial_sync_complete.load(Ordering::Acquire);
if is_currently_awaiting_graph_sync {
	log_trace!(logger, "Not pruning network graph due to pending gossip sync");
	continue;
I think we should still persist the scorer, so let's instead move the persist call into an if block rather than using continue.
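A rough sketch of the suggested restructure (assumptions: `persist_graph`/`persist_scorer` as in the BackgroundProcessor's Persister trait; the surrounding bindings stand in for this PR's loop):

// Sketch only: gate graph pruning/persisting on sync completion, but persist
// the scorer unconditionally instead of skipping it with `continue`.
if !is_currently_awaiting_graph_sync {
	network_graph.remove_stale_channels();
	if let Err(e) = persister.persist_graph(&network_graph) {
		log_error!(logger, "Error: Failed to persist network graph, check your disk and permissions {}", e);
	}
} else {
	log_trace!(logger, "Not pruning network graph due to pending gossip sync");
}
if let Some(ref scorer) = scorer {
	if let Err(e) = persister.persist_scorer(&scorer) {
		log_error!(logger, "Error: Failed to persist scorer, check your disk and permissions {}", e);
	}
}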
Which latest-timestamp-synced value are you referring to - the timestamp included in the rapid sync, the local timestamp when the rapid sync was applied, or the timestamp at which the latest regular sync update was applied?

this one. which the user presumably needs to persist... somewhere? I'm starting (again) to lean towards just having a field in the `NetworkGraph`.
network_graph.add_channel_from_partial_announcement(42, 53, features, nodes[0].node.get_our_node_id(), nodes[1].node.get_our_node_id())
	.expect("Failed to update channel from partial announcement");
let original_graph_description = network_graph.to_string();
assert!(original_graph_description.contains("42: features: 0000, node_one:"));
This assertion doesn't seem necessary.
It helps assert that channel 42 is actually there.
@@ -175,9 +184,11 @@ impl BackgroundProcessor {
 	PM: 'static + Deref<Target = PeerManager<Descriptor, CMH, RMH, L, UMH>> + Send + Sync,
 	S: 'static + Deref<Target = SC> + Send + Sync,
 	SC: WriteableScore<'a>,
+	RGS: 'static + Deref<Target = RapidGossipSync<G>> + Send
Will this need to be `Sync`, too?
Doesn't appear to be.
let mut file = File::open(sync_path)?;
processing::update_network_graph_from_byte_stream(&network_graph, &mut file)

/// Rapid Gossip Sync holder struct
/// todo: ask reviewers for more interesting things to say here
I would explain how this differs from P2P gossip syncing.
This code's outdated.
/// Instantiate a new RapidGossipSync holder
/// todo: same as above
Wouldn't refer to this as a holder, FWIW.
The code's outdated.
LGTM, some trivial nits. Needs to be squashed down into a few separate commits, though (not just one).
fuzz/src/process_network_graph.rs
@@ -1,11 +1,15 @@
-// Import that needs to be added manually
+// Imports that need to be added manually
+use std::sync::Arc;
unused import
let event_handler = |_: &_| {};
let bg_processor = BackgroundProcessor::start(persister, event_handler, nodes[0].chain_monitor.clone(), nodes[0].node.clone(), nodes[0].net_graph_msg_handler.clone(), nodes[0].peer_manager.clone(), nodes[0].logger.clone(), Some(nodes[0].scorer.clone()), nodes[0].rapid_gossip_sync.clone());

loop {
nit: You added a whole graph-persist notifier API - why not just wait until it fires here and then check the log afterwards, rather than busy-polling?
Not sure I get the question. I can wait until `is_initial_sync_complete` is true, but there are no mpsc messages being sent or other events being fired. Whatever I do, it's gonna be busy polling. Are you thinking of the stuff I did on the server?
@@ -27,13 +27,16 @@
 //! its contents from disk, which we do by calling the `sync_network_graph_with_file_path` method:
 //!
 //! ```
+//! use std::sync::Arc;
unused import
Happy to squash into separate commits, just lmk preferred grouping.
use utils::test_logger;

/// Actual fuzz test, method signature and name are fixed
fn do_test(data: &[u8]) {
	let block_hash = bitcoin::BlockHash::default();
	let network_graph = lightning::routing::network_graph::NetworkGraph::new(block_hash);
	lightning_rapid_gossip_sync::processing::update_network_graph(&network_graph, data);
	let rapid_sync = RapidGossipSync::new(&network_graph);
The `Arc` snuck back in in the first commit and then was removed in the second.
Please break lines in the commit description (i.e., after the first line). Lines just shouldn't be > ~80 chars long.
/// [module-level documentation]: crate
pub struct RapidGossipSync<NG: Deref<Target=NetworkGraph>> {
	network_graph: NG,
	is_initial_sync_complete: AtomicBool
Is this redundant with `NetworkGraph::last_rapid_gossip_sync_timestamp`? Would be nice if `last_rapid_gossip_sync_timestamp` was in `RapidGossipSync` instead, given it's not always needed with `NetworkGraph`.
No, it's in the network graph for easier serialization and persistence. Admittedly a bit of a hack, but one that users will appreciate.
-if let Some(ref handler) = net_graph_msg_handler {
-	log_trace!(logger, "Pruning network graph of stale entries");
-	handler.network_graph().remove_stale_channels();
-	if let Err(e) = persister.persist_graph(handler.network_graph()) {
-		log_error!(logger, "Error: Failed to persist network graph, check your disk and permissions {}", e)
+log_trace!(logger, "Assessing prunability of network graph");
+
+// The network graph must not be pruned while rapid sync completion is pending
+let should_prune = match rapid_gossip_sync.as_ref() {
Isn't it the case that a user of `RapidGossipSync` won't also use `NetGraphMsgHandler`? If so, there would need to be mutually exclusive persistence paths. Otherwise, the graph will never be persisted.
No, rapid gossip sync is merely a tool for fast initial graph initialization; a user may still very much opt to use graph message handling for follow-up updates.
But if they don't opt to use the p2p updates (i.e., when `net_graph_msg_handler` is `None`), won't the `NetworkGraph` never be persisted?
Yes, that is true. They do already have the downloaded snapshot, which actually takes less time to convert into a network graph than a conventionally serialized network graph, so I'm not sure that's really a problem.
Should we not bother serializing the timestamp then? That way it can go inside `RapidGossipSync` instead of `NetworkGraph`. (cc: @TheBlueMatt)
We should definitely bother serializing the timestamp. I think the real question is whether we should expose the NetworkGraph access from the RapidGossipSync struct for persistence purposes.
Right, `net_graph_msg_handler` may be `None` and we still very much definitely want to persist the network graph, whereas the current code won't get here if `net_graph_msg_handler` is `None`.

As for persisting the timestamp, I don't think we need to worry about persisting it or not, kinda who cares? And, indeed, someone may want to move from rapid to P2P or back, so might as well persist it.
> As for persisting the timestamp, I don't think we need to worry about persisting it or not, kinda who cares? And, indeed, someone may want to move from rapid to P2P or back, so might as well persist it.

Hmm... if it doesn't matter if we persist it, but including it in `NetworkGraph` so we can persist it is a hack, why do it? Seems to just add complexity while exposing an API that only some users would care about but largely only exists for the `lightning-rapid-gossip-sync` crate's internal usage.

Not persisting it but keeping it internal to `RapidGossipSync` only affects the restart case. Or would fetching new rapid sync data be so infrequent that it would only happen on a restart (i.e., user app is mostly closed)? And thus persisting the timestamp would be important.
The main use case for rapid sync is mobile devices, where the most common state of the app will be "launched within the previous 30 seconds." Persisting the timestamp allows them to fetch differential snapshots whose size will be one or a couple hundred kB instead of 2MB. It can shave off a couple seconds on each app start.
-	handler.network_graph().remove_stale_channels();
-	if let Err(e) = persister.persist_graph(handler.network_graph()) {
-		log_error!(logger, "Error: Failed to persist network graph, check your disk and permissions {}", e)
+let network_graph_optional = if let Some(ref handler) = net_graph_msg_handler {
nit: You could collapse this if into the `should_prune` match above to remove some code and simplify things a bit.
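One possible shape for the collapsed version (a sketch, mirroring the `should_prune` match that appears later in this diff):

// Sketch: resolve which graph to persist (if any) in a single match,
// folding the `if let` above into the should_prune logic.
let network_graph_optional = match rapid_gossip_sync.as_ref() {
	Some(rapid_sync) => {
		if rapid_sync.is_initial_sync_complete() {
			Some(rapid_sync.network_graph())
		} else {
			None
		}
	},
	None => net_graph_msg_handler.as_ref().map(|handler| handler.network_graph()),
};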
}

impl<NG: Deref<Target=NetworkGraph>> RapidGossipSync<NG> {
	/// Instantiate a new RapidGossipSync instance
nit: ticks
What do you mean?
Put ticks around the identifiers in rust docs so they are formatted.
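For example, the doc comment in this diff would become:

/// Instantiate a new `RapidGossipSync` instance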
}

	last_prune_call = Instant::now();
	have_pruned = true;
} else {
	log_trace!(logger, "Not pruning network graph due to pending gossip sync");
Or because neither sync was given?
Phrased it differently
@@ -1078,6 +1078,18 @@ impl NetworkGraph {
 	}
 }

+/// The unix timestamp in UTC provided by the most recent rapid gossip sync.
+/// It will be set by the rapid sync process after every sync completion.
+pub fn get_last_rapid_gossip_sync_timestamp(&self) -> Option<u32> {
Do we describe anywhere how this should be used?
Yeah, the rapid gossip sync crate docs talk about the timestamp.
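As a usage sketch (the snapshot-server URL here is purely illustrative, not part of this PR), a client uses the persisted timestamp to request a differential snapshot on the next startup:

// Sketch only: ask a rapid-gossip-sync server for changes since the last sync.
// The URL scheme below is a hypothetical example.
let last_sync_timestamp = network_graph.get_last_rapid_gossip_sync_timestamp().unwrap_or(0);
let snapshot_url = format!("https://rapidsync.example.org/snapshot/{}", last_sync_timestamp);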
Some(rapid_sync) => {
	if rapid_sync.is_initial_sync_complete() {
		Some(rapid_sync.network_graph())
	} else {
		None
	}
},
None => net_graph_msg_handler.as_ref().map_or(None, |handler| Some(handler.network_graph()))
FWIW, the interface is prone to misuse since the user could potentially pass in both syncs with two different `NetworkGraph`s, but only one would be persisted. Not sure if we really care, but it would be a reason to prefer an enum over two `Option`s.
True, but with an enum, you cannot use both a rapid sync and a regular message handler. I can make an extra check verifying that the network graph references are the same somehow? Is there a way to do that that will work with Arc?
> True, but with an enum, you cannot use both a rapid sync and a regular message handler.

How useful is this for a mobile? Wouldn't the app only be open infrequently? Just trying to understand the use case of having both.

> I can make an extra check verifying that the network graph references are the same somehow? Is there a way to do that that will work with Arc?

Probably some pointer comparison, though not sure if we want to do so.
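For reference, such a check could compare pointer identity; a sketch, assuming both accessors return `&NetworkGraph` (it also works when the graph lives behind an `Arc`, since both references point at the same allocation):

// Sketch: verify both syncs reference the same NetworkGraph by address.
let same_graph = std::ptr::eq(rapid_sync.network_graph(), handler.network_graph());
assert!(same_graph, "both syncs must share a single NetworkGraph");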
> How useful is this for a mobile? Wouldn't the app only be open infrequently? Just trying to understand the use case of having both.

That depends on whether mobile implementations will choose to obtain gossip data in between rapid syncs. If some apps envision being open for up to a minute or two, it's a relatively cheap thing to do in the background, and it allows minimizing trust after the initial fast-forward. But we're all just guessing here; I can survey some folks who'll be using it.
Just occurred to me that we currently use `NetGraphMsgHandler` to handle the `network_update` field of `Event::PaymentPathFailed`. We'll have to do something similar in the rapid-sync-only case.
Create a wrapper struct for rapid gossip sync that can be passed to BackgroundProcessor's start method, allowing it to only start pruning the network graph upon rapid gossip sync's completion.
As discussed offline, let's leave any refactors for a follow-up.
Just need to make sure we process payment failure events even without a `NetGraphMsgHandler` in a followup.
When running a graph sync, the network graph should not be modified from other sources until the graph sync completes, to avoid inconsistencies or developer/user confusion. Similarly, the network graph should not be prematurely pruned, as graph sync allows very large timestamp deltas in graph snapshot comparisons.

To that end, I add an argument to the BackgroundProcessor to indicate that network-graph-related processing should be avoided pending graph sync completion, which can be signaled by calling a new method.
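Concretely, the resulting usage looks roughly like the test call shown earlier (a sketch; the bindings, and the Option-wrapping of the final argument, are assumptions based on that test):

// Sketch only: hand an optional RapidGossipSync to BackgroundProcessor::start.
// While its initial sync is incomplete, the BP skips pruning the network graph.
let rapid_sync = RapidGossipSync::new(network_graph.clone());
let bg_processor = BackgroundProcessor::start(
	persister, event_handler, chain_monitor, channel_manager,
	net_graph_msg_handler, peer_manager, logger, Some(scorer),
	Some(rapid_sync),
);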
This PR depends on #1155. Thanks to @jkczyz for the logic and idiomaticness improvement suggestions!