This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Fix import queue thread pool shutdown #4929

Merged
merged 4 commits into from
Feb 17, 2020

@arkpar arkpar commented Feb 14, 2020

Due to the way the futures-rs ThreadPool is implemented, importing threads may outlive the queue instance and even the main thread. This leads to a race between closing the RocksDB database and tearing down the C++ runtime in the main thread, which causes segfaults and prevents the database from being closed properly.

This PR adds explicit synchronization between importing threads and the queue shutdown.

Fixes #4913
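
One way to picture the "explicit synchronization" described above is a shared live-thread counter paired with a condition variable, matching the `Arc<(Mutex<usize>, Condvar)>` shape of the `pool_guard` field in the diff. The sketch below is a minimal standard-library illustration, not the PR's actual code: `WorkerToken`, `wait_for_workers`, and `run_and_shutdown` are hypothetical names, and plain `std::thread` stands in for the futures thread pool.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Shared count of live worker threads plus a condvar signalled on every exit.
type PoolGuard = Arc<(Mutex<usize>, Condvar)>;

/// Hypothetical RAII token that a worker thread holds for its whole lifetime.
struct WorkerToken(PoolGuard);

impl WorkerToken {
    /// Increments the live-thread count and returns a token that undoes it on drop.
    fn register(guard: &PoolGuard) -> Self {
        *guard.0.lock().unwrap() += 1;
        WorkerToken(guard.clone())
    }
}

impl Drop for WorkerToken {
    fn drop(&mut self) {
        let (lock, cvar) = &*self.0;
        *lock.lock().unwrap() -= 1;
        cvar.notify_all();
    }
}

/// Blocks until every registered worker has dropped its token.
fn wait_for_workers(guard: &PoolGuard) {
    let (lock, cvar) = &**guard;
    let mut live = lock.lock().unwrap();
    while *live != 0 {
        live = cvar.wait(live).unwrap();
    }
}

/// Spawns `n` workers, waits for all of them, and returns the remaining live count.
fn run_and_shutdown(n: usize) -> usize {
    let guard: PoolGuard = Arc::new((Mutex::new(0), Condvar::new()));
    for _ in 0..n {
        // Register on the spawning thread so the count is non-zero before we wait.
        let token = WorkerToken::register(&guard);
        thread::spawn(move || {
            let _token = token; // dropped when the worker closure returns
            thread::sleep(Duration::from_millis(20)); // stand-in for block import work
        });
    }
    wait_for_workers(&guard);
    let live = *guard.0.lock().unwrap();
    live
}
```

Calling `wait_for_workers` from the queue's `Drop` implementation would keep the owner alive until no worker can still be touching the database, which is the property the description asks for.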

@arkpar arkpar added the A0-please_review Pull request needs code review. label Feb 14, 2020
// Flush the queue and close the receiver to terminate the future.
let _ = self.sender.unbounded_send(ToWorkerMsg::Shutdown);
let (_, closed) = buffered_link::buffered_link();
drop(std::mem::replace(&mut self.result_port, closed));
Member

Why do we need to do this?

Member Author

@arkpar arkpar Feb 14, 2020

Replaced with close

fn drop(&mut self) {
drop(self.pool.take());
// Flush the queue and close the receiver to terminate the future.
let _ = self.sender.unbounded_send(ToWorkerMsg::Shutdown);
Member

Suggested change
- let _ = self.sender.unbounded_send(ToWorkerMsg::Shutdown);
+ self.sender.close_channel();

let mut pool = futures::executor::ThreadPool::builder()
.name_prefix("import-queue-worker-")
- .pool_size(1)
+ .pool_size(2)
Member

Why do we increase the size?

Member

Isn't there just one worker anyway?

Member Author

I was testing whether the thread-tracking code works properly. Eventually there should be more workers, but that is out of scope for this PR.

arkpar commented Feb 14, 2020

Also fixed an issue with slog global guard being disposed too early.
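
The conversation does not show how the guard's lifetime was changed, so the following only illustrates the language rule that makes disposal order controllable: Rust drops struct fields in declaration order. `Noisy`, `Node`, and `drop_order` are hypothetical names, and the field names merely echo the "runtime before telemetry/logging" ordering mentioned in the commit list.

```rust
use std::sync::{Arc, Mutex};

/// Records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.lock().unwrap().push(self.name);
    }
}

/// Hypothetical holder: fields are dropped in declaration order,
/// so `runtime` is disposed first and `logger_guard` last.
#[allow(dead_code)]
struct Node {
    runtime: Noisy,
    logger_guard: Noisy,
}

/// Builds a `Node`, drops it, and returns the order in which its fields were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let node = Node {
        runtime: Noisy { name: "runtime", log: log.clone() },
        logger_guard: Noisy { name: "logger_guard", log: log.clone() },
    };
    drop(node);
    let order = log.lock().unwrap().clone();
    order
}
```

Declaring the guard after the components that still log during their own teardown keeps it alive until they are gone.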

Member

@bkchr bkchr left a comment

I meant to use close_channel for the sender ;) Otherwise it looks good.
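
The idea behind the suggestion is that a closed channel is already a shutdown signal, so no dedicated `Shutdown` message is needed. The sketch below shows the same pattern with the standard library's `std::sync::mpsc`, where dropping the last sender plays the role that `close_channel` plays in the review suggestion. This is an analogy under that assumption, not the futures channel API itself; `ToWorker` and `run_worker` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical worker messages; note that there is no `Shutdown` variant.
enum ToWorker {
    Import(u32),
}

/// Sends `jobs` to a worker, then closes the channel by dropping the sender.
/// Returns how many jobs the worker processed before it observed the close.
fn run_worker(jobs: Vec<u32>) -> usize {
    let (sender, receiver) = mpsc::channel::<ToWorker>();
    let worker = thread::spawn(move || {
        let mut processed = 0;
        // `recv` returns `Err` once all senders are gone and the queue is drained.
        while let Ok(ToWorker::Import(_block)) = receiver.recv() {
            processed += 1;
        }
        processed
    });
    for job in jobs {
        sender.send(ToWorker::Import(job)).unwrap();
    }
    drop(sender); // closing the channel is the shutdown signal
    worker.join().unwrap()
}
```

With this shape the message enum and the worker's match arms stay free of shutdown plumbing, which is what the two "remove `Shutdown`" suggestions in this review amount to.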

@@ -144,6 +178,7 @@ enum ToWorkerMsg<B: BlockT> {
ImportBlocks(BlockOrigin, Vec<IncomingBlock<B>>),
ImportJustification(Origin, B::Hash, NumberFor<B>, Justification),
ImportFinalityProof(Origin, B::Hash, NumberFor<B>, Vec<u8>),
Shutdown,
Member

Suggested change
- Shutdown,

@@ -239,6 +274,7 @@ impl<B: BlockT, Transaction: Send> BlockImportWorker<B, Transaction> {
ToWorkerMsg::ImportJustification(who, hash, number, justification) => {
worker.import_justification(who, hash, number, justification);
}
ToWorkerMsg::Shutdown => return Poll::Ready(()),
Member

Suggested change
- ToWorkerMsg::Shutdown => return Poll::Ready(()),

fn drop(&mut self) {
drop(self.pool.take());
// Flush the queue and close the receiver to terminate the future.
let _ = self.sender.unbounded_send(ToWorkerMsg::Shutdown);
Member

Suggested change
- let _ = self.sender.unbounded_send(ToWorkerMsg::Shutdown);
+ self.sender.close_channel();

Member

@bkchr bkchr left a comment

Could we maybe add a test that uses a fake block import which just waits 30 seconds, and then drops the BasicQueue directly?
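
A test along these lines could look roughly like the sketch below. It is a standard-library stand-in rather than a test against the real `BasicQueue`: `FakeQueue` and `import_finished_after_drop` are hypothetical, the wait is shortened from 30 seconds to a caller-supplied delay, and the drop handler joins a `JoinHandle` instead of using the PR's condvar-based guard.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for `BasicQueue`: owns a single worker thread.
struct FakeQueue {
    worker: Option<JoinHandle<()>>,
}

impl FakeQueue {
    /// Starts a fake import that sleeps for `delay` and then sets `finished`.
    fn start(delay: Duration, finished: Arc<AtomicBool>) -> Self {
        let worker = thread::spawn(move || {
            thread::sleep(delay);
            finished.store(true, Ordering::SeqCst);
        });
        FakeQueue { worker: Some(worker) }
    }
}

impl Drop for FakeQueue {
    fn drop(&mut self) {
        // Wait for the in-flight import instead of letting the thread outlive the queue.
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

/// Drops the queue immediately and reports whether the import had finished by then.
fn import_finished_after_drop(delay: Duration) -> bool {
    let finished = Arc::new(AtomicBool::new(false));
    let queue = FakeQueue::start(delay, finished.clone());
    drop(queue);
    finished.load(Ordering::SeqCst)
}
```

If the drop handler did not wait, the function would return `false` for any non-trivial delay, which is the failure such a test is meant to catch.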

@@ -40,9 +41,28 @@ pub struct BasicQueue<B: BlockT, Transaction> {
manual_poll: Option<Pin<Box<dyn Future<Output = ()> + Send>>>,
/// A thread pool where the background worker is being run.
pool: Option<futures::executor::ThreadPool>,
pool_guard: Arc<(Mutex<usize>, Condvar)>,
Member Author

  1. There's no control over the lifetimes of objects passed to ThreadPoolBuilder. It uses some reference counting internally and relying on implementation details seems wrong.
  2. I don't want to introduce another dependency for something this trivial.

@arkpar arkpar added the B0-silent Changes should not be mentioned in any release notes label Feb 16, 2020
@bkchr bkchr merged commit 579ea21 into master Feb 17, 2020
@bkchr bkchr deleted the a-no-thread-pool branch February 17, 2020 09:49
General-Beck pushed a commit to General-Beck/substrate that referenced this pull request Feb 18, 2020
* Fix import queue thread pool shutdown

* Make sure runtime is disposed before telemetry

* Close channel instead of sending a message

* Fixed test

Successfully merging this pull request may close these issues.

check-block sometimes Segmentation fault: 11