
fatxpool: do not use individual transaction listeners #7316

Open · wants to merge 18 commits into master

Conversation

michalkucharczyk (Contributor) commented on Jan 23, 2025

Description

During the 2s-block investigation, it turned out that the ForkAwareTxPool::register_listeners call takes a significant amount of time.

register_listeners: at HashAndNumber { number: 12, hash: 0xe9a1...0b1d2 } took 200.041933ms
register_listeners: at HashAndNumber { number: 13, hash: 0x5eb8...a87c6 } took 264.487414ms
register_listeners: at HashAndNumber { number: 14, hash: 0x30cb...2e6ec } took 340.525566ms
register_listeners: at HashAndNumber { number: 15, hash: 0x0450...4f05c } took 405.686659ms
register_listeners: at HashAndNumber { number: 16, hash: 0xfa6f...16c20 } took 477.977836ms
register_listeners: at HashAndNumber { number: 17, hash: 0x5474...5d0c1 } took 483.046029ms
register_listeners: at HashAndNumber { number: 18, hash: 0x3ca5...37b78 } took 482.715468ms
register_listeners: at HashAndNumber { number: 19, hash: 0xbfcc...df254 } took 484.206999ms
register_listeners: at HashAndNumber { number: 20, hash: 0xd748...7f027 } took 414.635236ms
register_listeners: at HashAndNumber { number: 21, hash: 0x2baa...f66b5 } took 418.015897ms
register_listeners: at HashAndNumber { number: 22, hash: 0x5f1d...282b5 } took 423.342397ms
register_listeners: at HashAndNumber { number: 23, hash: 0x7a18...f2d03 } took 472.742939ms
register_listeners: at HashAndNumber { number: 24, hash: 0xc381...3fd07 } took 489.625557ms

This PR implements the idea outlined in #7071. Instead of having a separate listener for every transaction in each view, we now use a single stream of aggregated events per view, with each stream providing events for all transactions in that view. Each event is represented as a tuple: (transaction-hash, transaction-status). This significantly reduces the time required for the maintain process.
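
For illustration only, here is a minimal sketch of the event shape carried by each view's aggregated stream; the type names below are stand-ins for this example, not the crate's actual definitions:

// Stand-in aliases for illustration; the pool uses its own hash and status types.
type TxHash = sp_core::H256;
type BlockHash = sp_core::H256;

// Simplified stand-in for the pool's transaction status type.
enum TxStatus {
    Ready,
    Future,
    InBlock(BlockHash),
    Dropped,
}

// One event in a view's aggregated stream: (transaction-hash, transaction-status).
type AggregatedEvent = (TxHash, TxStatus);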

Review Notes

  • A single aggregated stream, provided by each individual view, delivers events in the form of (transaction-hash, transaction-status).
  • MultiViewListener now has a task (see the sketch after this list). This task is responsible for:
    • polling the stream map (which consists of the individual views' aggregated streams) and the controller_receiver, which provides side-channel commands (like AddView or FinalizeTransaction) sent from the transaction pool,
    • dispatching individual transaction statuses and control commands to the external listeners of individual transactions (created via API, e.g. over RPC),
  • the external listener remains responsible for the status-handling logic (e.g. deduplication of events, or ignoring some of them) and for reporting statuses to the external world (this was not changed),
  • the level of debug messages was adjusted (per-transaction messages are now trace).
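
A rough sketch of the task's main loop described above, assuming simplified stand-in types and tokio/tokio-stream channels rather than the pool's own primitives (and, for brevity, taking the external-watcher map by value, whereas in the PR it is shared with submit_and_watch):

// Sketch only: simplified stand-ins for the pool's actual types and channels.
use std::collections::HashMap;

use futures::StreamExt;
use tokio::sync::mpsc;
use tokio_stream::{wrappers::UnboundedReceiverStream, StreamMap};

type TxHash = u64;            // stand-in for the real transaction hash type
type BlockHash = u64;         // stand-in for the real block hash type
type TxStatus = &'static str; // stand-in for the real transaction status type

// Side-channel commands sent by the transaction pool (simplified subset).
enum ControllerCommand {
    AddView(BlockHash, UnboundedReceiverStream<(TxHash, TxStatus)>),
    RemoveView(BlockHash),
}

async fn multi_view_listener_task(
    mut commands: mpsc::UnboundedReceiver<ControllerCommand>,
    external_watchers: HashMap<TxHash, mpsc::UnboundedSender<TxStatus>>,
) {
    // One aggregated stream per view, keyed by the view's block hash.
    let mut views: StreamMap<BlockHash, UnboundedReceiverStream<(TxHash, TxStatus)>> =
        StreamMap::new();

    loop {
        tokio::select! {
            // An event from any view: (transaction-hash, transaction-status).
            Some((_view, (tx_hash, status))) = views.next(), if !views.is_empty() => {
                // Forward only to transactions with an external watcher; events for
                // unwatched (e.g. gossiped) transactions are simply dropped here.
                if let Some(sink) = external_watchers.get(&tx_hash) {
                    let _ = sink.send(status);
                }
            },
            // Side-channel control commands coming from the pool.
            Some(cmd) = commands.recv() => match cmd {
                ControllerCommand::AddView(at, stream) => { views.insert(at, stream); },
                ControllerCommand::RemoveView(at) => { views.remove(&at); },
            },
            else => break,
        }
    }
}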

Closes #7071

michalkucharczyk marked this pull request as draft on January 23, 2025, 16:15
michalkucharczyk added the R0-silent (Changes should not be mentioned in any release notes) and T0-node (This PR/Issue is related to the topic “node”) labels on Jan 23, 2025
michalkucharczyk marked this pull request as ready for review on January 29, 2025, 11:45
substrate/client/transaction-pool/src/graph/listener.rs (outdated)
    aggregated_stream
}

/// Notify the listeners about the extrinsic broadcast.
pub fn broadcasted(&mut self, hash: &H, peers: Vec<String>) {
    trace!(target: LOG_TARGET, "[{:?}] Broadcasted", hash);
    self.fire(hash, |watcher| watcher.broadcast(peers));

Contributor:
So the broadcasted event is a bit different, and we handle it directly in the upper layers without involving the validated_pool?

Contributor (author):
Yes, it is called directly on the pool by the networking layer. There is a dedicated method for this:

/// Notify the pool about transactions broadcast.
fn on_broadcasted(&self, propagations: HashMap<TxHash<Self>, Vec<String>>);

For the new pool, we just trigger the event:

fn on_broadcasted(&self, propagations: HashMap<TxHash<Self>, Vec<String>>) {
    self.view_store.listener.transactions_broadcasted(propagations);
}

For the old pool, this went through validated_pool's listener:

fn on_broadcasted(&self, propagations: HashMap<TxHash<Self>, Vec<String>>) {
    self.pool.validated_pool().on_broadcasted(propagations)
}

pub fn on_broadcasted(&self, propagated: HashMap<ExtrinsicHash<B>, Vec<String>>) {
    let mut listener = self.listener.write();
    for (hash, peers) in propagated.into_iter() {
        listener.broadcasted(&hash, peers);
    }
}

>::default()));

let (tx, rx) = mpsc::tracing_unbounded("txpool-multi-view-listener-task-controller", 32);
let task = Self::task(external_controllers.clone(), rx);

Contributor:
I find that the flow is not so easy to follow. I thought about giving the task sole ownership of the external_controllers. This would get rid of the mutex, but it requires some additional messages to manage the list of external streams. What do you think?

Contributor (author):
I was also thinking about adding messages here, but in the end I decided that it would be more complex than having a mutex.

After all, the flow is not that complex. We need the mutex to add an external watcher controller (sink) into the map. This addition is made from the context of the submit_and_watch call.

I don't know. If you think messages will make the code more readable, I can give it a try (my small concern is the proper order of processing, but it should probably be fine).

Contributor:
I am thinking that adding message passing is more complex too (as in we'll need to add APIs for sending/receiving messages, and for handling them), and at the same time it would make the logic around external_controllers easier to read and comprehend. However, we can also add a comment mentioning that the only place where we write is submit_and_watch, and I would find that sufficient.

I am also thinking that there might be some benefits of message passing over taking and holding the mutex, which can create contention if submit_and_watch is called (very) frequently, although if rate-limiting is implemented at some upper level, this wouldn't be an issue.

All in all, I would transform this to message passing solely for the performance consideration, if my reasoning is correct. I would also add a comment specifying where external_controllers is written from.

Contributor (author):
I am considering using DashMap as a replacement for RwLock<HashMap>. I need to take a deeper look at this.
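
For reference, a rough sketch of what the DashMap variant could look like; the types are simplified stand-ins and this is not the PR's code:

// Sketch only: DashMap as a drop-in for a locked HashMap of watcher sinks.
use dashmap::DashMap;
use tokio::sync::mpsc;

type TxHash = u64;            // stand-in
type TxStatus = &'static str; // stand-in
type Controller = mpsc::UnboundedSender<TxStatus>; // stand-in for the watcher sink

#[derive(Default)]
struct ExternalControllers {
    inner: DashMap<TxHash, Controller>,
}

impl ExternalControllers {
    // Called from submit_and_watch: register the external watcher's sink.
    fn insert(&self, tx_hash: TxHash, sink: Controller) {
        self.inner.insert(tx_hash, sink);
    }

    // Called from the listener task: forward a status if the tx is watched.
    fn notify(&self, tx_hash: &TxHash, status: TxStatus) {
        if let Some(sink) = self.inner.get(tx_hash) {
            let _ = sink.send(status);
        }
    }
}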

self.listener.write().create_dropped_by_limits_stream()
}

/// Refer to [`Listener::create_aggregated_stream`]
pub fn create_aggregated_stream(

Contributor:
In general these changes mean that the submit_and_watch codepath is only used by the single state pool, so we can remove all of that once the fatxpool is the default.

Contributor (author):
Yes!

The only concern I have is that now we are also sending events for transactions that are not watched, so theoretically we are creating unnecessary traffic in the aggregated stream on collators for gossiped transactions. (All notifications are simply dropped in the task, because there are no external controllers in the map.)

I think it is not a significant overhead (I would even say negligible), but maybe 🤔 we should block it. This would involve populating the hashmap with the transactions that are allowed to be notified in the aggregated stream. But for now I would leave it as proposed.

iulianbarbu (Contributor) left a comment:
Left some stylistic comments and some questions/opinions. LGTM, but I will approve after the comments are clarified (if we don't need other changes beyond the ones I suggested).

Controller<ExternalWatcherCommand<ChainApi>>,
>::default()));

let (tx, rx) = mpsc::tracing_unbounded("txpool-multi-view-listener-task-controller", 512);

Contributor:
Can you please extract the 512 into a constant?
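
For example, something along these lines; the constant name is illustrative only, not taken from the PR:

/// Size parameter for the multi-view listener task's controller channel
/// (name chosen for illustration only).
const TASK_CONTROLLER_CHANNEL_SIZE: usize = 512;

let (tx, rx) = mpsc::tracing_unbounded(
    "txpool-multi-view-listener-task-controller",
    TASK_CONTROLLER_CHANNEL_SIZE,
);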

return None
}

trace!(target: LOG_TARGET, "[{:?}] create_external_watcher_for_tx", tx_hash);

let (tx, rx) = mpsc::tracing_unbounded("txpool-multi-view-listener", 32);
controllers.insert(tx_hash, tx);
let (tx, rx) = mpsc::tracing_unbounded("txpool-multi-view-listener", 128);

Contributor:
Would be nice to have 128 extracted in a dedicated const.

trace!(target: LOG_TARGET, "[{:?}] dropped_sink: send message failed: {:?}", tx, e);
}
}
self.send_to_dropped_stream_sink(tx, TransactionStatus::Dropped);

Contributor:
Why aren't we sending to the aggregated stream sink here as well?

trace!(target: LOG_TARGET, "[{:?}] dropped_sink: send message failed: {:?}", tx, e);
}
}
self.send_to_dropped_stream_sink(tx, TransactionStatus::Usurped(by.clone()));

Contributor:
Why aren't we sending to the aggregated stream sink here as well?

/// (external watcher) are not sent.
pub fn create_aggregated_stream(&mut self) -> AggregatedStream<H, BlockHash<C>> {
let (sender, aggregated_stream) =
tracing_unbounded("mpsc_txpool_aggregated_stream", 100_000);

Contributor:
Would be great to have a const variable with 100_000.

pub fn create_dropped_by_limits_stream(&mut self) -> DroppedByLimitsStream<H, BlockHash<C>> {
/// The stream can be used to subscribe to events related to dropping of all extrinsics in the
/// pool.
pub fn create_dropped_by_limits_stream(&mut self) -> AggregatedStream<H, BlockHash<C>> {
let (sender, single_stream) = tracing_unbounded("mpsc_txpool_watcher", 100_000);

Contributor:
Would be good to have 100_000 placed in a dedicated const here too.

Comment on lines +44 to +45
/// ready and future statuses are reported via this channel to allow consumer of the stream
/// tracking actual drops.

Contributor:
What does it mean to allow the consumer of the stream to track actual drops? How do ready & future events on the same stream improve the tracking of drops?

>,
mut command_receiver: CommandReceiver<ControllerCommand<ChainApi>>,
) {
let mut aggregated_streams_map: StreamMap<BlockHash<ChainApi>, ViewStatusStream<ChainApi>> =

iulianbarbu (Contributor) commented on Jan 31, 2025:
dq: Should we be concerned in practice about the size of this StreamMap? It increases when adding a new view, so in theory it can grow to arbitrary sizes, but in practice it might not reach extreme sizes due to back pressure in other parts of the pool (which I am having a hard time reasoning about). The docs mention that it works best with a smallish number of streams, as all entries are scanned on insert, remove, and polling.


let (tx, rx) = mpsc::tracing_unbounded("txpool-multi-view-listener", 32);
controllers.insert(tx_hash, tx);
let (tx, rx) = mpsc::tracing_unbounded("txpool-multi-view-listener", 128);
external_controllers.insert(tx_hash, tx);

Contributor:
I am thinking about dropping the lock right after this, to minimize how long the lock is held. WDYT?
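
A generic illustration of this suggestion (not the PR's code): scope the write guard so the lock is released immediately after the insertion.

use std::collections::HashMap;
use std::sync::RwLock;

// Generic pattern: hold the lock only for the insertion itself.
fn register_watcher(controllers: &RwLock<HashMap<u64, String>>, tx_hash: u64, sink: String) {
    {
        // The write guard lives only inside this block.
        let mut map = controllers.write().expect("lock not poisoned");
        map.insert(tx_hash, sink);
    } // guard dropped here; subsequent work runs without holding the lock

    // ... any follow-up work that does not need the map ...
}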

@paritytech-workflow-stopper commented:

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/13075004797
Failed job name: test-linux-stable

Labels
R0-silent: Changes should not be mentioned in any release notes
T0-node: This PR/Issue is related to the topic “node”.
Development

Successfully merging this pull request may close these issues.

fatxpool: optimize per-transaction listeners
3 participants