Scrutinize the replica-selection loop in scylla-rust-driver/scylla/src/transport/load_balancing/token_aware.rs (lines 26 to 29 in 652f131). Usually (when the number of nodes is >= RF) the loop ends after just a few iterations: since tokens are distributed between nodes randomly, we quickly find RF unique nodes. But if the ring has fewer nodes than RF, the code iterates over all tokens, which is a significant amount of work.
I bumped into this during a benchmark: when driving a 2-node cluster with RF=3, the driver spent more than 50% of its CPU time in `plan()`.
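To make the cost concrete, here is a minimal, self-contained sketch that models the ring as a sorted list of (token, node) pairs and collects unique nodes until RF are found. `Ring`, `ring_range` and `naive_replicas` are hypothetical stand-ins, not the driver's actual types, and the xorshift token generator is an arbitrary assumption used only to spread 256 tokens per node pseudo-randomly.

```rust
use std::collections::HashSet;

/// Hypothetical ring model (not the driver's real types):
/// (token, node id) pairs sorted by token.
struct Ring {
    entries: Vec<(i64, u32)>,
}

impl Ring {
    /// Builds a ring where each node owns `vnodes` pseudo-random tokens.
    fn new(nodes: u32, vnodes: u32) -> Ring {
        let mut state: u64 = 0x9E37_79B9_7F4A_7C15; // fixed seed, so runs are deterministic
        let mut entries = Vec::new();
        for node in 0..nodes {
            for _ in 0..vnodes {
                // One xorshift64 step produces the next pseudo-random token.
                state ^= state << 13;
                state ^= state >> 7;
                state ^= state << 17;
                entries.push((state as i64, node));
            }
        }
        entries.sort_by_key(|&(token, _)| token);
        Ring { entries }
    }

    /// Walks the ring once, starting at the first token >= `token`
    /// and wrapping around to the beginning.
    fn ring_range(&self, token: i64) -> impl Iterator<Item = u32> + '_ {
        let start = self.entries.partition_point(|&(t, _)| t < token);
        self.entries[start..]
            .iter()
            .chain(self.entries[..start].iter())
            .map(|&(_, node)| node)
    }
}

/// Naive replica selection: keep taking unique nodes until RF are found.
/// Returns the replicas and the number of ring entries visited.
fn naive_replicas(ring: &Ring, token: i64, rf: usize) -> (Vec<u32>, usize) {
    let mut seen = HashSet::new();
    let mut replicas = Vec::new();
    let mut visited = 0;
    for node in ring.ring_range(token) {
        visited += 1;
        if seen.insert(node) {
            replicas.push(node);
            if replicas.len() == rf {
                break;
            }
        }
    }
    (replicas, visited)
}

fn main() {
    let rf = 3;
    for nodes in [4, 2] {
        let ring = Ring::new(nodes, 256);
        let (replicas, visited) = naive_replicas(&ring, 0, rf);
        println!("nodes={nodes} replicas={replicas:?} visited={visited} of {}", ring.entries.len());
    }
}
```

With 4 nodes the loop stops as soon as the third distinct node appears (at most 513 of 1024 entries, since the first two nodes own only 512 tokens between them). With 2 nodes it can never reach RF=3, so `visited` equals the full ring size of 512.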
Good catch. Looking at the code of `network_topology_strategy_replicas`, I suspect NetworkTopologyStrategy may have a similar problem (not 100% sure; I didn't try to reproduce it).
@havaker is refactoring load balancing right now (#449), so I think we should fix it either during or after the refactor.
michoecho added a commit to michoecho/scylla-rust-driver that referenced this issue on Feb 17, 2023
…iable
token_aware code uses cluster.ring_range(token).unique() to iterate over
candidate replicas until enough candidates are found to satisfy the RF.
This behaves badly when the number of candidates is smaller than the RF:
we always iterate over the entire ring, which is very wasteful (it was
seen to slow down the driver by a factor of more than 2 in a simple
performance test).

Fix that by ending the iteration early once all unique candidate nodes
have been considered.
Fixes scylladb#452
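One way to implement the early exit described in the commit message is sketched below, under the assumption that the number of distinct nodes in the ring is known up front (here computed once in a hypothetical `Ring::new`). The loop caps its target at min(RF, distinct nodes) and stops as soon as that many unique nodes have been seen. This is an illustration with hypothetical names, not necessarily how the referenced commit implements it.

```rust
use std::collections::HashSet;

/// Hypothetical ring model (not the driver's real types): (token, node id)
/// pairs sorted by token, plus the number of distinct nodes in the ring.
struct Ring {
    entries: Vec<(i64, u32)>,
    unique_nodes: usize,
}

impl Ring {
    fn new(mut entries: Vec<(i64, u32)>) -> Ring {
        entries.sort_by_key(|&(token, _)| token);
        // Computed once when the ring is built, not on every query.
        let unique_nodes = entries
            .iter()
            .map(|&(_, node)| node)
            .collect::<HashSet<_>>()
            .len();
        Ring { entries, unique_nodes }
    }

    /// Walks the ring once, starting at the first token >= `token`
    /// and wrapping around to the beginning.
    fn ring_range(&self, token: i64) -> impl Iterator<Item = u32> + '_ {
        let start = self.entries.partition_point(|&(t, _)| t < token);
        self.entries[start..]
            .iter()
            .chain(self.entries[..start].iter())
            .map(|&(_, node)| node)
    }
}

/// Replica selection that stops once every distinct node has been considered.
/// Returns the replicas and the number of ring entries visited.
fn replicas_with_early_exit(ring: &Ring, token: i64, rf: usize) -> (Vec<u32>, usize) {
    // We can never find more replicas than there are distinct nodes.
    let wanted = rf.min(ring.unique_nodes);
    let mut seen = HashSet::new();
    let mut replicas = Vec::with_capacity(wanted);
    let mut visited = 0;
    if wanted == 0 {
        return (replicas, visited);
    }
    for node in ring.ring_range(token) {
        visited += 1;
        if seen.insert(node) {
            replicas.push(node);
            if replicas.len() == wanted {
                break; // RF replicas found, or every distinct node already seen
            }
        }
    }
    (replicas, visited)
}

fn main() {
    // Two nodes with 256 tokens each, assigned round-robin for determinism.
    let entries: Vec<(i64, u32)> = (0..512).map(|t| (t as i64, (t % 2) as u32)).collect();
    let ring = Ring::new(entries);
    let (replicas, visited) = replicas_with_early_exit(&ring, 0, 3);
    println!("replicas={replicas:?} visited={visited} of {}", ring.entries.len());
}
```

Running it prints `replicas=[0, 1] visited=2 of 512`: with 2 nodes and RF=3, the walk stops after two ring entries instead of scanning all 512.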