Setting raft.segment_max_entries higher than 2^16 (65,536) results in an exception (ra_log_segment_unexpected_eof) #9731
During performance testing of Quorum Queues, I noticed that with 0 publishers and 10 consumers, as long as a queue's backlog is under 3 million messages (1 KB message size for testing), the rabbitmq-perf-test consumers consume at a rate of 80,000+ msgs/sec. However, when the backlog grows larger, the consumption rate starts to drop. For example, at 20M+ messages, we see the rates drop to under 10K/sec. @kjnilsson suggested that increased
When consumers are started, the followers crash immediately and cannot be recovered:
The queue is essentially unrecoverable at this stage. These log messages are seen in the follower's logs:
Version Details
OS (Debian Bullseye):
Reproduction steps
The queues will crash.
Expected behavior
I did not expect the queue to crash. Because the queue crashes on both followers and the
Additional context
No response
The number of messages is very likely not relevant at all. If you change Changing this setting for new clusters without any on-disk segment files won't have this effect. This is a very common limitation for tools that store data on disk in files with N entries: you cannot update this value for an existing cluster (or at least a cluster with existing segment files).
I can reproduce an exception using the above commands but on 3.12.7, it is something different:
and this does not happen with the default value, which happens to be Using a single node with a previously wiped data directory was sufficient to trigger this exception. @kjnilsson and investigating.
raft.segment_max_entries = 65536 leads to an exception
With 3.12.7, a brand new single-node QQ (no on-the-fly Raft segment file size changes) logs
Values of 8192, 16384, 32768 do not
However, with the value of
Another productive outcome from this discussion is this new brief QQ doc guide section on relevant performance tuning. Block device readahead and segment entry count increases won't benefit every workload but indeed, a "small messages, higher rates" workload is likely the most common one by far.
65536 is 2^16, so values that require more than two bytes overflow the size field in the segment file format:
This is confirmed by values up to 65535 not triggering this exception.
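To illustrate the two-byte limit in isolation, here is a minimal Python sketch. It is not the Ra segment writer and makes no claim about the actual segment file layout; the helper names (`fits_in_u16`, `pack_u16`, `truncate_to_u16`) are hypothetical and only show how an unsigned 16-bit field behaves at the boundary.

```python
import struct

# Largest value an unsigned two-byte (16-bit) field can hold.
U16_MAX = 2**16 - 1  # 65535


def fits_in_u16(value: int) -> bool:
    """Return True if value can be stored in an unsigned 16-bit field."""
    return 0 <= value <= U16_MAX


def pack_u16(value: int) -> bytes:
    """Pack value as a big-endian unsigned 16-bit integer.

    struct.pack raises struct.error when value is out of range.
    """
    return struct.pack(">H", value)


def truncate_to_u16(value: int) -> int:
    """Show what survives if only the low 16 bits are kept.

    This models silent truncation in general; whether the real
    segment format truncates this way is an assumption.
    """
    return value & 0xFFFF


if __name__ == "__main__":
    for candidate in (8192, 16384, 32768, 65535, 65536):
        print(candidate, fits_in_u16(candidate), truncate_to_u16(candidate))
```

Under these assumptions, 65535 is the largest entry count that still fits, while 65536 either fails to encode or wraps around to 0 if the upper bits are dropped silently.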