Optimisation for 'delegate' #3865
Conversation
This is copied from rabbitmq/rabbitmq-common#349.

If a message is sent to only one queue (the most common case in practice), passing it through the 'delegate' is unnecessary: it adds latency to the message and increases the possibility of 'delegate' congestion.

Here is some test data:

node1: Pentium(R) Dual-Core CPU E5300 @ 2.60GHz
node2: Pentium(R) Dual-Core CPU E5300 @ 2.60GHz

Join node1 and node2 into a cluster. Create 100 queues on node2, and start 100 consumers to receive messages from these queues. Start 100 publishers on node1 to send messages to the queues on node2. Each publisher sends 10k messages at a rate of 100/s (10k/s in total, theoretically), so all publishers together send 1 million messages.

Before the optimisation:
{1,[{msg_time,812312(=<1ms),177922(=<5ms),9507(=<50ms),221(=<500ms),38(=<1000ms),0,0,0,0,1061,1069,0,0}]}

After the optimisation:
{1,[{msg_time,902854(=<1ms),93993(=<5ms),3038(=<50ms),96(=<500ms),19(=<1000ms),0,0,0,0,1049,1060,0,0}]}

Additional information: the time counted here is a message's stay time in the cluster, that is, Time(leaving from node2) - Time(reaching node1). "812312(=<1ms)" is the number of messages with a time consumption less than or equal to 1ms.

Overall, the optimisation is effective.
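The histogram counts above can be read as cumulative percentages. A minimal Python sketch, using the bucket counts quoted verbatim from the two `msg_time` dumps (the helper name is mine, not part of the benchmark code):

```python
# Cumulative share of messages delivered within each latency bucket,
# using the msg_time histogram counts quoted above (1,000,000 messages total).
BUCKETS_MS = [1, 5, 50, 500, 1000]
BEFORE = [812312, 177922, 9507, 221, 38]  # before the optimisation
AFTER = [902854, 93993, 3038, 96, 19]     # after the optimisation

def cumulative_share(counts):
    """Running percentage of messages at or below each bucket bound."""
    total = sum(counts)
    running = 0
    shares = []
    for c in counts:
        running += c
        shares.append(round(100 * running / total, 2))
    return shares

print(cumulative_share(BEFORE))  # [81.23, 99.02, 99.97, 100.0, 100.0]
print(cumulative_share(AFTER))   # [90.29, 99.68, 99.99, 100.0, 100.0]
```

So the share of messages delivered within 1ms rises from about 81% to about 90%, and the tail beyond 5ms shrinks by roughly two thirds, consistent with the author's conclusion.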
@tomyouyou thank you for the updated pull request. Would you mind sharing the code you use to benchmark these changes?
I modified the code as follows: The test commands are:
The following is the client test code:
I ran `python3 benchmark.py publish localhost queue_name=server-3865`, but it publishes just one message. While this produces a nice histogram of the latency distribution, we'd much rather compare using
I'll see if I can figure out an equivalent PerfTest scenario.
We have potentially identified subtle regressions that this optimization introduced. It is now being considered for reversion.
Done in PR #3865. The changes in 3865 can cause message arrival ordering guarantees to be lost as a message sent to a single destination can overtake a prior message sent between the same two processes that was included in a fan-out.
That was done in PR #3865. The changes introduced in #3865 can cause message arrival ordering guarantees between two logical Erlang processes (sending messages via delegate) to be violated, as a message sent to a single destination can overtake a prior message sent as part of a fan-out. This is because the fan-out takes a different route, via the delegate process, than the direct delivery that bypasses it. This commit only reverts the optimisation for the `invoke_no_result/2|3` API and leaves it in place for the synchronous `invoke/` API. This means that the message send ordering you expect between Erlang processes can still be violated when mixing `invoke` and `invoke_no_result` invocations. As far as I can see, there are no places where the code relies on this, and there are uses of `invoke` (the mgmt db) that could very well benefit from avoiding the additional copying.
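The reordering hazard described in this commit message can be illustrated with a toy model (plain Python, not RabbitMQ code; all names are invented): messages routed through an intermediate "delegate" queue take an extra hop, so a later direct send can be delivered first.

```python
from collections import deque

# Toy model of the ordering hazard: not RabbitMQ code, names are invented.
delegate_inbox = deque()  # cross-node sends batched via the delegate process
direct_inbox = deque()    # single-destination sends that bypass the delegate
delivered = []

def send_via_delegate(msg):
    delegate_inbox.append(msg)  # extra hop: waits until the delegate runs

def send_direct(msg):
    direct_inbox.append(msg)    # goes straight to the destination

def run_direct_path():
    while direct_inbox:
        delivered.append(direct_inbox.popleft())

def run_delegate():
    while delegate_inbox:
        delivered.append(delegate_inbox.popleft())

# The sender emits m1 (part of a fan-out) before m2 (single destination)...
send_via_delegate("m1")
send_direct("m2")
# ...but the direct path is scheduled before the delegate forwards its batch,
# so m2 overtakes m1 and per-sender ordering is violated.
run_direct_path()
run_delegate()
print(delivered)  # ['m2', 'm1']
```

Reverting the `invoke_no_result` path restores a single route (always via the delegate) for asynchronous sends, which is what makes per-sender ordering hold again.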
(cherry picked from commit 8804434)
(cherry picked from commit 6c613b8)