-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep track of connections, introduce per-vhost limits #891
Conversation
Fixes #500, #627. Squashed commit of the following: commit 88036dc Author: Michael Klishin <[email protected]> Date: Thu Jul 21 03:31:25 2016 +0300 Refactor commit fc84b7a Merge: df745e2 df28c63 Author: Michael Klishin <[email protected]> Date: Wed Jul 20 18:30:19 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit df745e2 Author: Michael Klishin <[email protected]> Date: Wed Jul 20 18:04:59 2016 +0300 Force close connections when vhost is deleted Fixes #627, related to #500. commit 2167f8f Author: Michael Klishin <[email protected]> Date: Wed Jul 20 16:00:35 2016 +0300 Add tests for per-vhost connection limits commit 2a032a3 Author: Michael Klishin <[email protected]> Date: Wed Jul 20 01:53:07 2016 +0300 Rename a few tests commit 86ce592 Author: Michael Klishin <[email protected]> Date: Wed Jul 20 01:44:10 2016 +0300 Tests for connection re-registration idempotency commit a774c7b Author: Michael Klishin <[email protected]> Date: Tue Jul 19 04:05:20 2016 +0300 Ask nodes that come back to re-register their connections Depending on the partition handling mode used there may or may not be any clients still connected. We make sure that registration and deregistration functions are idempotent and assume there may be connections on the node that has come back. Point of improvement: when a node comes back up, N-1 nodes will tell it to re-register connections. It could be fewer than N-1, ideally just 1. commit 24e4c0e Author: Michael Klishin <[email protected]> Date: Mon Jul 18 17:05:17 2016 +0300 Fix boot step commit 62da3c6 Author: Michael Klishin <[email protected]> Date: Mon Jul 18 11:16:21 2016 +0300 Compile commit b656f9e Merge: f2831e1 492406e Author: Michael Klishin <[email protected]> Date: Thu Jul 14 15:25:49 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit f2831e1 Merge: e5858e9 7b10a4e Author: Michael Klishin <[email protected]> Date: Thu Jul 7 13:45:31 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit e5858e9 Author: Michael Klishin <[email protected]> Date: Wed Jul 6 12:32:56 2016 +0300 Towards working connection re-registration after (inter-node) network splits commit 548df73 Author: Michael Klishin <[email protected]> Date: Wed Jul 6 12:32:07 2016 +0300 Make network split simulation work as expected commit 4028c66 Author: Michael Klishin <[email protected]> Date: Tue Jul 5 14:43:37 2016 +0300 Close connections using rabbit_ct_client_helpers Per discussion with @dumbbell. commit 26fecc9 Author: Michael Klishin <[email protected]> Date: Tue Jul 5 04:17:52 2016 +0300 Extract connection limit partition tests into a separate suite commit 8a466f1 Author: Michael Klishin <[email protected]> Date: Tue Jul 5 04:17:41 2016 +0300 Better logging commit b06de9b Author: Michael Klishin <[email protected]> Date: Mon Jul 4 02:54:54 2016 +0300 Modify a test so that it (expectedly) fails commit 078a78a Author: Michael Klishin <[email protected]> Date: Mon Jul 4 02:44:58 2016 +0300 Towards covering node termination/unavailability in connection tracking commit ab99361 Author: Michael Klishin <[email protected]> Date: Sun Jul 3 15:25:10 2016 +0300 These are moved to rabbit_ct_broker_helpers commit 520b6ef Author: Michael Klishin <[email protected]> Date: Sun Jul 3 03:54:52 2016 +0300 {allow,block}_traffic_between/2 are moved to rabbit_ct_broker_helpers commit b842eaa Merge: 26eb1fa d4f031e Author: Michael Klishin <[email protected]> Date: Sun Jul 3 03:14:27 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit 26eb1fa Author: Michael Klishin <[email protected]> Date: Sun Jul 3 02:39:09 2016 +0300 dist_proxy helpers moved to rabbit_ct_broker_helpers commit 3d741f4 Author: Michael Klishin <[email protected]> Date: Sun Jul 3 01:28:44 2016 +0300 Cluster node shutdown test commit 57c7129 Author: Michael Klishin <[email protected]> Date: Sat Jul 2 23:01:46 2016 +0300 Refactor commit b736b30 Author: Michael Klishin <[email protected]> Date: Sat Jul 2 22:49:42 2016 +0300 More tests commit dc1cb5f Author: Michael Klishin <[email protected]> Date: Sat Jul 2 22:27:16 2016 +0300 More tests commit e94edfe Author: Michael Klishin <[email protected]> Date: Sat Jul 2 17:08:34 2016 +0300 Initial per-vhost connection limit tests commit 15b7b4e Author: Michael Klishin <[email protected]> Date: Sat Jul 2 15:04:57 2016 +0300 Adapt to master, compile commit dc7f333 Merge: e4884ff bb1fa55 Author: Michael Klishin <[email protected]> Date: Sat Jul 2 02:44:18 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit e4884ff Merge: 71e2710 f0f43f8 Author: Michael Klishin <[email protected]> Date: Wed Jun 29 14:27:40 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit 71e2710 Merge: b1ec9f3 704a2b5 Author: Michael Klishin <[email protected]> Date: Thu Mar 31 01:55:29 2016 +0300 Merge branch 'master' into rabbitmq-server-500 Conflicts: src/rabbit_control_main.erl src/rabbit_types.erl commit b1ec9f3 Author: Michael Klishin <[email protected]> Date: Mon Feb 15 13:51:37 2016 +0300 Stub out event handlers for #627 and #628 commit f3cfb57 Author: Michael Klishin <[email protected]> Date: Sat Feb 13 01:33:50 2016 +0300 Use a counter column to track number of connections per vhost Limit query time is now 50-70 microseconds for 50M connections. commit e9132f1 Author: Michael Klishin <[email protected]> Date: Fri Feb 12 06:23:51 2016 +0300 Ignore ./debug commit 976e3ae Author: Michael Klishin <[email protected]> Date: Fri Feb 12 06:20:01 2016 +0300 Switch to ets:select_count/2 commit ec23cf1 Author: Michael Klishin <[email protected]> Date: Thu Feb 11 05:11:08 2016 +0300 Enforce max connection limit Also introduce `rabbitmqctl clear_vhost_limits` and fix rabbitmqctl(1). commit ba20887 Merge: 49a1886 8974581 Author: Michael Klishin <[email protected]> Date: Thu Feb 11 02:16:24 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit 49a1886 Author: Michael Klishin <[email protected]> Date: Wed Feb 10 16:45:00 2016 +0300 Spelling commit 723e6e4 Author: Michael Klishin <[email protected]> Date: Wed Feb 10 16:31:34 2016 +0300 Create secondary indices on rabbit_tracked_connection.vhost and username commit b468c0f Merge: 6940d05 0120438 Author: Michael Klishin <[email protected]> Date: Wed Feb 10 12:23:38 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit 6940d05 Author: Michael Klishin <[email protected]> Date: Mon Feb 8 01:30:03 2016 +0300 Spam commit 032c2a6 Merge: 46da39c 2374ae8 Author: Michael Klishin <[email protected]> Date: Fri Feb 5 23:48:38 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit 46da39c Merge: 655e351 05361e6 Author: Michael Klishin <[email protected]> Date: Wed Feb 3 11:20:08 2016 +0300 Merge branch 'master' into rabbitmq-server-500 commit 655e351 Author: Michael Klishin <[email protected]> Date: Wed Feb 3 11:19:23 2016 +0300 Store and delete tracked connections in a table commit 4e849cf Author: Michael Klishin <[email protected]> Date: Tue Jan 19 17:56:14 2016 +0300 Compile commit 504adde Author: Michael Klishin <[email protected]> Date: Tue Jan 19 17:55:08 2016 +0300 Switch to a handler for connection tracking (WIP) commit 3e1d2b4 Author: Michael Klishin <[email protected]> Date: Tue Jan 19 14:46:09 2016 +0300 Migrations for virtual host limits and tracked connections commit 7499020 Author: Michael Klishin <[email protected]> Date: Fri Jan 8 19:40:36 2016 +0300 Compile commit f3a1101 Author: Michael Klishin <[email protected]> Date: Fri Jan 8 19:14:12 2016 +0300 Switch rabbitmqctl set_vhost_limits to use JSON payload values Just like policies do. commit 7fc5f1a Author: Michael Klishin <[email protected]> Date: Wed Jan 6 19:07:50 2016 +0300 Stub out set_vhost_limits in ctl
Feel free to QA but I have one more test case to add => do not merge yet. |
Also: this has a |
[] -> | ||
mnesia:write(?TABLE, Conn, write), | ||
mnesia:dirty_update_counter( | ||
rabbit_tracked_connection_per_vhost, VHost, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to use ?PER_VHOST_COUNTER_TABLE
here.
This approach suffers from the lost updates problem when connections from a failed node are removed concurrently. We have evaluated a couple of approaches and the most prominent so far seems to be:
So, one writer, many readers. This should also make it possible to eliminate the connection re-registration mechanism in place today. |
case rabbit_connection_tracking:list(VHost) of | ||
[] -> {ok, State}; | ||
Cs -> | ||
[rabbit_networking:close_connection(Pid, rabbit_misc:format("vhost '~s' is deleted", [VHost])) || #tracked_connection{pid = Pid} <- Cs], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is a case statement is necessary here? if Cs = []
, then nothing to evaluate in the list comprehension, and you can just return {ok, State}
.
This way we have a single writer, multiple readers and no lost counter update due to natural race conditions. Data for tables from available nodes are then aggregated. Two open questions: * Whether to use a parallel version of lists:map/2 * When to clean up the tables
connected_at = proplists:get_value(connected_at, EventDetails), | ||
pid = proplists:get_value(pid, EventDetails), | ||
peer_host = proplists:get_value(peer_host, EventDetails), | ||
peer_port = proplists:get_value(peer_port, EventDetails)}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might not lie on the critical path, but use of proplists:get_value/2
seems quite extensive. how about using rabbit_misc:pget/2
instead, considering that it was optimised in rabbitmq/rabbitmq-common#51
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, thank you.
This affected connection tracking in the stats DB, now fixed: rabbitmq/rabbitmq-management#274. |
@dumbbell this is ready for another round. |
catch _ -> 0 | ||
end | ||
end, rabbit_mnesia:cluster_nodes(running)), | ||
lists:foldl(fun(X, Acc) -> Acc + X end, 0, Ns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelklishin you're going through 2 lists just to calculate number of connections? imagine how much this will impact performance (per connection open request) as cluster size increases? can you use lists:foldl/3
only please, with connection count as your accumulator. there's no need for lists:map/2
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but you are worrying about the wrong thing here. What really needs to happen is doing a parallel map but for small clusters it will likely be less efficient so I decided that it can be done at a later point.
It's not common to see clusters with more than maybe 7-9 nodes and it's unlikely to be a major contributor to how many connections/second can be accepted. Plus if you don't have any limits configured, none of this code will be executed.
Tab = tracked_connection_table_name_for(Node), | ||
mnesia:dirty_match_object(Tab, #tracked_connection{vhost = VHost, _ = '_'}) | ||
end, rabbit_mnesia:cluster_nodes(running)), | ||
lists:foldl(fun(Chunk, Acc) -> Acc ++ Chunk end, [], Chunks). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also make Chunks
your accumulator in lists:foldl/3
for both your listing functions (unless you were planning something else)
Conflicts: src/rabbit_control_main.erl
When the node is offline (`--offline` is set on the command line), we must skip the emission of the `node_deleted` event because the rabbit_event event handler is not setup (RabbitMQ is stopped). Otherwise, the command fails with `badarg` in gen_event. References #500. [#116521809]
The connection tracking tables are not replicated because the table tracking connections on node A logically exists only the node A. The backup made during the rename of a node failed because it wanted to access a remote offline node to backup its connection tracking tables. Obviously it didn't work. The solution is to not backup those tables. This is correct because they are only relevant while the node is running. References #500. [#116521809]
…lose This helps in CI where the connection tracking tables may not yet be up-to-date at the time the testcase verifies them. References #500. [#116521809]
While here, be more informative when a node rename fails.
References #500. [#116521809]
This makes sure the limits are still good after a node was renamed and thus, the tracking tables were recreated with a new name too. While here, do some cleanup in the init/end functions in both the two per-vhost limit testsuite. References #500. [#116521809]
This introduces (client) connection tracking that's aware of connection
and so on across the cluster. We already collect much of this information
in node-local state but that's not sufficient for #500, #607, and related features.
This changes internal database schema and thus can only go into
3.7.0
.This also makes sure that when a vhost is deleted, we force close all connections in it.
This is a 3rd iteration on the design. All connections and counters are tracked using
node-specific tables which are only updated on the node itself but read by all cluster nodes.
This avoids concurrent counter updates without going all the way to CRDTs.
The implementation isn't perfect but I run out of practical test cases to add. Points of improvement for the future:
rabbitmqctl status
andreport
entries related to limitsFixes #500, #627.
Lays ground work for #501, #607.
Depends on rabbitmq/rabbitmq-common#121.