Outbound data architecture changes #680

derekcollison · 2018-06-11T22:36:23Z

The original outbound architecture could spend time in a receiving client's Go routine while processing inbound messages. This works well for small to medium fanout. This change makes that model time bound and allows it to switch to a dedicated outbound Go routine. Routes always default to the dedicated outbound routine when ingesting messages. Clients may spend time in place to balance ingress and egress rates and to optimize for latency. There are also changes to help when routes connect with a large number of subscriptions and to handle slow consumer status better. Various flapping tests were also fixed.

Resolves #675
Resolves #659
Resolves #550

/cc @nats-io/core

Signed-off-by: Derek Collison <[email protected]>

Use pending bytes as slow consumer trigger, so reintroduce max_pending. Improve latency with inplace flush calls when appropriate. Utilize simple time budget for readLoop routine. Signed-off-by: Derek Collison <[email protected]>

coveralls · 2018-06-11T22:47:38Z

Coverage increased (+0.3%) to 92.548% when pulling f7cb616 on fanout into e597043 on master.

kozlovic · 2018-06-12T16:31:46Z

server/client.go

+// Will return if data was attempted to be written.
+// Lock must be held
+func (c *client) flushOutbound() bool {
+	if c.flags.isSet(flushOutbound) {


I was wondering why this flag is used, since this function needs to be called under the client lock, but then I realized that the function releases and reacquires the lock. This is quite dangerous for the rest of the code calling this function under the client lock. The state may have changed after calling flushOutbound(), which means that callers have to re-check state rather than assume it is as it was before the call to flushOutbound().

Yes, correct, but it was a specific design goal not to hold a lock during IO, similar to what read does now. With outbound, though, it can be called from multiple places, hence the flag.

kozlovic · 2018-06-12T16:53:35Z

server/client.go

 	client.mu.Unlock()
+
+	// Remember for when we return to the top of the loop.
 	c.pcd[client] = needFlush


Is it safe outside the client lock? In sendOK() this is set under the client lock, and I see that you added a FIXME there. I think the idea of setting needFlush there is that we call sendProto() with false to indicate that this does not require a send in place, but ultimately we want OK to be sent.

Your comment does not seem to match the highlighted code. But in the code, client and c are different.

Right. And any place we possibly touch this pcd map is from the client's readLoop go routine so we are ok.

kozlovic

Some comments; I have not fully reviewed everything yet. I believe some changes are required (locking client -> server is problematic for sure).

kozlovic · 2018-06-12T17:24:03Z

server/client.go

-			c.nc.SetWriteDeadline(time.Now().Add(c.srv.getOpts().WriteDeadline))
-			deadlineSet = true
+// queueOutbound queues data for client/route connections.
+// Return pending length.


Not really ;-)

Comment on the return or what it does. It queues data for the connection. flushOutbound is what may or may not send to the socket.

Was just the comment about returning pending length, which this function does not do (no return value).

kozlovic · 2018-06-12T17:24:34Z

server/client.go

+	// Snapshot opts
+	srv := c.srv
+
+	// Place primary on nb, assign primary to secondary, nil out nb and secondary.


assign secondary to primary

I read this as assign foo = 3, which when read assign primary to secondary is what happens.

Sorry for my English: I read foo = 3 as "assign 3 to foo", not "assign foo to 3", so for p = s I would have said "assign secondary to primary".

kozlovic · 2018-06-12T17:31:17Z

server/client.go

+	c.mu.Lock()
+
+	// Update flush time statistics
+	c.out.lft = lft


The function could return the flush time instead, since it is read only after calling this function in the readLoop.

I may use it for more advanced stats on /connz

kozlovic · 2018-06-12T17:34:39Z

server/client.go

+
+	// Re-acquire client lock
+	c.mu.Lock()
+


Should we check if the connection was closed, and if so simply return true here? I am not sure it makes sense to continue after this point if it has been closed. Also, we may end up logging a flush error if the write returned an error due to the socket being closed.

We check for err != nil right below here. If the connection is properly closed, err should be nil, and the code below just does accounting. But we could check for that condition if you think it's critical.

It is possible that the connection is closed before we grab the lock but after the nb.WriteTo(), so err would be nil. Again, as long as the remaining code is safe if the connection is closed, I am fine.

kozlovic · 2018-06-12T17:38:27Z

server/client.go

+// flushSignal will use server to queue the flush IO operation to a pool of flushers.
+// Lock must be held.
+func (c *client) flushSignal() {
+	c.out.sg.Signal()


Should we have an out.inWait = true and out.inWait = false surrounding the call to sg.Wait() and flush only if inWait is true? That would possibly reduce the number of Signal() calls.

So far it has not been a problem. Also, when Wait returns it will hold the lock, but it is not guaranteed that the next instruction, the one resetting the flag, gets run.
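As a rough illustration of the signaling discussed here, the sketch below uses a `sync.Cond` tied to the same mutex that protects the queue. The `outbound` type, its fields, and the `writeLoop`, `queue`, and `close` helpers are hypothetical names for this example only. Because the waiter re-checks its condition in a loop, a `Signal()` with no waiter is a harmless no-op, which is one reason an extra inWait flag may not be needed for correctness.

```go
package main

import (
	"fmt"
	"sync"
)

// outbound is a hypothetical queue drained by a dedicated goroutine.
type outbound struct {
	mu      sync.Mutex
	sg      *sync.Cond
	pending []string
	flushed []string
	closed  bool
}

func newOutbound() *outbound {
	o := &outbound{}
	o.sg = sync.NewCond(&o.mu)
	return o
}

// writeLoop waits for a signal and drains pending data.
// It is meant to run in its own goroutine.
func (o *outbound) writeLoop(done chan<- struct{}) {
	o.mu.Lock()
	for {
		// Wait releases the lock while blocked and reacquires it on wake-up.
		for len(o.pending) == 0 && !o.closed {
			o.sg.Wait()
		}
		o.flushed = append(o.flushed, o.pending...)
		o.pending = nil
		if o.closed {
			break
		}
	}
	o.mu.Unlock()
	close(done)
}

// queue adds data and signals the writer.
func (o *outbound) queue(s string) {
	o.mu.Lock()
	o.pending = append(o.pending, s)
	o.sg.Signal()
	o.mu.Unlock()
}

// close asks the writer to drain what is left and exit.
func (o *outbound) close() {
	o.mu.Lock()
	o.closed = true
	o.sg.Signal()
	o.mu.Unlock()
}

func main() {
	o := newOutbound()
	done := make(chan struct{})
	go o.writeLoop(done)
	o.queue("a")
	o.queue("b")
	o.close()
	<-done
	fmt.Println(o.flushed)
}
```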

kozlovic · 2018-06-12T17:41:47Z

server/client.go

+		// Check for a big message, and if found place directly on nb
+		// FIXME(dlc) - do we need signaling of ownership here if we want len(data) <
+		if len(data) > maxBufSize {
+			c.out.nb = append(c.out.nb, data)


What happens if the data is referenced here? What I mean is for instance when delivering a message, we queue the message header and then the payload. But the header is already referencing a client's buffer that is reused for each deliverMsg call. Is that safe?

Good observation. The code above is safe since anything over maxBufSize is a heap allocation for the message, hence the conditional. I want to make it more explicit so that I can scale out fanout CPU-wise. I ran out of time, but it is still on my list. I could add some more comments about why it is safe now.
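The aliasing concern raised in this thread can be shown with a small sketch. It is an illustration under assumptions, not the PR's queueOutbound: the `outbound` type, the `queue` method, the tiny `maxBufSize` value, and the choice to move the primary buffer onto `nb` before a big message are all made up for the example. Small payloads are copied, so the caller may reuse its buffer; large payloads are appended by reference, so the caller must not modify them afterwards.

```go
package main

import (
	"fmt"
	"net"
)

// maxBufSize is hypothetical and tiny so the example is easy to follow.
const maxBufSize = 8

// outbound is a hypothetical holder for queued data.
type outbound struct {
	p  []byte      // primary buffer; small writes are copied into it
	nb net.Buffers // buffers ready to be written to the socket
}

// queue copies small payloads and references large ones.
func (o *outbound) queue(data []byte) {
	if len(data) > maxBufSize {
		// Keep ordering: move what was copied so far onto nb first.
		if len(o.p) > 0 {
			o.nb = append(o.nb, o.p)
			o.p = nil
		}
		// Stored by reference: the caller must not reuse data.
		o.nb = append(o.nb, data)
		return
	}
	// Copied: the caller may reuse data after this returns.
	o.p = append(o.p, data...)
}

func main() {
	o := &outbound{}

	scratch := []byte("MSG a\r\n") // 7 bytes, below the threshold
	o.queue(scratch)
	scratch[4] = 'b' // reuse of the scratch buffer does not affect o.p
	fmt.Printf("%q\n", o.p)

	big := []byte("0123456789") // above the threshold
	o.queue(big)
	big[0] = 'X' // visible through nb, because the slice is shared
	fmt.Printf("%q\n", o.nb[len(o.nb)-1])
}
```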

kozlovic · 2018-06-12T17:46:23Z

server/route.go

+		}
+		if !didDeliver && c.srv != nil {
+			group := c.srv.lookupRemoteQGroup(string(c.pa.sid))
+			c.reRouteQMsg(r, msgh, msg, group)


I think that we should still account for group being not found and in this case simply return.

That is handled via a DebugF on route.go:159

kozlovic · 2018-06-12T17:47:02Z

server/route.go

+	s.rqsMu.RLock()
+	rqsub := s.rqsubs[sid]
+	s.rqsMu.RUnlock()
+	return rqsub.group


In theory, the group could not be found...

It's a struct in the map, so a missing key returns an empty struct, and group would be the default of nil.
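The zero-value behavior described in this reply can be checked with a short sketch. The `rqsub` type and its `[]byte` group field here are assumptions for illustration; only the map-of-structs shape comes from the snippet above.

```go
package main

import "fmt"

// rqsub is a hypothetical value type stored directly in the map.
type rqsub struct {
	group []byte
}

// lookupGroup returns the group for sid, or nil when sid is unknown,
// because indexing a map with a missing key yields the zero value.
func lookupGroup(m map[string]rqsub, sid string) []byte {
	return m[sid].group
}

// lookupGroupOK also reports whether sid was present.
func lookupGroupOK(m map[string]rqsub, sid string) ([]byte, bool) {
	rq, ok := m[sid]
	return rq.group, ok
}

func main() {
	m := map[string]rqsub{"QRSID:1:2": {group: []byte("workers")}}

	fmt.Println(lookupGroup(m, "missing") == nil)
	g, ok := lookupGroupOK(m, "QRSID:1:2")
	fmt.Println(string(g), ok)
}
```

Note that this only holds for a map of struct values. With a map of pointers, a missing key would yield a nil pointer and the field access would panic.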

kozlovic · 2018-06-12T17:56:38Z

server/client.go

+	// so we need to know what to do to avoid unnecessary message drops
+	// from [auto-]unsubscribe.
+	if c.typ == CLIENT && c.srv != nil &&
+		len(sub.queue) > 0 && c.srv.NumRoutes() > 0 {


That's a big no-no. NumRoutes() grabs the server lock. We can't do that. We have code doing s.mu.Lock() -> c.mu.Lock(), so we should not do the opposite; otherwise we risk a lock inversion.
If you really want to optimize, it would be with an atomic (or an immutable, depending on config reload capabilities) indicating whether routing is enabled or not. But if it is, I would not check the number of routes, simply store the q group.

Good catch, I will snapshot the variables and release client lock.
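One way to avoid taking the server lock while holding the client lock is the atomic flag suggested above. The sketch below is an assumption-laden illustration, not the fix that landed: the `server` and `client` types, the `routing` field, and the `setRouting`, `routingEnabled`, and `rememberQueueGroup` helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// server is a hypothetical stand-in. Lock order is server.mu, then client.mu.
type server struct {
	mu      sync.Mutex
	routing int32 // 1 when routing is enabled; accessed atomically only
}

func (s *server) setRouting(on bool) {
	var v int32
	if on {
		v = 1
	}
	atomic.StoreInt32(&s.routing, v)
}

// routingEnabled does not take s.mu, so it is safe to call with a
// client lock held.
func (s *server) routingEnabled() bool {
	return atomic.LoadInt32(&s.routing) == 1
}

type client struct {
	mu     sync.Mutex
	srv    *server
	groups [][]byte // queue groups remembered for later re-routing
}

// rememberQueueGroup stores the queue group when routing is enabled.
// It never acquires the server lock, so it cannot invert the lock order.
func (c *client) rememberQueueGroup(queue []byte) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(queue) == 0 || c.srv == nil || !c.srv.routingEnabled() {
		return false
	}
	c.groups = append(c.groups, queue)
	return true
}

func main() {
	s := &server{}
	c := &client{srv: s}

	fmt.Println(c.rememberQueueGroup([]byte("workers")))
	s.setRouting(true)
	fmt.Println(c.rememberQueueGroup([]byte("workers")))
}
```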

kozlovic · 2018-06-12T19:55:26Z

server/client.go

+// pubAllowed checks on publish permissioning.
+func (c *client) pubAllowed() bool {
+	// Disallow publish to _SYS.>, these are reserved for internals.
+	if c.pa.subject[0] == '_' && len(c.pa.subject) > 4 &&


I know that this is moved and not new code, but someone previously reported that bytes.Equal() would probably be as fast if not faster?

Used to be much slower but you are probably right. I could run a micro benchmark or just check the normal benchmarks. Will test quickly and if so will change.

Will need to be bytes.HasPrefix(). Running benchmarks now.

5ns slower with bytes.HasPrefix(), so will leave as is.

Ok, I was basing my comments on this.

but agreed, not worth the change.

I tweaked a bit and it made a small difference with string cast and subslice with ==.
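For reference, the two styles of prefix check compared in this thread might look like the sketch below. The function names and the `sysPrefix` variable are hypothetical, and nothing here reproduces the PR's actual pubAllowed code or its timings; measuring the difference would need a benchmark such as `go test -bench`.

```go
package main

import (
	"bytes"
	"fmt"
)

var sysPrefix = []byte("_SYS.")

// reservedHasPrefix uses the standard library helper.
func reservedHasPrefix(subject []byte) bool {
	return bytes.HasPrefix(subject, sysPrefix)
}

// reservedSubslice checks the length and first byte, then compares a
// subslice converted to a string against a constant with ==.
func reservedSubslice(subject []byte) bool {
	return len(subject) > 4 && subject[0] == '_' && string(subject[:5]) == "_SYS."
}

func main() {
	for _, s := range []string{"_SYS.foo", "_SYS", "foo.bar", "_INBOX.x"} {
		b := []byte(s)
		fmt.Println(s, reservedHasPrefix(b), reservedSubslice(b))
	}
}
```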

Signed-off-by: Derek Collison <[email protected]>

kozlovic · 2018-06-12T20:53:35Z

LGTM

derekcollison and others added 26 commits June 4, 2018 17:45

Baseline order test and benchmarks

e6f200b

Changed sublist to avoid quadratic time in removal with large N

b9c73e9

Signed-off-by: Derek Collison <[email protected]>

Add fast slice for large psubs for Match

8502fb1

Signed-off-by: Derek Collison <[email protected]>

Change to timer setup

e1ce792

Signed-off-by: Derek Collison <[email protected]>

Re-enable benchmark tests

bb292d9

Signed-off-by: Derek Collison <[email protected]>

Collect pub permissions into own function

25654a4

Added large payload pub/sub benchmark

6443762

Signed-off-by: Derek Collison <[email protected]>

New outbound data architecture

481697e

Signed-off-by: Derek Collison <[email protected]>

Slow consumer updates and latency improvements.

50a9924

Use pending bytes as slow consumer trigger, so reintroduce max_pending. Improve latency with inplace flush calls when appropriate. Utilize simple time budget for readLoop routine. Signed-off-by: Derek Collison <[email protected]>

Add max_pending and write_deadline to varz

766ef3b

Signed-off-by: Derek Collison <[email protected]>

varz cluster empty when not defined

df574ce

Signed-off-by: Derek Collison <[email protected]>

Check flushOutbound and snapshot write_deadline and max_pending

e64ac42

Signed-off-by: Derek Collison <[email protected]>

Performance tweaks

e9178f1

Signed-off-by: Derek Collison <[email protected]>

Test dynamic buffers, track short reads/writes

30e31d5

Signed-off-by: Derek Collison <[email protected]>

Remove 1.8 support, trigger on 1.10

3bdab1b

Signed-off-by: Derek Collison <[email protected]>

Support for queue subscriber retries over routes

049db6e

Signed-off-by: Derek Collison <[email protected]>

Don't send route unsub with max

26dafe4

Signed-off-by: Derek Collison <[email protected]>

Fix data race

d3213df

Signed-off-by: Derek Collison <[email protected]>

Fixed bug reusing test sub

3e2e8c9

Signed-off-by: Derek Collison <[email protected]>

require 1.9 or above, bug fix in test

955d8ee

delivery last activity update

50bb4b9

new subs collector

cc07d50

Signed-off-by: Derek Collison <[email protected]>

lock users access

4dd4d2b

Signed-off-by: Derek Collison <[email protected]>

dynamic buffer updates

6299e03

Signed-off-by: Derek Collison <[email protected]>

Big message optimizations, slow consumer updates

d603c53

Signed-off-by: Derek Collison <[email protected]>

Performance optimizations, beta3, fixes to various tests.

844f376

Signed-off-by: Derek Collison <[email protected]>

derekcollison mentioned this pull request Jun 11, 2018

Defect: messages lost with non-zero max_msgs in 2-node cluster (bi-directional routes) #632

Closed

kozlovic reviewed Jun 12, 2018

kozlovic requested changes Jun 12, 2018

kozlovic reviewed Jun 12, 2018

derekcollison added 2 commits June 12, 2018 12:55

Avoid lock to server with client lock held

4fb84e2

Signed-off-by: Derek Collison <[email protected]>

Optimization per @cdevienne

f7cb616

Signed-off-by: Derek Collison <[email protected]>

kozlovic approved these changes Jun 12, 2018

derekcollison merged commit 5598d5c into master Jun 12, 2018

derekcollison deleted the fanout branch June 12, 2018 20:56
