This repository has been archived by the owner on Feb 11, 2020. It is now read-only.

Performance Issue in loading subscriptions [in qlobber.js via redis persistence] #428

Closed
behrad opened this issue Mar 3, 2016 · 78 comments

Comments

@behrad
Contributor

behrad commented Mar 3, 2016

I have 67K subscribers with 5 topics each (around 335K subscriptions). When Mosca starts, it takes minutes for the Redis persistence to load the subscriptions, and CPU sits at 100%.

I traced the code and found that this line https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L108 is causing all the trouble. Commenting it out lets Mosca start in a few seconds while still iterating all the keys via the keys ...* command.
So it seems the qlobber trie implementation does not scale to many hundreds of thousands of additions.

Should we ask @davedoesdev to think about performance improvements?

Replace qlobber with another trie/globbing implementation, maybe? I found a few but haven't dug into them yet:
https://github.com/isaacs/node-glob
https://github.com/jeresig/trie-js
https://github.com/rbranson/glob-trie.js?utm_source=javascriptweekly&utm_medium=email
http://ejohn.org/blog/javascript-trie-performance-analysis/

Writing a custom one?

What do you suggest @mcollina ?

@davedoesdev
Contributor

@behrad qlobber has focused on matching performance - once the subscriptions are added, are you getting a problem?
That said, it seems we need to optimise adding patterns, I'll have a think. Any suggestions welcome.
I wonder what happens if you rate limit the subscriptions? Is there a dogpiling effect here?

@davedoesdev
Contributor

@behrad you could profile/trace qlobber to see where it's spending its time (start from https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L300). It could be the string split or some place in _add. If I get time I'll write a test to add that number of subscriptions (btw, how long are the topics?) but it may not be for a few days. If you've got the profiling set up already, it should be easy.
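Something like this standalone sketch is roughly what I have in mind (assuming the separator/wildcard options Mosca passes to its matcher; the topic shape is made up to mimic the 5-level topics, so treat it as a rough approximation):

  // rough standalone benchmark of qlobber insertion, no Redis involved
  'use strict';

  var Qlobber = require('qlobber').Qlobber;

  var matcher = new Qlobber({
    separator: '/',
    wildcard_one: '+',
    wildcard_some: '#'
  });

  var CLIENTS = 67000;
  var TOPICS_PER_CLIENT = 5;

  var start = Date.now();

  for (var c = 0; c < CLIENTS; c += 1) {
    for (var t = 0; t < TOPICS_PER_CLIENT; t += 1) {
      // five-level topics similar to app/appName/user/userId/alert
      matcher.add('app/myApp/user/user' + c + '/alert' + t, 'client' + c);
    }
  }

  console.log('added %d subscriptions in %d ms',
              CLIENTS * TOPICS_PER_CLIENT, Date.now() - start);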

@davedoesdev
Contributor

Might be interesting to write a quick test to compare string split performance against just iterating through the string with indexOf and substr.
Also, there may be something which is preventing _add from being compiled (optimised) by V8.
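Something like this quick (and admittedly naive) comparison would show whether the split itself matters; the topic string is just a made-up 5-level example:

  // naive timing sketch: String#split vs walking the string with indexOf/substr
  'use strict';

  var topic = 'app/myApp/user/someUser/alert';
  var ITERATIONS = 1000000;
  var i, j, end, words, start;

  start = Date.now();
  for (i = 0; i < ITERATIONS; i += 1) {
    words = topic.split('/');
  }
  console.log('split:          %d ms', Date.now() - start);

  start = Date.now();
  for (i = 0; i < ITERATIONS; i += 1) {
    words = [];
    j = 0;
    while (true) {
      end = topic.indexOf('/', j);
      if (end < 0) {
        words.push(topic.substr(j));
        break;
      }
      words.push(topic.substr(j, end - j));
      j = end + 1;
    }
  }
  console.log('indexOf/substr: %d ms', Date.now() - start);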

@behrad
Contributor Author

behrad commented Mar 3, 2016

qlobber has focused on matching performance - once the subscriptions are added, are you getting a problem?

Matching is great on my data (300K subs from 67K clients). I haven't read your trie code, but I can see there may be a trade-off between insertion speed and super-fast matching.

However, in a real-world case there will be even more clients (millions) and many millions of subscriptions. I haven't tested matching at that scale yet... Currently, the high rate of insertion is the bottleneck.

This problem shows up in two cases:

  1. when the broker needs to be restarted in a production environment, and it takes minutes for it to start up.

  2. when the system grows to multi-million clients, you get a high rate of connect requests on the broker (maybe several hundred per second for a million clients), and mobile clients (sleeps, mobile data lags, ...) make it worse.

Client connects are online-type requests, so rate limiting doesn't seem reasonable to me.

@behrad
Contributor Author

behrad commented Mar 3, 2016

you could profile/trace qlobber to see where it's spending its time (start from https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L300). It could be the string split or some place in _add. If I get time I'll write a test to add that number of subscriptions (btw, how long are the topics?) but it may not be for a few days. If you've got the profiling set up already, it should be easy

Will let you know. Is there any profiling module on npm you'd prefer I use on qlobber?

@mcollina
Collaborator

mcollina commented Mar 3, 2016

0x, a new thing we are building at nearForm.
BTW, last year I fixed a deopt in the match method.

@behrad
Contributor Author

behrad commented Mar 3, 2016

@mcollina what do you think about this problem? Could it be an architectural flaw?
What if every instance didn't need all the subscriptions in memory, and they could be handled in a clustered, sharded fashion across the broker instances in the cluster?

It doesn't feel scalable the way it is now.

@behrad
Contributor Author

behrad commented Mar 3, 2016

emqx/emqx#461 (comment)

@behrad
Contributor Author

behrad commented Mar 3, 2016

how long are the topics?

They have 5 levels: app/appName/user/userId/alert

@davedoesdev
Contributor

0x looks cool - care to give it a try?

@davedoesdev
Contributor

Yes, sharded/clustered may help but I think it's worth doing some profiling and checking for deopt.

@mcollina
Collaborator

mcollina commented Mar 3, 2016

I'm actually fine with using a Patricia trie vs a normal trie. It doesn't change much, and it can improve qlobber's perf :).

Regarding the approach, it is scalable within a limit: without the in-memory cache it would be massively slower.

@behrad
Contributor Author

behrad commented Mar 5, 2016

I changed async.each in https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L156 to async.eachSeries and traced the code; it all comes down to https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L123-L152 @davedoesdev
It takes 12 minutes for my data (335K subscriptions * 5 topic levels) on an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz server!

I'm not a V8 expert and don't know about tail-recursion optimizations in Node.js, but when I commented out all the assignments in https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L123-L152 so that the code only recurses, it took only 30 seconds!

@davedoesdev
Contributor

@behrad thanks for doing that. I'm keen to profile this myself (0x will be useful). I'm going to do this here: davedoesdev/qlobber#8
However, I have a couple of other things to finish up first so it may be a few days.

@behrad
Contributor Author

behrad commented Mar 5, 2016

Found the offending line:
https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L144

Just commenting out st = sub_trie[word]; reduces the 12 minutes to 30 seconds!!!
Could it be related to memory copying, or to argument passing for the recursive call at the end of the function? @davedoesdev

@behrad
Contributor Author

behrad commented Mar 5, 2016

👍 @davedoesdev eager to see this :)
I hadn't heard about 0x, looks promising, thank you

@davedoesdev
Contributor

That should just be a hash table lookup, weird.

@davedoesdev
Contributor

Btw which version of Node are you using?

@davedoesdev
Contributor

Also, what if you change https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L151 to

return this._add(val, i + 1, words, st);

?

@davedoesdev
Contributor

I think TCO is coming in ES6. Not read the spec fully myself - may have to do something to support it? Also not sure that is the problem, just a guess.

@davedoesdev
Contributor

According to this, TCO not yet in Node: https://kangax.github.io/compat-table/es6/

I'm wondering how difficult it would be to write this iteratively.
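For what it's worth, a sketch of an iterative version might look something like the one below. It assumes the trie is a nested plain object keyed by topic word, with the value list stored under the separator key at each leaf (roughly what the recursive _add appears to do); the names and details are guesses, not qlobber's actual internals:

  // hedged sketch of an iterative add (not qlobber's actual code)
  function addIterative(val, words, trie, separator) {
    var i, word, st;

    // walk/extend the trie one word at a time instead of recursing
    for (i = 0; i < words.length; i += 1) {
      word = words[i];
      st = trie[word];
      if (!st) {
        st = {};
        trie[word] = st;
      }
      trie = st;
    }

    // attach the value to the leaf node's value list
    st = trie[separator];
    if (!st) {
      st = [];
      trie[separator] = st;
    }
    st.push(val);
    return st;
  }

  // usage sketch:
  // var trie = {};
  // addIterative('client-1', 'app/test/user/behrad/alert'.split('/'), trie, '/');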

@behrad
Contributor Author

behrad commented Mar 5, 2016

On Node 4.3.1, will test with return added and let you know.

Am thinking if it will be difficult to write this iteratively.

That'd be great. I was thinking about asking you for an iterative version so that I can test it against the recursive one 👍

@behrad
Contributor Author

behrad commented Mar 5, 2016

Also, what if you change https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L151 to

return this._add(val, i + 1, words, st);
?

No, it took 14 minutes, but I think I got it.

Changing the line to return this._add(val, i + 1, words, {}); reduced it to 30 seconds again. So, as I guessed, it is the argument passing to the recursive call while the fourth argument, i.e. the subtree, is growing into a bigger and bigger object.
However, since I don't know the details of V8's argument passing, I can't explain why.
Is Node passing parameters by value? :)

I believe an iterative version that keeps the trie state in an internal object on the Qlobber instance should improve this a lot.

@davedoesdev
Contributor

@behrad how strange! I'd assumed V8 would just pass a pointer to the object!
I'll have a think.

@davedoesdev
Contributor

Btw- could you let me have your test script please?

@davedoesdev
Contributor

(or maybe it's lazy and it doesn't do the hash table lookup if it thinks it isn't needed - could try just accessing st to see if that makes a difference).
Also, I think there's some way of getting logging as to what optimisations V8 is doing. @mcollina might know.
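One way to get that logging (script names here are placeholders, and this is just an assumption about the workflow, nothing qlobber-specific) is V8's own tracing flags:

  // Run the add benchmark with optimisation/deoptimisation tracing, e.g.:
  //
  //   node --trace-opt --trace-deopt bench-add.js 2>&1 | grep -i _add
  //
  // or ask V8 about one function directly; the % intrinsic below only parses
  // behind --allow-natives-syntax and its return code is V8-version specific:
  //
  //   node --allow-natives-syntax opt-status.js
  //
  var Qlobber = require('qlobber').Qlobber;
  var matcher = new Qlobber();

  // warm the function up so V8 considers optimising it
  for (var i = 0; i < 100000; i += 1) {
    matcher.add('a.b.c.d.' + i, i);
  }

  console.log(%GetOptimizationStatus(Qlobber.prototype._add));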

@psorowka
Contributor

psorowka commented Mar 5, 2016

Wouldn't it be feasible to wrap the recursion in an outer function and access st via a closure? The whole algorithm isn't async anyway, is it?

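A rough sketch of the idea (hypothetical names and trie layout, not qlobber's real internals): the current subtrie lives in a closed-over variable, so the recursive call only has to pass the index:

  function addWithClosure(val, words, trie, separator) {
    var sub_trie = trie;

    function step(i) {
      var word, st;

      if (i === words.length) {
        st = sub_trie[separator];
        if (!st) {
          st = [];
          sub_trie[separator] = st;
        }
        st.push(val);
        return st;
      }

      word = words[i];
      st = sub_trie[word];
      if (!st) {
        st = {};
        sub_trie[word] = st;
      }

      sub_trie = st;       // advance via the closed-over variable
      return step(i + 1);  // recurse with just the index
    }

    return step(0);
  }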

@behrad
Contributor Author

behrad commented Mar 5, 2016

Btw- could you let me have your test script please?

I don't have one; it's my application code in my stress-testing environment.

You can easily reproduce this by connecting 60K clientIds, each subscribing to 5 topics with 5 levels (app/test/user/behrad/testTopic) at QoS 1 via MQTT.js :), and then restarting Mosca.
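Roughly, a reproduction script with MQTT.js could look like the sketch below (host, client IDs and topic names are placeholders; the important bits are clean: false so the sessions persist, and QoS 1 subscriptions):

  'use strict';

  var mqtt = require('mqtt');

  var CLIENTS = 60000;
  var TOPICS = ['alert', 'chat', 'news', 'presence', 'sync'];

  function connectOne(n, done) {
    var client = mqtt.connect('mqtt://localhost:1883', {
      clientId: 'load-client-' + n,
      clean: false          // persistent session, so the subs survive a restart
    });

    client.on('connect', function () {
      var topics = TOPICS.map(function (t) {
        return 'app/test/user/user' + n + '/' + t;
      });
      client.subscribe(topics, { qos: 1 }, function () {
        client.end(false, done);
      });
    });
  }

  (function next(n) {
    if (n >= CLIENTS) {
      return console.log('done');
    }
    connectOne(n, function () {
      next(n + 1);
    });
  })(0);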

@behrad
Contributor Author

behrad commented Mar 5, 2016

And how much do you think reducing the number of topic levels would help? I haven't tested with 3-level topic names; my app is designed around a 5-level convention in production, so I can't change that easily.

@mcollina
Collaborator

mcollina commented Mar 6, 2016

I've not followed this much, I will catch up in the next few days.

Calling remove then add is a lot quicker (avoids matching). Would you still like me to add an option for unique insertion?

Unique insertion would be great. However, allow passing a function to test for uniqueness.

@davedoesdev
Contributor

@mcollina agreed

@davedoesdev
Contributor

I've found an optimisation for match!
If you add these lines to the start of addAll then matching is greatly quicker if you've not got multiple matches to append:

  if (dest.length === 0) {
    return origin.slice(0);
  }
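For context, assuming addAll is the small helper that appends one match list onto another, the patched helper would look roughly like this (only the early return at the top is the actual change; the loop is a guess at the existing body):

  function addAll(dest, origin) {
    if (dest.length === 0) {
      // nothing accumulated yet: hand back a copy of the new matches
      return origin.slice(0);
    }

    for (var i = 0; i < origin.length; i += 1) {
      dest[dest.length] = origin[i];
    }
    return dest;
  }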

@behrad
Contributor Author

behrad commented Mar 6, 2016

Unique insertion would be great. However, allow passing a function to test for uniqueness.

👍

If you add these lines to the start of addAll then matching is greatly quicker if you've not got multiple matches to append:

Will test if it works...

@davedoesdev
Contributor

The problem with unique insertion is how to do it without slowing down non-unique insertion.
Note also that duplicate results will still be possible from match even if non-unique insertion is implemented.

@davedoesdev
Contributor

Regarding https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L107

Is this actually correct? What if there is a wildcard rule in that._subMatcher which matches? Then the rule won't be added.

@psorowka
Contributor

psorowka commented Mar 6, 2016

Wow, I'm getting curious whether all of this is related to #426...


@behrad
Contributor Author

behrad commented Mar 6, 2016

Is this actually correct? What if there is a wildcard rule in that._subMatcher which matches? Then the rule won't be added.

This is OK, since if multiple subscriptions of a clientId match, only a single message should be delivered, I think :) @mcollina can confirm this.

The problem with unique insertion is how to do it without slowing down non-unique insertion

All of Mosca's usage is unique insertion, I believe. However, this could be made configurable via a qlobber constructor option.

@psorowka
Contributor

psorowka commented Mar 6, 2016

Is this actually correct? What if there is a wildcard rule in that._subMatcher which matches? Then the rule won't be added.

This is OK, since if multiple subscriptions of a clientId match, only a single message should be delivered, I think :) @mcollina can confirm this.

Agreed, this is expected behavior, but have a look at #426: this does not currently work with Redis.

@davedoesdev
Contributor

@behrad but what if the wildcard subscription is removed? Remove and add won't be balanced.

Anyway, best way is to add uniqueness into qlobber. Ideally it would be able to use a list (as now) or a set on each leaf. ES6 has sets.

@behrad
Contributor Author

behrad commented Mar 6, 2016

@behrad but what if the wildcard subscription is removed? Remove and add won't be balanced

Yes, and in that case remove/add won't work, yes!

Anyway, best way is to add uniqueness into qlobber. Ideally it would be able to use a list (as now) or a set on each leaf. ES6 has sets.

Is that going to happen anytime soon, so that I can test it against my current setup?

@behrad
Contributor Author

behrad commented Mar 6, 2016

if (dest.length === 0) {
return origin.slice(0);
}

This reduces 78 seconds to 17 seconds for the sample code above.

@davedoesdev
Contributor

@behrad is that good enough?
The sets stuff is going to need some thinking/learning.

@davedoesdev
Contributor

Have done some prep work on qlobber's master branch (not on npm). I've pulled out the list-specific handling into separate methods.
Hopefully soonish I'll be able to have a derived class from Qlobber which overrides these methods to use Set.
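The shape of it would presumably be something like the sketch below; the hook names here are hypothetical, so check the master branch for the real ones:

  // hypothetical subclass that stores leaf values in an ES6 Set
  var util = require('util');
  var Qlobber = require('qlobber').Qlobber;

  function QlobberSet(options) {
    Qlobber.call(this, options);
  }
  util.inherits(QlobberSet, Qlobber);

  // create the per-leaf container for the first value
  QlobberSet.prototype._initial_value = function (val) {
    return new Set([val]);
  };

  // add a value to an existing leaf container (Set#add dedups for free)
  QlobberSet.prototype._add_value = function (vals, val) {
    vals.add(val);
  };

  // merge a leaf's values into the overall match result
  QlobberSet.prototype._add_values = function (dest, origin) {
    origin.forEach(function (val) {
      dest.add(val);
    });
  };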

@davedoesdev
Contributor

@behrad qlobber 0.6.0 is now published, which includes QlobberDedup: https://github.com/davedoesdev/qlobber#qlobberdedupoptions

API is the same except for match, which now returns an ES6 Set. Internally it uses Sets to dedup values too.

It'd be good to know if it helps your use case any. You'll have to modify mosca to use it.
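A quick standalone check of the new class (the options below mirror the separator/wildcard settings Mosca passes to its matcher, so double-check them against Mosca's persistence code before wiring it in):

  var QlobberDedup = require('qlobber').QlobberDedup;

  var matcher = new QlobberDedup({
    separator: '/',
    wildcard_one: '+',
    wildcard_some: '#'
  });

  matcher.add('app/test/user/behrad/alert', 'client-1');
  matcher.add('app/test/user/+/alert', 'client-1');   // overlapping subscription

  // match() now returns an ES6 Set, so the duplicate value appears only once
  var matched = matcher.match('app/test/user/behrad/alert');
  console.log(matched instanceof Set, matched.size);   // true 1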

@behrad
Contributor Author

behrad commented Mar 9, 2016

I found that Mosca is currently forwarding duplicate messages to a client if overlapping topics are subscribed, even now! It may be related to #428 (comment)

I think it may be a better idea not to handle duplication on the adding side, but at forwardPacket time when matching. Eager to hear @mcollina's view on this, both semantics- and performance-wise :)

@behrad
Contributor Author

behrad commented Mar 9, 2016

@behrad is that good enough?

It was not good enough in Mosca's case in my tests.

It'd be good to know if it helps your use case any. You'll have to modify mosca to use it.

I'll give it a try 👍

@psorowka
Contributor

psorowka commented Mar 9, 2016

@behrad Let's sort this:

  • exact duplicates of subscriptions must be checked when subscribing. Otherwise we'll have a lot of garbage when clients don't handle their clean session well.
  • matching duplicates may occur in practice and should be deduped at forwarding. As far as I understood, Mosca does this correctly in all the other backends, but there seems to be a dedup bug with Redis.

The latter is what #426 is about; I have not found time to dig into this deeper yet.

@behrad
Contributor Author

behrad commented Mar 9, 2016

exact duplicates

What do you mean by exact? If I got it right, it's more performant to do it as a simple string validation at the very start of the subscribe method. It's heavy to push this down into a huge internal data structure like qlobber's trie.

@psorowka
Contributor

psorowka commented Mar 9, 2016

I mean:

some/specific/topic <-> some/specific/topic  # exact duplicate
some/specific/topic <-> some/+/topic      # matching duplicate             

@behrad
Contributor Author

behrad commented Mar 9, 2016

Right, then it's cheaper to do it as a validation in subscribe before sending it further down.
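As a sketch of what that validation could look like (hypothetical helper, not Mosca's actual code; in practice the lookup would live wherever Mosca already tracks a client's subscriptions):

  // cheap exact-duplicate check before the subscription ever reaches the trie
  var seen = new Set();   // or persisted per client in Redis

  function isDuplicateSubscription(clientId, topic) {
    var key = clientId + '\u0000' + topic;   // composite key, no parsing needed
    if (seen.has(key)) {
      return true;
    }
    seen.add(key);
    return false;
  }

  // in the subscribe path:
  // if (isDuplicateSubscription(client.id, topic)) { return done(); }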

@davedoesdev
Contributor

Btw the benchmark results for QlobberDedup on the add-many are:

  ┌─────────┬────────────┬───────────────┬──────────┐
  │         │ total (ms) │  average (ns) │ diff (%) │
  ├─────────┼────────────┼───────────────┼──────────┤
  │ dedup   │        742 │   742,395,946 │        - │
  │ default │      6,299 │ 6,298,590,019 │      748 │
  └─────────┴────────────┴───────────────┴──────────┘

You may be able to get quicker by using a separate hashtable-based structure for checking added duplicates.

Pure matching is about the same as the normal (non-deduplicating) Qlobber.

@davedoesdev
Contributor

@behrad what did you decide to do (if anything)?

@behrad
Contributor Author

behrad commented Mar 24, 2016

Sorry for my late answer @davedoesdev; these were busy days for us and we are in the Norouz holidays now :)
I should test QlobberDedup inside Mosca and let you know. However, I believe it should be used alongside other optimizations inside Mosca, as I mentioned above.

@behrad
Contributor Author

behrad commented Apr 3, 2016

Finally I'm going to find some time to spend on this, guys.
I'd also like to hear your views on http://250bpm.com/blog:19 @mcollina & @davedoesdev
Seems like the same challenge ;)

@davedoesdev
Contributor

Regarding http://250bpm.com/blog:19, qlobber already stores strings on nodes in the trie.

@mcollina
Collaborator

Fixed in v2.
