This repository has been archived by the owner on Feb 11, 2020. It is now read-only.

Performance Issue in loading subscriptions [in qlobber.js via redis persistence] #428

Closed
behrad opened this issue Mar 3, 2016 · 78 comments

Comments

@behrad
Contributor

behrad commented Mar 3, 2016

I have 67K subscribers with 5 topics each (around 335K subscriptions). When Mosca starts, it takes minutes for the Redis persistence to load the subscriptions, and CPU sits at 100%.

I traced the code and found that this line https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L108 is causing all the trouble. Commenting it out lets Mosca start in a few seconds while still iterating all the keys via the keys ...* command.
So it seems the qlobber trie implementation does not scale to many hundreds of thousands of additions.

Should we ask @davedoesdev to think about performance improvements?

Replace qlobber with another trie/globbing implementation, maybe? I found a few but haven't dug into them yet:
https://github.com/isaacs/node-glob
https://github.com/jeresig/trie-js
https://github.com/rbranson/glob-trie.js?utm_source=javascriptweekly&utm_medium=email
http://ejohn.org/blog/javascript-trie-performance-analysis/

Writing a custom one?

What do you suggest @mcollina ?

@davedoesdev
Contributor

@behrad qlobber has focused on matching performance - once the subscriptions are added, are you getting a problem?
That said, it seems we need to optimise adding patterns, I'll have a think. Any suggestions welcome.
I wonder what happens if you rate limit the subscriptions? Is there a dogpiling effect here?

@davedoesdev
Contributor

@behrad you could profile/trace qlobber to see where it's spending its time (start from https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L300). It could be the string split or some place in _add. If I get time I'll write a test to add that number of subscriptions (btw, how long are the topics?) but it may not be for a few days. If you've got the profiling set up already, it should be easy.
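Something like this standalone sketch is roughly what I have in mind (assuming the separator/wildcard options Mosca passes to its matcher; the topic shape is made up to mimic the 5-level topics, so treat it as a rough approximation):

  // rough standalone benchmark of qlobber insertion, no Redis involved
  'use strict';

  var Qlobber = require('qlobber').Qlobber;

  var matcher = new Qlobber({
    separator: '/',
    wildcard_one: '+',
    wildcard_some: '#'
  });

  var CLIENTS = 67000;
  var TOPICS_PER_CLIENT = 5;

  var start = Date.now();

  for (var c = 0; c < CLIENTS; c += 1) {
    for (var t = 0; t < TOPICS_PER_CLIENT; t += 1) {
      // five-level topics similar to app/appName/user/userId/alert
      matcher.add('app/myApp/user/user' + c + '/alert' + t, 'client' + c);
    }
  }

  console.log('added %d subscriptions in %d ms',
              CLIENTS * TOPICS_PER_CLIENT, Date.now() - start);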

@davedoesdev
Contributor

Might be interesting to write a quick test to compare string split performance against just iterating through the string with indexOf and substr.
Also, there may be something which is preventing _add from being compiled (optimised) by V8.
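Something like this quick (and admittedly naive) comparison would show whether the split itself matters; the topic string is just a made-up 5-level example:

  // naive timing sketch: String#split vs walking the string with indexOf/substr
  'use strict';

  var topic = 'app/myApp/user/someUser/alert';
  var ITERATIONS = 1000000;
  var i, j, end, words, start;

  start = Date.now();
  for (i = 0; i < ITERATIONS; i += 1) {
    words = topic.split('/');
  }
  console.log('split:          %d ms', Date.now() - start);

  start = Date.now();
  for (i = 0; i < ITERATIONS; i += 1) {
    words = [];
    j = 0;
    while (true) {
      end = topic.indexOf('/', j);
      if (end < 0) {
        words.push(topic.substr(j));
        break;
      }
      words.push(topic.substr(j, end - j));
      j = end + 1;
    }
  }
  console.log('indexOf/substr: %d ms', Date.now() - start);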

@behrad
Contributor Author

behrad commented Mar 3, 2016

qlobber has focused on matching performance - once the subscriptions are added, are you getting a problem?

Matching is great on my data (300K subs from 67K clients). I haven't read your trie code, but I can see there may be a trade-off between insertion speed and super-fast matching.

However, in a real-world case there will be even more clients (millions) and many millions of subscriptions. I haven't tested matching at that scale yet... Currently, the high rate of insertion is the bottleneck.

This problem shows up in two cases:

  1. when the broker needs to be restarted in a production environment, and it takes minutes for it to start up.

  2. when the system grows to multi-million clients, you get a high rate of connect requests on the broker (maybe several hundred per second for a million clients), and mobile clients (sleeps, mobile data lags, ...) make it worse.

Client connects are online-type requests, so rate limiting doesn't seem reasonable to me.

@behrad
Contributor Author

behrad commented Mar 3, 2016

you could profile/trace qlobber to see where it's spending its time (start from https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L300). It could be the string split or some place in _add. If I get time I'll write a test to add that number of subscriptions (btw, how long are the topics?) but it may not be for a few days. If you've got the profiling set up already, it should be easy

Will let you know. Is there any profiling module on npm you'd prefer I use on qlobber?

@mcollina
Collaborator

mcollina commented Mar 3, 2016

0x, a new thing we are building at nearForm.
BTW, last year I fixed a deopt in the match method.

@behrad
Contributor Author

behrad commented Mar 3, 2016

@mcollina what do you think about this problem? Could it be an architectural flaw?
What if every instance didn't need all the subscriptions in memory, and they could be handled in a clustered, sharded fashion across the broker instances in the cluster?

It doesn't feel scalable the way it is now.

@behrad
Contributor Author

behrad commented Mar 3, 2016

emqx/emqx#461 (comment)

@behrad
Contributor Author

behrad commented Mar 3, 2016

how long are the topics?

They have 5 levels: app/appName/user/userId/alert

@davedoesdev
Contributor

0x looks cool - care to give it a try?

@davedoesdev
Contributor

Yes, sharded/clustered may help but I think it's worth doing some profiling and checking for deopt.

@mcollina
Collaborator

mcollina commented Mar 3, 2016

I'm actually fine with using a Patricia trie vs a normal trie. It doesn't change much, and it can improve qlobber's perf :).

Regarding the approach, it is scalable within a limit: without the in-memory cache it would be massively slower.

@behrad
Contributor Author

behrad commented Mar 5, 2016

I changed async.each in https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L156 to async.eachSeries and traced the code; it all comes down to https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L123-L152 @davedoesdev
It takes 12 minutes for my data (335K subscriptions * 5 topic levels) on an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz server!

I'm not a V8 expert and don't know about tail-recursion optimizations in Node.js, but when I commented out all the assignments in https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L123-L152 so that the code only recurses, it took only 30 seconds!

@davedoesdev
Contributor

@behrad thanks for doing that. I'm keen to profile this myself (0x will be useful). I'm going to do this here: davedoesdev/qlobber#8
However, I have a couple of other things to finish up first so it may be a few days.

@behrad
Contributor Author

behrad commented Mar 5, 2016

Found the offending line:
https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L144

Just commenting out st = sub_trie[word]; reduces the 12 minutes to 30 seconds!!!
Could it be related to memory copying, or to argument passing for the recursive call at the end of the function? @davedoesdev

@behrad
Contributor Author

behrad commented Mar 5, 2016

👍 @davedoesdev eager to see this :)
I hadn't heard about 0x, looks promising, thank you

@davedoesdev
Contributor

That should just be a hash table lookup, weird.

@davedoesdev
Contributor

Btw which version of Node are you using?

@davedoesdev
Contributor

Also, what if you change https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L151 to

return this._add(val, i + 1, words, st);

?

@davedoesdev
Contributor

I think TCO is coming in ES6. Not read the spec fully myself - may have to do something to support it? Also not sure that is the problem, just a guess.

@davedoesdev
Contributor

According to this, TCO not yet in Node: https://kangax.github.io/compat-table/es6/

I'm wondering how difficult it would be to write this iteratively.
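For what it's worth, a sketch of an iterative version might look something like the one below. It assumes the trie is a nested plain object keyed by topic word, with the value list stored under the separator key at each leaf (roughly what the recursive _add appears to do); the names and details are guesses, not qlobber's actual internals:

  // hedged sketch of an iterative add (not qlobber's actual code)
  function addIterative(val, words, trie, separator) {
    var i, word, st;

    // walk/extend the trie one word at a time instead of recursing
    for (i = 0; i < words.length; i += 1) {
      word = words[i];
      st = trie[word];
      if (!st) {
        st = {};
        trie[word] = st;
      }
      trie = st;
    }

    // attach the value to the leaf node's value list
    st = trie[separator];
    if (!st) {
      st = [];
      trie[separator] = st;
    }
    st.push(val);
    return st;
  }

  // usage sketch:
  // var trie = {};
  // addIterative('client-1', 'app/test/user/behrad/alert'.split('/'), trie, '/');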

@behrad
Contributor Author

behrad commented Mar 5, 2016

On Node 4.3.1, will test with return added and let you know.

Am thinking if it will be difficult to write this iteratively.

That'd be great. I was thinking about asking you for an iterative version so that I can test it against the recursive one 👍

@behrad
Contributor Author

behrad commented Mar 5, 2016

Also, what if you change https://github.com/davedoesdev/qlobber/blob/master/lib/qlobber.js#L151 to

return this._add(val, i + 1, words, st);
?

No, it took 14 minutes, but I think I got it.

Changing the line to return this._add(val, i + 1, words, {}); reduced it to 30 seconds again. So, as I guessed, it is the argument passing to the recursive call while the fourth argument, i.e. the subtree, is growing into a bigger and bigger object.
However, since I don't know the details of V8's argument passing, I can't explain why.
Is Node passing parameters by value? :)

I believe an iterative version that keeps the trie state in an internal object on the Qlobber instance should improve this a lot.

@davedoesdev
Contributor

@behrad how strange! I'd assumed V8 would just pass a pointer to the object!
I'll have a think.

@davedoesdev
Contributor

Btw- could you let me have your test script please?

@davedoesdev
Contributor

(or maybe it's lazy and it doesn't do the hash table lookup if it thinks it isn't needed - could try just accessing st to see if that makes a difference).
Also, I think there's some way of getting logging as to what optimisations V8 is doing. @mcollina might know.
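One way to get that logging (script names here are placeholders, and this is just an assumption about the workflow, nothing qlobber-specific) is V8's own tracing flags:

  // Run the add benchmark with optimisation/deoptimisation tracing, e.g.:
  //
  //   node --trace-opt --trace-deopt bench-add.js 2>&1 | grep -i _add
  //
  // or ask V8 about one function directly; the % intrinsic below only parses
  // behind --allow-natives-syntax and its return code is V8-version specific:
  //
  //   node --allow-natives-syntax opt-status.js
  //
  var Qlobber = require('qlobber').Qlobber;
  var matcher = new Qlobber();

  // warm the function up so V8 considers optimising it
  for (var i = 0; i < 100000; i += 1) {
    matcher.add('a.b.c.d.' + i, i);
  }

  console.log(%GetOptimizationStatus(Qlobber.prototype._add));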

@psorowka
Contributor

psorowka commented Mar 5, 2016

Wouldn't it be feasible to wrap the recursion in an outer function and access st via a closure? The whole algorithm isn't async anyway, is it?

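A rough sketch of the idea (hypothetical names and trie layout, not qlobber's real internals): the current subtrie lives in a closed-over variable, so the recursive call only has to pass the index:

  function addWithClosure(val, words, trie, separator) {
    var sub_trie = trie;

    function step(i) {
      var word, st;

      if (i === words.length) {
        st = sub_trie[separator];
        if (!st) {
          st = [];
          sub_trie[separator] = st;
        }
        st.push(val);
        return st;
      }

      word = words[i];
      st = sub_trie[word];
      if (!st) {
        st = {};
        sub_trie[word] = st;
      }

      sub_trie = st;       // advance via the closed-over variable
      return step(i + 1);  // recurse with just the index
    }

    return step(0);
  }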

@behrad
Contributor Author

behrad commented Mar 5, 2016

Btw- could you let me have your test script please?

I don't have one; it's my application code in my stress-testing environment.

You can easily reproduce this by connecting 60K clientIds, each subscribing to 5 topics with 5 levels (app/test/user/behrad/testTopic) at QoS 1 via MQTT.js :), and then restarting Mosca.
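Roughly, a reproduction script with MQTT.js could look like the sketch below (host, client IDs and topic names are placeholders; the important bits are clean: false so the sessions persist, and QoS 1 subscriptions):

  'use strict';

  var mqtt = require('mqtt');

  var CLIENTS = 60000;
  var TOPICS = ['alert', 'chat', 'news', 'presence', 'sync'];

  function connectOne(n, done) {
    var client = mqtt.connect('mqtt://localhost:1883', {
      clientId: 'load-client-' + n,
      clean: false          // persistent session, so the subs survive a restart
    });

    client.on('connect', function () {
      var topics = TOPICS.map(function (t) {
        return 'app/test/user/user' + n + '/' + t;
      });
      client.subscribe(topics, { qos: 1 }, function () {
        client.end(false, done);
      });
    });
  }

  (function next(n) {
    if (n >= CLIENTS) {
      return console.log('done');
    }
    connectOne(n, function () {
      next(n + 1);
    });
  })(0);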

@behrad
Contributor Author

behrad commented Mar 5, 2016

And how much do you think reducing the number of topic levels would help? I haven't tested with 3-level topic names; my app is designed around a 5-level convention in production, so I can't change that easily.

@mcollina
Collaborator

mcollina commented Mar 6, 2016

I've not followed this much, I will catch up in the next few days.

Calling remove then add is a lot quicker (avoids matching). Would you still like me to add an option for unique insertion?

Unique insertion would be great. However, allow passing a function to test for uniqueness.

@davedoesdev
Contributor

@mcollina agreed

@davedoesdev
Contributor

I've found an optimisation for match!
If you add these lines to the start of addAll then matching is greatly quicker if you've not got multiple matches to append:

  if (dest.length === 0) {
    return origin.slice(0);
  }
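For context, assuming addAll is the small helper that appends one match list onto another, the patched helper would look roughly like this (only the early return at the top is the actual change; the loop is a guess at the existing body):

  function addAll(dest, origin) {
    if (dest.length === 0) {
      // nothing accumulated yet: hand back a copy of the new matches
      return origin.slice(0);
    }

    for (var i = 0; i < origin.length; i += 1) {
      dest[dest.length] = origin[i];
    }
    return dest;
  }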

@behrad
Contributor Author

behrad commented Mar 6, 2016

Unique insertion would be great. However, allow passing a function to test for uniqueness.

👍

If you add these lines to the start of addAll then matching is greatly quicker if you've not got multiple matches to append:

Will test if it works...

@davedoesdev
Contributor

The problem with unique insertion is how to do it without slowing down non-unique insertion.
Note also that duplicate results will still be possible from match even if non-unique insertion is implemented.

@davedoesdev
Contributor

Regarding https://github.com/mcollina/mosca/blob/master/lib/persistence/redis.js#L107

Is this actually correct? What if there is a wildcard rule in that._subMatcher which matches? Then the rule won't be added.

@psorowka
Contributor

psorowka commented Mar 6, 2016

Wow, I'm getting curious whether all of this is related to #426...


@behrad
Contributor Author

behrad commented Mar 6, 2016

Is this actually correct? What if there is a wildcard rule in that._subMatcher which matches? Then the rule won't be added.

This is OK, since if multiple subscriptions of a clientId match, only a single message should be delivered, I think :) @mcollina can confirm this.

The problem with unique insertion is how to do it without slowing down non-unique insertion

All of Mosca's usage is unique insertion, I believe. However, this could be made configurable via a qlobber constructor option.

@psorowka
Contributor

psorowka commented Mar 6, 2016

Is this actually correct? What if there is a wildcard rule in that._subMatcher which matches? Then the rule won't be added.

This is OK, since if multiple subscriptions of a clientId match, only a single message should be delivered, I think :) @mcollina can confirm this.

Agreed, this is expected behavior, but have a look at #426: this does not currently work with Redis.

@davedoesdev
Contributor

@behrad but what if the wildcard subscription is removed? Remove and add won't be balanced.

Anyway, best way is to add uniqueness into qlobber. Ideally it would be able to use a list (as now) or a set on each leaf. ES6 has sets.

@behrad
Contributor Author

behrad commented Mar 6, 2016

@behrad but what if the wildcard subscription is removed? Remove and add won't be balanced

Yes, and in that case remove/add won't work, yes!

Anyway, best way is to add uniqueness into qlobber. Ideally it would be able to use a list (as now) or a set on each leaf. ES6 has sets.

Is that going to happen anytime soon, so that I can test it against my current setup?

@behrad
Contributor Author

behrad commented Mar 6, 2016

if (dest.length === 0) {
return origin.slice(0);
}

This reduces 78 seconds to 17 seconds for the sample code above.

@davedoesdev
Contributor

@behrad is that good enough?
The sets stuff is going to need some thinking/learning.

@davedoesdev
Contributor

Have done some prep work on qlobber's master branch (not on npm). I've pulled out the list-specific handling into separate methods.
Hopefully soonish I'll be able to have a derived class from Qlobber which overrides these methods to use Set.
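The shape of it would presumably be something like the sketch below; the hook names here are hypothetical, so check the master branch for the real ones:

  // hypothetical subclass that stores leaf values in an ES6 Set
  var util = require('util');
  var Qlobber = require('qlobber').Qlobber;

  function QlobberSet(options) {
    Qlobber.call(this, options);
  }
  util.inherits(QlobberSet, Qlobber);

  // create the per-leaf container for the first value
  QlobberSet.prototype._initial_value = function (val) {
    return new Set([val]);
  };

  // add a value to an existing leaf container (Set#add dedups for free)
  QlobberSet.prototype._add_value = function (vals, val) {
    vals.add(val);
  };

  // merge a leaf's values into the overall match result
  QlobberSet.prototype._add_values = function (dest, origin) {
    origin.forEach(function (val) {
      dest.add(val);
    });
  };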

@davedoesdev
Contributor

@behrad qlobber 0.6.0 is now published, which includes QlobberDedup: https://github.com/davedoesdev/qlobber#qlobberdedupoptions

API is the same except for match, which now returns an ES6 Set. Internally it uses Sets to dedup values too.

It'd be good to know if it helps your use case any. You'll have to modify mosca to use it.
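A quick standalone check of the new class (the options below mirror the separator/wildcard settings Mosca passes to its matcher, so double-check them against Mosca's persistence code before wiring it in):

  var QlobberDedup = require('qlobber').QlobberDedup;

  var matcher = new QlobberDedup({
    separator: '/',
    wildcard_one: '+',
    wildcard_some: '#'
  });

  matcher.add('app/test/user/behrad/alert', 'client-1');
  matcher.add('app/test/user/+/alert', 'client-1');   // overlapping subscription

  // match() now returns an ES6 Set, so the duplicate value appears only once
  var matched = matcher.match('app/test/user/behrad/alert');
  console.log(matched instanceof Set, matched.size);   // true 1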

@behrad
Contributor Author

behrad commented Mar 9, 2016

I found that Mosca is currently forwarding duplicate messages to a client if overlapping topics are subscribed, even now! It may be related to #428 (comment)

I think it may be a better idea not to handle duplication on the adding side, but at forwardPacket time when matching. Eager to hear @mcollina's view on this, both semantics- and performance-wise :)

@behrad
Contributor Author

behrad commented Mar 9, 2016

@behrad is that good enough?

It was not good enough in Mosca's case in my tests.

It'd be good to know if it helps your use case any. You'll have to modify mosca to use it.

I'll give it a try 👍

@psorowka
Contributor

psorowka commented Mar 9, 2016

@behrad Let's sort this:

  • exact duplicates of subscriptions must be checked when subscribing. Otherwise we'll have a lot of garbage when clients don't handle their clean session well.
  • matching duplicates may occur in practice and should be deduped at forwarding. As far as I understood, Mosca does this correctly in all the other backends, but there seems to be a dedup bug with Redis.

The latter is what #426 is about; I have not found time to dig into this deeper yet.

@behrad
Contributor Author

behrad commented Mar 9, 2016

exact duplicates

What do you mean by exact? If I got it right, it's more performant to do it as a simple string validation at the very start of the subscribe method. It's heavy to push this down into a huge internal data structure like qlobber's trie.

@psorowka
Contributor

psorowka commented Mar 9, 2016

I mean:

some/specific/topic <-> some/specific/topic  # exact duplicate
some/specific/topic <-> some/+/topic      # matching duplicate             

@behrad
Contributor Author

behrad commented Mar 9, 2016

Right, then it's cheaper to do it as a validation in subscribe before sending it further down.
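As a sketch of what that validation could look like (hypothetical helper, not Mosca's actual code; in practice the lookup would live wherever Mosca already tracks a client's subscriptions):

  // cheap exact-duplicate check before the subscription ever reaches the trie
  var seen = new Set();   // or persisted per client in Redis

  function isDuplicateSubscription(clientId, topic) {
    var key = clientId + '\u0000' + topic;   // composite key, no parsing needed
    if (seen.has(key)) {
      return true;
    }
    seen.add(key);
    return false;
  }

  // in the subscribe path:
  // if (isDuplicateSubscription(client.id, topic)) { return done(); }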

@davedoesdev
Contributor

Btw the benchmark results for QlobberDedup on the add-many are:

  ┌─────────┬────────────┬───────────────┬──────────┐
  │         │ total (ms) │  average (ns) │ diff (%) │
  ├─────────┼────────────┼───────────────┼──────────┤
  │ dedup   │        742 │   742,395,946 │        - │
  │ default │      6,299 │ 6,298,590,019 │      748 │
  └─────────┴────────────┴───────────────┴──────────┘

You may be able to get quicker by using a separate hashtable-based structure for checking added duplicates.

Pure matching is about the same as the normal (non-deduplicating) Qlobber.

@davedoesdev
Contributor

@behrad what did you decide to do (if anything)?

@behrad
Contributor Author

behrad commented Mar 24, 2016

Sorry for my late answer @davedoesdev; these were busy days for us and we are in the Norouz holidays now :)
I should test QlobberDedup inside Mosca and let you know. However, I believe it should be used alongside other optimizations inside Mosca, as I mentioned above.

@behrad
Contributor Author

behrad commented Apr 3, 2016

Finally I'm going to find some time to spend on this, guys.
I'd also like to hear your views on http://250bpm.com/blog:19 @mcollina & @davedoesdev
Seems like the same challenge ;)

@davedoesdev
Contributor

Regarding http://250bpm.com/blog:19, qlobber already stores strings on nodes in the trie.

@mcollina
Collaborator

Fixed in v2.
