Added .group() function #153
base: master
Conversation
I haven't tested this much, but here is my first attempt at using it. My collection has two objects, and both have a firstName field. What am I doing wrong? Can this be used without the reduce, finalize, etc.?
You need to invoke col.find(...).group(...) rather than col.group(...). It is a slight departure from the mongo invocation, but I felt this made more sense in the nedb context, keeping things in line with the other resultset post-processors like sort, skip, etc. Also, this way one doesn't need to support the 'cond' attribute, since find() already takes care of filtering. In this implementation, grouping is only possible on the result of the find() call, and so must be invoked on the cursor (hence you should omit passing a callback to find). finalize() is optional, but reduce() is required - in a way, it is the whole point of the group function :-).
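To make the semantics described above concrete, here is a minimal standalone sketch in plain JavaScript of what a group operation with key, initial, reduce and finalize does. This is a hypothetical illustration of the behaviour, not the actual nedb implementation:

```javascript
// Sketch of group() semantics: partition documents by the key fields,
// fold each partition with reduce() starting from a copy of `initial`,
// then optionally apply finalize() to each accumulated result.
function groupDocs(docs, spec) {
  var groups = {};
  docs.forEach(function (doc) {
    // Extract the grouping key values for this document.
    var keyObj = {};
    Object.keys(spec.key).forEach(function (field) {
      keyObj[field] = doc[field];
    });
    var id = JSON.stringify(keyObj);
    if (!groups[id]) {
      // Seed each group with its key fields plus a fresh copy of `initial`.
      groups[id] = Object.assign({}, keyObj, JSON.parse(JSON.stringify(spec.initial)));
    }
    groups[id] = spec.reduce(doc, groups[id]);
  });
  var results = Object.keys(groups).map(function (id) { return groups[id]; });
  if (spec.finalize) results = results.map(spec.finalize);
  return results;
}
```

As noted above, reduce is the essential part: without it there is nothing to accumulate into each group.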
Sorry, I did invoke .find() - I just forgot to write it. Otherwise it would have thrown an error (.group would be undefined on the collection, since it's defined on the cursor). However, in my case there was no error - the callback was simply never invoked. This behavior is not ideal: if reduce is required, an error should be thrown when it's not passed.
It seems to work when I add both initial: and reduce:. I'm sorry - the error is being passed to the callback. My mistake; ignore my initial comment.
Hmmm, it would've been really strange if it hadn't thrown an error for the missing input - the validation code that checks for valid key, reduce and initial is present.
Thanks for the PR, this seems very interesting. Will try to review and merge in the coming days.
Added a commit that allows the caller to omit specifying the key. This will result in the reducer being applied to the entire find() result, followed by finalize(), if defined.
Hi, it's a very useful piece of functionality - thank you @adityamukho. I've merged this on my fork (along with the new sort function from #159) and it works correctly so far. The thing that tripped me up is that array values are not accepted, even when manipulating the data in the reduce function. Is this normal?
Thanks for the feedback, @Azema. In any case, can you share the dataset and operators you used in your query? It may be useful to take a closer look.
Hi @adityamukho, here is the information you asked for. I tried deleting the condition that checks whether the value pointed to by the key is an array, and in my case it worked, because the array was concatenated. But I don't think it would work in all cases.

An example of the data used:

```json
[
  { "title": "A", "genre": ["a", "b"] },
  { "title": "B", "genre": ["c"] },
  { "title": "C", "genre": ["d", "e"] },
  { "title": "D", "genre": [] }
]
```

My query:

```js
var sort = { genre: 1 };
db.find({})
  .group({
    key: { genre: 1 },
    reduce: function (curr, result) {
      if (curr.genre instanceof Array && curr.genre.length > 0) {
        result.genre = curr.genre[0];
      }
      return result;
    },
    initial: {}
  })
  .sort({ genre: 1 })
  .exec(function (err, genres) {
    res.json({ results: genres });
  });
```

The results of the query:

```json
[
  { "genre": "a" }
]
```

Here are the expected results:

```json
[
  { "genre": "a" },
  { "genre": "c" },
  { "genre": "d" }
]
```
I'm looking into enabling array values in key fields, but in the meantime I think the expected result should be:

```json
[
  { "genre": "a" },
  { "genre": "c" },
  { "genre": "d" },
  { "genre": [] }
]
```

..since the actual keys:
Hi @Azema, I'm converting non-primitive keys to a fixed-length string hash during the intermediate collection stage. This is less than ideal, since the number of possible non-primitive keys is greater than the number of possible fixed-length hashes. However, in situations where such a hash collision does not occur, it does cover the use case you have given. I would refrain from merging this into the master branch until I or someone else comes up with a better way to uniquely and efficiently identify (possibly large) non-primitive objects (using a variable-length hash, perhaps).
Hi @adityamukho, thank you for your efforts - I will test your changes and come back to you if I find a problem. I think your hashing idea is good, but it may increase processing time. However, I don't have anything better to offer for now.
Have updated the code to use a variable-length hash. It uses the djb2 algorithm, which may not be quite as fast as the previous xxhash, but ensures uniqueness in practice, and is still much faster than any cryptographic or two-way hash. The included implementation, provided by the es-hash module, has the added advantage of being browser-compatible. The default behaviour is to hash all keys during the collection stage, ensuring correct output in all cases at the cost of a slight performance penalty. The hashing can be disabled by passing https://github.com/adityamukho/nedb/tree/feature-array-keys
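For reference, the djb2 algorithm mentioned above is only a few lines. Here is a minimal JavaScript sketch (not the es-hash module's actual code):

```javascript
// Minimal djb2 string hash (Daniel J. Bernstein's algorithm):
// start from 5381 and fold each character in with hash * 33 + charCode,
// keeping the result in the unsigned 32-bit range.
function djb2(str) {
  var hash = 5381;
  for (var i = 0; i < str.length; i++) {
    // hash * 33 + c, written as (hash << 5) + hash + c
    hash = ((hash << 5) + hash + str.charCodeAt(i)) >>> 0;
  }
  return hash;
}
```

Strictly speaking, a 32-bit hash like this cannot guarantee uniqueness; collisions are merely rare enough in practice for this use case.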
Another small validation - I just finished writing an NeDB ORM adapter for the Sails framework (https://github.com/adityamukho/sails-nedb), based on this fork. The Waterline (Sails' persistence layer) tests are fairly rigorous and extensive, and so far all tests have passed, including the aggregation tests.
This would be a great feature to have. Is it still planned to merge this? |
Haven't looked into this PR in a while. It looks like some files have changed in the master repo while I was gone, which would now require manual conflict resolution. Also, I have to push myself to write some test cases. But other than these two points, I think this PR is still ripe for a merge. I have used this function extensively in several applications and have not seen any errors or incorrect behavior so far.
@adityamukho congrats, it's really useful -- I hope it'll be merged some day! |
Hi, @adityamukho! |
Any progress on this? It is rather worrying that a pull request like this does not get merged and merge conflicts start to pile up. Makes it tough for contributors to consider contributing. Looks like other users are forking nedb because this original repo does not contain the functionality they want. |
+1 This should be merged... |
Any news on merging this? |
👍 |
Urgh - open for 9 months now :-/ |
Unfortunately not, no time to do it in the coming months I expect ... |
@louischatriot, will you consider adding a contributor to this project? (PRs that take > 9 months seem quite extreme.)
Should this project (NeDB) be considered dead?
No, it still works well. Features are not being added regularly, but that ...
This commit is a cleaned version of the original PR.
Just made a clean commit (631bad4) on nedb3.
Ref: http://docs.mongodb.org/manual/reference/method/db.collection.group/

Added a function to the `Cursor` prototype that allows users to reduce the result set (after `find()`), allowing operations like aggregation (sum, avg, count, etc.) or any other reductor. This function can work in conjunction with `limit()`, `skip()` and `sort()`. The order of the execution pipeline is: `find -> group -> sort -> slice (skip+limit)`.

The function is invoked on the `Cursor` object returned by `find()`, with an object that looks like:

Unlike the `group()` function in MongoDB, this does not need a `cond` part, since the `find()` operation takes care of that beforehand.

The function returns the `Cursor` on which it was invoked, allowing for chaining sort and skip/limit ops later.

The reduction outputs a resultset that looks like:

The field names in the groupBy keys have their `.` replaced by `_` so that the subsequent sort function, if present, works properly.

I have tried my best to ensure the function is optimized and robust. The test suite passes all tests. I have tested this function on a DB with ~380,000 records and the same reductor as shown above, and it works fine. I haven't added test cases directly to the project yet, though. Will do that if this looks likely to be merged :).
PS. The browser versions were auto-generated using the build script. I haven't modified them manually. The test page shows no errors there either.
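As a self-contained illustration of the pipeline order stated in the description (find -> group -> sort -> slice), here is a plain-JavaScript simulation over an in-memory array. The helper and option names are hypothetical; this is a sketch of the stated semantics, not the actual nedb code:

```javascript
// Simulates the described execution pipeline on an in-memory array:
// find (filter) -> group (reduce per key) -> sort -> slice (skip + limit).
function runPipeline(docs, opts) {
  // 1. find: keep only matching documents (this replaces MongoDB's `cond`).
  var matched = docs.filter(opts.match);

  // 2. group: fold each document into the accumulator for its key.
  var groups = {};
  matched.forEach(function (doc) {
    var k = String(doc[opts.keyField]);
    if (!(k in groups)) {
      groups[k] = JSON.parse(JSON.stringify(opts.initial));
      groups[k][opts.keyField] = doc[opts.keyField];
    }
    groups[k] = opts.reduce(doc, groups[k]);
  });
  var results = Object.keys(groups).map(function (k) { return groups[k]; });

  // 3. sort: order the grouped results.
  results.sort(opts.compare);

  // 4. slice: apply skip + limit last, on the grouped, sorted output.
  return results.slice(opts.skip, opts.skip + opts.limit);
}
```

The key point the sketch demonstrates is that skip/limit apply to the grouped results, not to the raw documents, which is why group must run before the slice stage.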