Adds geo_centroid metric aggregator #13846

Merged (2 commits) on Oct 14, 2015

Conversation

Contributor

@nknize nknize commented Sep 29, 2015

This PR adds a new metric aggregator for computing the geo_centroid over a set of geo_point fields. This can be combined with other aggregators (e.g., geohash_grid, significant_terms) for computing the geospatial centroid based on the document sets from other aggregation results.

closes #13621

@nknize added the review, :Analytics/Geo, :Analytics/Aggregations, v2.1.0, v5.0.0-alpha1, and >feature labels on Sep 29, 2015
@clintongormley
Contributor

Hiya @nknize

I note this PR is missing documentation? Also, is the plan to remove the geohash_grid centroid added in #13433 in favour of this agg? (I may have missed it in this PR).

@nknize
Contributor Author

nknize commented Oct 2, 2015

@clintongormley Thanks for the documentation reminder. I'll finish that up.

Re: removal of #13433, I was thinking we make it optional (e.g., weighted_centroid: true) and default it to false. This way, we can avoid requiring a subaggregation if the user only wants the centroid of the geo_grid. There's less overhead for this more common use case.

pt[0] = pt[0] + (value.getLon() - pt[0]) / ++totalCounts;
pt[1] = pt[1] + (value.getLat() - pt[1]) / totalCounts;
}
centroids.set(bucket, XGeoUtils.mortonHash(pt[0], pt[1]));
Contributor

should we not update the counts array with the new value of totalCounts here too?

Contributor

I find it weird that we serialise using XGeoUtils.mortonHash() but de-serialise using GeoUtils.fromIndexLong(). Could we change it so we serialise and de-serialise using the same object? That way it will be easier to see that changing the serialise method will affect the deserialise method

Contributor Author

@colings86 comments would have been useful, eh? ¯\_(ツ)_/¯ Updating counts is done on line 90. I'll go ahead and rename some variables and add comments.

Re: serialization - we serialize using XGeoUtils.mortonHash() to obtain the encoded long, but then we deserialize with GeoPoint.fromIndexLong() which sets lat lon values for the GeoPoint object using the XGeoUtils.mortonUnhash... counterpart.
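For readers following this thread, the incremental update in the snippet under review (`mean = mean + (value - mean) / ++count`) can be sketched on its own as below. This is a minimal illustration, not the code from this PR: the class name `RunningCentroid` and its methods are hypothetical, and the `merge` step is only an assumption about how partial results might be combined with a count-weighted average. It takes a plain arithmetic mean in lon/lat space, as the snippet does.

```java
// Illustrative sketch only. RunningCentroid is a hypothetical name,
// not a class from this PR or from Elasticsearch.
final class RunningCentroid {
    private double lon;  // running mean of longitudes seen so far
    private double lat;  // running mean of latitudes seen so far
    private long count;  // number of points folded into the mean

    /** Folds one point into the running mean: mean += (x - mean) / n. */
    void add(double pointLon, double pointLat) {
        count++;
        lon += (pointLon - lon) / count;
        lat += (pointLat - lat) / count;
    }

    /**
     * Combines another partial centroid into this one, weighting each
     * side by its point count (an assumed way to merge partial results).
     */
    void merge(RunningCentroid other) {
        if (other.count == 0) {
            return; // nothing to merge; also avoids dividing by zero
        }
        long total = count + other.count;
        lon += (other.lon - lon) * other.count / total;
        lat += (other.lat - lat) * other.count / total;
        count = total;
    }

    double lon() { return lon; }
    double lat() { return lat; }
    long count() { return count; }
}
```

The incremental form avoids keeping a running sum that grows with the number of points, and the count has to travel with the mean so that later points or merges are weighted correctly, which is the concern raised in the review question about updating the counts array.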

@colings86
Contributor

@nknize I left some comments but still want to try this out on some data too.

In terms of this replacing the weighted centroid in the geohash-grid agg, I am a bit torn. On the one hand, I agree it is going to be a common use case, but on the other hand I don't like having the same implementation in two different places, as it adds a maintenance overhead.

@nknize
Contributor Author

nknize commented Oct 5, 2015

@colings86 Thanks for the feedback. I don't like having it in both places either. I opened #13912 to facilitate a discussion on whether we want to keep it "native" but make it optional in geohash_grid. I like this idea because it gives the best performance for the common use case. If we decide to go forward with that approach, I'll decouple the weighted average logic so it's not duplicated.

@nknize
Contributor Author

nknize commented Oct 9, 2015

@colings86 I removed the centroid calculation from GeoHashGridAggregation. Centroid is now a standalone metric aggregator. /cc @jpountz

@colings86
Contributor

LGTM

This commit adds a new metric aggregator for computing the geo_centroid over a set of geo_point fields. This can be combined with other aggregators (e.g., geohash_grid, significant_terms) for computing the geospatial centroid based on the document sets from other aggregation results.
@nknize nknize merged commit ceefe2e into elastic:master Oct 14, 2015
@s1monw
Contributor

s1monw commented Oct 15, 2015

This PR has been merged into master and cherry-picked into 2.1 but not into 2.x. @nknize, can you please make sure you cherry-pick it into 2.x as well? Also, don't forget to cherry-pick the serialization fix here: 5b1ee8b
