Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Improve null handling in Geo Functions #40708

Merged
merged 8 commits into from
Apr 10, 2019

Conversation

imotov
Copy link
Contributor

@imotov imotov commented Apr 1, 2019

ST_Distance function returns null now instead of throwing an error
when one of the arguments in null and cleans up null handling
logic in the field extractor.

@bpintea, could you take a look to see if that would work better for you in terms of null handling?

ST_Distance function returns null now instead of throwing an error
when one of the arguments in null.
@imotov imotov added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes :Analytics/SQL SQL querying v8.0.0 labels Apr 1, 2019
@imotov imotov requested review from costin, astefan, bpintea and matriv April 1, 2019 21:44
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add some unit test in FieldHitExtractorTests ?

@@ -143,8 +148,12 @@ private Object unwrapMultiValue(Object values) {
}
if (dataType == DataType.GEO_SHAPE) {
Object value;
if (values instanceof List && ((List<?>) values).size() == 1) {
value = ((List<?>) values).get(0);
if (values instanceof List) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please extract it to a method as it seems duplicate with the GEO_POINT method?

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it a try and it LGTM.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one comment about arrays of geo shapes, if my assumption about the code is correct.

private Object extractGeoShape(Object values) {
Object value;
if (values instanceof List) {
if (((List<?>) values).size() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the correct approach here. If I'm not mistaken you are extracting here the first value from a field that has an array of geo shapes. SQL has added recently a "multi field leniency" attribute to both REST and drivers support: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/sql-rest.html#sql-rest-fields.
Code-wise, this leniency is handled here and here at extraction time. I think you need to include extractGeoShape() in the array leniency logic probably only in unwrapMultiValue method.

Also, would be good to see some tests with arrays of geo shapes and not only in one-level fields, but also multi nested fields (not the nested field type): field.subfield1.subfield2.my_geo_shapes_array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay and thanks for bringing this to my attention. That's very good point and it turned out to be quite tricky to get right. I am still working on it. I might need to make some changes in master first thought to change the way extractSource handles a.b.c. names. I hope to open this pull request monday.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

} catch (IOException ex) {
throw new SqlIllegalArgumentException("Cannot read geo_shape value (returned by [{}])", fieldName);
}
if (dataType == DataType.GEO_POINT || dataType == DataType.GEO_SHAPE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth adding an isGeoBased() method to DataType to perform this double check.

@astefan thanks a lot for assistance and brainstorming earlier today!
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left few minor comments.

value = ((List<?>) values).get(0);
} else {
// This could be a single point in [lon, lat] format or multiple points
if ((list.size() == 2 || list.size() == 3) && (list.get(0) instanceof Number)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if you can add a comment explaining the size() == 3 scenario when this can happen. Something like // This could be a single point in [lon, lat(, altitude)] (where altitude is optional) format or multiple points

try {
return new GeoShape(values);
} catch (IOException ex) {
throw new SqlIllegalArgumentException("Cannot read geo_shape value (returned by [{}])", fieldName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, include the actual value that cannot be read in the error message, to be consistent with the rest of the error messages.

String name = randomAlphaOfLength(10);
pathCombined = pathCombined + name;
source.field(name, randomPoint(lat, lon));
for (int i = layers -2; i >= 0; i--) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really small: white space between the minus sign and 2: i = layers - 2.

@@ -207,7 +218,7 @@ Object extractFromSource(Map<String, Object> map) {

if (node instanceof List) {
List listOfValues = (List) node;
if (listOfValues.size() == 1 || arrayLeniency) {
if ((i < path.length - 1) && (listOfValues.size() == 1 || arrayLeniency)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate it if you can add a comment reflecting the reasoning behind this change, in the context of GEO and arrays of values.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments about FieldHitExtractor.

// we get geo_points from doc values and they were not specified for this record
return null;
}
if (((List<?>) values).size() == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use list instead of recasting values.

@@ -142,6 +173,13 @@ private Object unwrapMultiValue(Object values) {
}
}
}
if (dataType == DataType.GEO_SHAPE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the dataType check below the instanceof checks (like DATETIME) to keep the same style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot. I has to be before check for map because shape can be a map in case of geojson.

@@ -127,8 +127,39 @@ private Object unwrapMultiValue(Object values) {
if (values == null) {
return null;
}
if (dataType == DataType.GEO_POINT || dataType == DataType.GEO_SHAPE) {
return extractGeoShape(values);
if (dataType == DataType.GEO_POINT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth handling the GEO_POINT case inside the instanceof List check since the two IFs share a bit of code.

value = values;
} else {
// most likely we have multiple points
if (arrayLeniency) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks similar to the if inside instanceof List check below no?

@imotov
Copy link
Contributor Author

imotov commented Apr 9, 2019

@elasticmachine run elasticsearch-ci/2

@imotov
Copy link
Contributor Author

imotov commented Apr 10, 2019

Thanks a lot everyone for the reviews! That was really helpful.

@imotov imotov merged commit d7647c8 into elastic:geosql Apr 10, 2019
@imotov imotov removed the v8.0.0 label Apr 24, 2019
@imotov imotov deleted the null-handling-for-geo-points branch May 1, 2020 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes :Analytics/SQL SQL querying >enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants