-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Improve null handling in Geo Functions #40708
Conversation
ST_Distance function returns null now instead of throwing an error when one of the arguments in null.
Pinging @elastic/es-analytics-geo |
Pinging @elastic/es-search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also add some unit test in FieldHitExtractorTests
?
@@ -143,8 +148,12 @@ private Object unwrapMultiValue(Object values) { | |||
} | |||
if (dataType == DataType.GEO_SHAPE) { | |||
Object value; | |||
if (values instanceof List && ((List<?>) values).size() == 1) { | |||
value = ((List<?>) values).get(0); | |||
if (values instanceof List) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please extract it to a method as it seems duplicate with the GEO_POINT
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave it a try and it LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one comment about arrays of geo shapes, if my assumption about the code is correct.
private Object extractGeoShape(Object values) { | ||
Object value; | ||
if (values instanceof List) { | ||
if (((List<?>) values).size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the correct approach here. If I'm not mistaken you are extracting here the first value from a field that has an array of geo shapes. SQL has added recently a "multi field leniency" attribute to both REST and drivers support: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/sql-rest.html#sql-rest-fields.
Code-wise, this leniency is handled here and here at extraction time. I think you need to include extractGeoShape()
in the array leniency logic probably only in unwrapMultiValue
method.
Also, would be good to see some tests with arrays of geo shapes and not only in one-level fields, but also multi nested fields (not the nested
field type): field.subfield1.subfield2.my_geo_shapes_array
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay and thanks for bringing this to my attention. That's very good point and it turned out to be quite tricky to get right. I am still working on it. I might need to make some changes in master first thought to change the way extractSource handles a.b.c.
names. I hope to open this pull request monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
} catch (IOException ex) { | ||
throw new SqlIllegalArgumentException("Cannot read geo_shape value (returned by [{}])", fieldName); | ||
} | ||
if (dataType == DataType.GEO_POINT || dataType == DataType.GEO_SHAPE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth adding an isGeoBased()
method to DataType
to perform this double check.
@astefan thanks a lot for assistance and brainstorming earlier today!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left few minor comments.
value = ((List<?>) values).get(0); | ||
} else { | ||
// This could be a single point in [lon, lat] format or multiple points | ||
if ((list.size() == 2 || list.size() == 3) && (list.get(0) instanceof Number)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good if you can add a comment explaining the size() == 3
scenario when this can happen. Something like // This could be a single point in [lon, lat(, altitude)] (where altitude is optional) format or multiple points
try { | ||
return new GeoShape(values); | ||
} catch (IOException ex) { | ||
throw new SqlIllegalArgumentException("Cannot read geo_shape value (returned by [{}])", fieldName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, include the actual value that cannot be read in the error message, to be consistent with the rest of the error messages.
String name = randomAlphaOfLength(10); | ||
pathCombined = pathCombined + name; | ||
source.field(name, randomPoint(lat, lon)); | ||
for (int i = layers -2; i >= 0; i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really small: white space between the minus
sign and 2
: i = layers - 2
.
@@ -207,7 +218,7 @@ Object extractFromSource(Map<String, Object> map) { | |||
|
|||
if (node instanceof List) { | |||
List listOfValues = (List) node; | |||
if (listOfValues.size() == 1 || arrayLeniency) { | |||
if ((i < path.length - 1) && (listOfValues.size() == 1 || arrayLeniency)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would appreciate it if you can add a comment reflecting the reasoning behind this change, in the context of GEO and arrays of values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some comments about FieldHitExtractor.
// we get geo_points from doc values and they were not specified for this record | ||
return null; | ||
} | ||
if (((List<?>) values).size() == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use list
instead of recasting values.
@@ -142,6 +173,13 @@ private Object unwrapMultiValue(Object values) { | |||
} | |||
} | |||
} | |||
if (dataType == DataType.GEO_SHAPE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the dataType check below the instanceof checks (like DATETIME
) to keep the same style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot. I has to be before check for map because shape can be a map in case of geojson.
@@ -127,8 +127,39 @@ private Object unwrapMultiValue(Object values) { | |||
if (values == null) { | |||
return null; | |||
} | |||
if (dataType == DataType.GEO_POINT || dataType == DataType.GEO_SHAPE) { | |||
return extractGeoShape(values); | |||
if (dataType == DataType.GEO_POINT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth handling the GEO_POINT case inside the instanceof List
check since the two IFs share a bit of code.
value = values; | ||
} else { | ||
// most likely we have multiple points | ||
if (arrayLeniency) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks similar to the if inside instanceof List check below no?
@elasticmachine run elasticsearch-ci/2 |
Thanks a lot everyone for the reviews! That was really helpful. |
ST_Distance function returns null now instead of throwing an error
when one of the arguments in null and cleans up null handling
logic in the field extractor.
@bpintea, could you take a look to see if that would work better for you in terms of null handling?