-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Fix issue with field names containing "." #37364
Conversation
Adjust FieldExtractor to be handle fields which contain `.` in their name, regardless where they fall in in the document hierarchy. E.g.: ``` { "a.b": "Elastic Search" } { "a": { "b.c": "Elastic Search" } } { "a.b": { "c": { "d.e" : "Elastic Search" } } } ``` Fixes: elastic#37128
Pinging @elastic/es-search |
I tried to make use of the |
} else { | ||
throw new SqlIllegalArgumentException("Cannot extract value [{}] from source", fieldName); | ||
} | ||
} | ||
return unwrapMultiValue(value); | ||
} | ||
|
||
private Tuple<Object, Integer> extractAsDottedField(Map<String, Object> map, int idx, String node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking to remove this extracted code back to the body of extractFromSource()
as now we create an extra Tuple
and Integer
object which happens for every document returned. What do you think?
Also the code might be a bit ugly but I wanted to avoid calling recursively a method (with every combination of a.b
, a.b.c
, etc.) Happy to hear better suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. It looks like a small method and while it might be inlined the extra Tuple/Integer
are boiler-plate.
If the method gets unrolled, both the index and the found value will be available without wrapping/boxing.
run gradle build tests 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I like the lack of virtual calls - the structure is small enough that we can extract the values through loops. +1 on unrolling the method so there's no additional method call.
Regarding tests, I would add a few more examples such as : a.b { c { d.e } }
a.b.c { d { e }
and {a.b { c.d.e} }
just to be on the safe side.
@@ -39,7 +40,7 @@ | |||
*/ | |||
private static String[] sourcePath(String name, boolean useDocValue, String hitName) { | |||
return useDocValue ? Strings.EMPTY_ARRAY : Strings | |||
.tokenizeToStringArray(hitName == null ? name : name.substring(hitName.length() + 1), "."); | |||
.tokenizeToStringArray(hitName == null ? name : name.substring(hitName.length() + 1), "."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's definitely some formatting differences here.
Anyway, make sure you set your IDE to format only the lines that were modified as that prevents unnecessary changes like the above.
} else { | ||
throw new SqlIllegalArgumentException("Cannot extract value [{}] from source", fieldName); | ||
} | ||
} | ||
return unwrapMultiValue(value); | ||
} | ||
|
||
private Tuple<Object, Integer> extractAsDottedField(Map<String, Object> map, int idx, String node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. It looks like a small method and while it might be inlined the extra Tuple/Integer
are boiler-plate.
If the method gets unrolled, both the index and the found value will be available without wrapping/boxing.
@@ -178,8 +204,8 @@ public boolean equals(Object obj) { | |||
} | |||
FieldHitExtractor other = (FieldHitExtractor) obj; | |||
return fieldName.equals(other.fieldName) | |||
&& hitName.equals(other.hitName) | |||
&& useDocValue == other.useDocValue; | |||
&& hitName.equals(other.hitName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again with the formatting...
@costin I don't know if you noticed already because it was my last commit but I added this: https://github.com/elastic/elasticsearch/pull/37364/files#diff-9aaee9be08445653bba7407b9f2b5ca3R269 Would you like to still have other cases in separate methods? |
The random is fine :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice, elegant solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can improve it a bit.
DELETE test
PUT /test/blah/1
{
"name.full": "Shane Connelly",
"name.parts.first": "Shane"
}
PUT /test/blah/2
{
"name": {
"full": "Shane Connelly"
},
"name.parts": {
"first": "Shane"
}
}
POST /_xpack/sql?format=txt
{
"query": "DESCRIBE test"
}
POST /_xpack/sql?format=txt
{
"query": "select * from test"
}
GET /test/_search
{
"_source": {
"includes": "name.parts.first"
}
}
@imotov I don't get your comment, how to improve it? in your example
returns:
|
Interesting. I ran it twice with this PR by itself and with this PR merged into master and every time I am getting
Could you add an additional record like this and see what do you get back?
I am now getting
|
I get:
:-) |
Hm, I cleaned and recompiled and I get the null too, will check it, thx! |
Now the query yields:
so it's |
@costin Check this: https://github.com/elastic/elasticsearch/pull/37364/files#diff-9aaee9be08445653bba7407b9f2b5ca3R312 Will make the changes to catch the same path(s)/different hierarchy(ies) and throw an exception about not supporting arrays (multi-values). |
The difference now is here: https://github.com/elastic/elasticsearch/pull/37364/files#diff-6d292f4ab3ddfc415649c328fd2faefeR157 |
Pushed the new approach to handle multiple values for the same path but different hierarchies. |
run default distro tests |
1 similar comment
run default distro tests |
run gradle build tests 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
@imotov thanks for checking and catching this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments. Let's get this in.
|
||
// Used to avoid recursive method calls | ||
// Holds the sub-maps in the document hierarchy that are pending to be inspected. | ||
LinkedList<Map<String, Object>> queue = new LinkedList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nitpick - use the Deque
instead of LinkedList
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the index and the Map are associated, how about using only one Deque
which holds a Tuple
instead of two Deque
:
Deque<Tuple<Map<String, Object> index>>` queue = ...
if (node instanceof Map) {
queue.add(new Tuple<>(node, Integer.valueOf(i));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I had those 2 before, while implementing. :-)
I chose LinkedList
as we don't need any special operations of queue (we don't care about the order of dequeuing). I've just read about the better performance of ArrayDequeue
over LinkedList
and I'll change.
For the Tuple
I had it as you said but I changed to a separate structure that holds integers to avoid the instantiation of Tuple
objects on top of the Integer boxing. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And after benchmarking: https://gist.github.com/matriv/07e5909a49bed9794f350f458a0b2c60
The results show:
Benchmark Mode Cnt Score Error Units
MyBenchmark.testArrayDequeue thrpt 25 608384,000 ± 7893,014 ops/s
MyBenchmark.testLinkedList thrpt 25 581316,884 ± 8028,331 ops/s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Autoboxing already happens and I wouldn't worry to much about it considering the depth is not that big. Same for Linked
vs Array
(in general arrays are faster except for inserting in the middle as that requires resizing/copying at which the linked structure excels).
I think the Tuple
makes the code a bit more compact and safe (the queues cannot get out of sync) and more readable/simple code always trumps optimization (especially micro ones as here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI regarding the Tuples:
https://gist.github.com/matriv/3d2557ae1621e99a3c9505b5e4e6998f
and the result:
Benchmark Mode Cnt Score Error Units
MyBenchmark.testNoTuples thrpt 25 601543,170 ± 7859,704 ops/s
MyBenchmark.testTuples thrpt 25 573953,828 ± 5987,749 ops/s
if (node instanceof Map) { | ||
// Add the sub-map to the queue along with the current path index | ||
queue.add((Map<String, Object>) node); | ||
idxQueue.add(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add i+1 to the queue since the next loop should start from the next position?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no reason, I just preferred to start the i
index of the for loop from prevPosition + 1 just because personally I see it more "visible". Would you prefer it your way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
Adjust FieldExtractor to handle fields which contain `.` in their name, regardless where they fall in, in the document hierarchy. E.g.: ``` { "a.b": "Elastic Search" } { "a": { "b.c": "Elastic Search" } } { "a.b": { "c": { "d.e" : "Elastic Search" } } } ``` Fixes: #37128
Backported to |
Adjust FieldExtractor to handle fields which contain `.` in their name, regardless where they fall in, in the document hierarchy. E.g.: ``` { "a.b": "Elastic Search" } { "a": { "b.c": "Elastic Search" } } { "a.b": { "c": { "d.e" : "Elastic Search" } } } ``` Fixes: #37128
Backported to |
Adjust FieldExtractor to handle fields which contain `.` in their name, regardless where they fall in, in the document hierarchy. E.g.: ``` { "a.b": "Elastic Search" } { "a": { "b.c": "Elastic Search" } } { "a.b": { "c": { "d.e" : "Elastic Search" } } } ``` Fixes: #37128
Backported to |
* elastic/master: Docs be explicit on how to turn off deprecated auditing (elastic#37316) Fix line length for monitor and remove suppressions (elastic#37456) Fix IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) (elastic#37414) Update the Flush API documentation (elastic#33551) [TEST] Muted testDifferentRolesMaintainPathOnRestart Remove dead code from ShardSearchStats (elastic#37421) Simplify testSendSnapshotSendsOps (elastic#37445) SQL: Fix issue with field names containing "." (elastic#37364) Restore lost @Inject annotation (elastic#37452)
Throws an exception if hit extractor tries to retrieve unsupported object. For example, selecting "a" from `{"a": {"b": "c"}}` now throws an exception instead of returning null. Relates to elastic#37364
Throws an exception if hit extractor tries to retrieve unsupported object. For example, selecting "a" from `{"a": {"b": "c"}}` now throws an exception instead of returning null. Relates to #37364
Throws an exception if hit extractor tries to retrieve unsupported object. For example, selecting "a" from `{"a": {"b": "c"}}` now throws an exception instead of returning null. Relates to #37364
Throws an exception if hit extractor tries to retrieve unsupported object. For example, selecting "a" from `{"a": {"b": "c"}}` now throws an exception instead of returning null. Relates to #37364
Adjust FieldExtractor to be handle fields which contain
.
in theirname, regardless where they fall in in the document hierarchy. E.g.:
Fixes: #37128