SQL: Fix issue with field names containing "." #37364

matriv · 2019-01-11T14:00:32Z

Adjust FieldExtractor to handle fields which contain "." in their
name, regardless of where they fall in the document hierarchy. E.g.:

{
  "a.b": "Elastic Search"
}

{
  "a": {
    "b.c": "Elastic Search"
  }
}

{
  "a.b": {
    "c": {
      "d.e" : "Elastic Search"
    }
  }
}

Fixes: #37128
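
To make the three document shapes above concrete, here is a minimal sketch of an iterative lookup over nested maps. It is an illustration under stated assumptions, not the FieldHitExtractor code in this PR: the class name `DottedPathSketch` and the method `extract` are hypothetical, and the sketch simply returns the first non-null value it reaches.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Illustrative sketch only; class and method names are hypothetical,
// and this is not the FieldHitExtractor code changed in this PR.
final class DottedPathSketch {

    /** Returns the first non-null value found for the path, or null if there is none. */
    @SuppressWarnings("unchecked")
    static Object extract(Map<String, Object> source, String[] path) {
        // Sub-maps still to inspect and, for each one, the index of the first unconsumed segment.
        Deque<Map<String, Object>> maps = new ArrayDeque<>();
        Deque<Integer> starts = new ArrayDeque<>();
        maps.add(source);
        starts.add(0);

        while (!maps.isEmpty()) {
            Map<String, Object> map = maps.removeFirst();
            int start = starts.removeFirst();
            StringBuilder key = new StringBuilder();
            // Try keys made of 1, 2, ... consecutive segments: "a", then "a.b", then "a.b.c".
            for (int i = start; i < path.length; i++) {
                if (key.length() > 0) {
                    key.append('.');
                }
                key.append(path[i]);
                Object node = map.get(key.toString());
                if (node == null) {
                    continue;
                }
                if (i == path.length - 1) {
                    return node; // every segment has been consumed
                }
                if (node instanceof Map) {
                    // Remember the sub-map and where the remaining path starts.
                    maps.add((Map<String, Object>) node);
                    starts.add(i + 1);
                }
            }
        }
        return null;
    }
}
```

For the third document above, `DottedPathSketch.extract(doc, "a.b.c.d.e".split("\\."))` would walk `"a.b"`, then `"c"`, then `"d.e"`. Using a work queue instead of recursion mirrors the intent stated later in this thread, but the details here are assumptions.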

elasticmachine · 2019-01-11T14:00:34Z

Pinging @elastic/es-search

matriv · 2019-01-11T14:07:44Z

I tried to use the XContentMapValues methods extractRawValues and filter to replace the "ugly" code in the FieldExtractor, but this doesn't seem possible because extractRawValues doesn't distinguish between a missing entry in the map and a null entry. For example, if we have an array [null, null], extractRawValues returns an empty List, which prevents us from handling it correctly and throwing the "Arrays not supported" exception.
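
The distinction described above can be illustrated with plain java.util types. This is only a sketch: `dropNulls` is a hypothetical stand-in for any null-skipping extraction (it is not the XContentMapValues implementation), and `isExplicitNull` shows the check that such an extraction makes impossible afterwards.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Illustrative sketch only: plain java.util code with hypothetical names,
// not the XContentMapValues implementation.
final class NullVersusMissingSketch {

    /** Stand-in for a null-dropping extraction: [null, null] collapses to []. */
    static List<Object> dropNulls(List<Object> values) {
        return values.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    /** True only when the key exists and is explicitly mapped to null. */
    static boolean isExplicitNull(Map<String, Object> map, String key) {
        return map.containsKey(key) && map.get(key) == null;
    }
}
```

Once the nulls are dropped, an empty result no longer tells the caller whether the source held an array at all, which is why the array check has to happen on the raw map.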

matriv · 2019-01-11T14:15:09Z

.../src/main/java/org/elasticsearch/xpack/sql/execution/search/extractor/FieldHitExtractor.java

            } else {
                throw new SqlIllegalArgumentException("Cannot extract value [{}] from source", fieldName);
            }
        }
        return unwrapMultiValue(value);
    }

+    private Tuple<Object, Integer> extractAsDottedField(Map<String, Object> map, int idx, String node) {


I'm thinking of moving this extracted code back into the body of extractFromSource(), since we now create an extra Tuple and Integer object for every document returned. What do you think?

Also, the code might be a bit ugly, but I wanted to avoid calling a method recursively (with every combination of a.b, a.b.c, etc.). Happy to hear better suggestions.

+1. It looks like a small method and while it might be inlined the extra Tuple/Integer are boiler-plate.
If the method gets unrolled, both the index and the found value will be available without wrapping/boxing.

matriv · 2019-01-11T14:19:00Z

run gradle build tests 2

costin

LGTM.
I like the lack of virtual calls - the structure is small enough that we can extract the values through loops. +1 on unrolling the method so there's no additional method call.

Regarding tests, I would add a few more examples, such as a.b { c { d.e } }, a.b.c { d { e } } and a.b { c.d.e }, just to be on the safe side.

costin · 2019-01-11T17:01:49Z

.../src/main/java/org/elasticsearch/xpack/sql/execution/search/extractor/FieldHitExtractor.java

@@ -39,7 +40,7 @@
     */
    private static String[] sourcePath(String name, boolean useDocValue, String hitName) {
        return useDocValue ? Strings.EMPTY_ARRAY : Strings
-                .tokenizeToStringArray(hitName == null ? name : name.substring(hitName.length() + 1), ".");
+            .tokenizeToStringArray(hitName == null ? name : name.substring(hitName.length() + 1), ".");


There are definitely some formatting differences here.
Anyway, make sure you set your IDE to format only the lines that were modified, as that prevents unnecessary changes like the above.

costin · 2019-01-11T17:04:44Z

.../src/main/java/org/elasticsearch/xpack/sql/execution/search/extractor/FieldHitExtractor.java

            } else {
                throw new SqlIllegalArgumentException("Cannot extract value [{}] from source", fieldName);
            }
        }
        return unwrapMultiValue(value);
    }

+    private Tuple<Object, Integer> extractAsDottedField(Map<String, Object> map, int idx, String node) {


+1. It looks like a small method and while it might be inlined the extra Tuple/Integer are boiler-plate.
If the method gets unrolled, both the index and the found value will be available without wrapping/boxing.

costin · 2019-01-11T17:05:55Z

.../src/main/java/org/elasticsearch/xpack/sql/execution/search/extractor/FieldHitExtractor.java

@@ -178,8 +204,8 @@ public boolean equals(Object obj) {
        }
        FieldHitExtractor other = (FieldHitExtractor) obj;
        return fieldName.equals(other.fieldName)
-                && hitName.equals(other.hitName)
-                && useDocValue == other.useDocValue;
+            && hitName.equals(other.hitName)


Again with the formatting...

matriv · 2019-01-11T17:27:06Z

@costin I don't know if you noticed already, because it was my last commit, but I added this: https://github.com/elastic/elasticsearch/pull/37364/files#diff-9aaee9be08445653bba7407b9f2b5ca3R269 Would you still like to have the other cases in separate methods?

costin · 2019-01-11T17:39:07Z

The random is fine :)

astefan

LGTM. Nice, elegant solution.

imotov

I think we can improve it a bit.

DELETE test

PUT /test/blah/1
{
  "name.full": "Shane Connelly",
  "name.parts.first": "Shane"
}

PUT /test/blah/2
{
  "name": {
    "full": "Shane Connelly"
  },
  "name.parts": {
    "first": "Shane"
  }
}

POST /_xpack/sql?format=txt
{
  "query": "DESCRIBE test"
}

POST /_xpack/sql?format=txt
{
  "query": "select * from test"
}

GET /test/_search
{
  "_source": {
    "includes": "name.parts.first"
  }
}

matriv · 2019-01-12T07:38:10Z

@imotov I don't get your comment. How can it be improved? In your example,

POST /_xpack/sql?format=txt
{
  "query": "select * from test"
}

returns:

   name.full   |name.parts.first
---------------+----------------
Shane Connelly |Shane
Shane Connelly |Shane

imotov · 2019-01-12T14:27:27Z

Interesting. I ran it twice, once with this PR by itself and once with this PR merged into master, and each time I get

   name.full   |name.parts.first
---------------+----------------
Shane Connelly |Shane           
Shane Connelly |null

Could you add an additional record like this and see what you get back?

PUT /test/blah/3
{
  "name.parts": {
    "first": "Shane I"
  },
  "name": {
    "parts.first": "Shane II",
    "full": "Shane Connelly"
  }
}

I am now getting

   name.full   |name.parts.first
---------------+----------------
Shane Connelly |Shane           
Shane Connelly |null            
Shane Connelly |Shane II

matriv · 2019-01-12T18:10:36Z

I get:

   name.full   |name.parts.first
---------------+----------------
Shane Connelly |Shane
Shane Connelly |Shane
Shane Connelly |Shane II

:-)

matriv · 2019-01-12T18:46:15Z

Hm, I cleaned and recompiled and I get the null too. Will check it, thx!

matriv · 2019-01-13T16:00:22Z

@imotov @costin @astefan Adjusted the code to handle multiple entries in the map with common prefixes. Please take another look.

Many thanks @imotov for catching that!

matriv · 2019-01-13T17:18:47Z

Now the query yields:

   name.full   |name.parts.first
---------------+----------------
Shane Connelly |Shane
Shane Connelly |Shane
Shane Connelly |Shane I

So it's Shane I and not Shane II, since we search for the longer path in the map that yields a non-null value. What do you think?

matriv · 2019-01-13T21:09:57Z

@costin Check this: https://github.com/elastic/elasticsearch/pull/37364/files#diff-9aaee9be08445653bba7407b9f2b5ca3R312
The previous tests only had a single entry in the map.

Will make the changes to catch the same path(s) in different hierarchy(ies) and throw an exception about not supporting arrays (multi-values).

matriv · 2019-01-13T21:16:37Z

The difference now is here: https://github.com/elastic/elasticsearch/pull/37364/files#diff-6d292f4ab3ddfc415649c328fd2faefeR157
So we don't stop once a non-null value is found in the map; we keep walking the path, holding the latest non-null value found.

matriv · 2019-01-14T14:04:18Z

Pushed the new approach to handle multiple values for the same path but different hierarchies.
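
A rough sketch of what "multiple values for the same path but different hierarchies" means in practice, using the three test documents from this thread. Everything here is an assumption for illustration: the class `AmbiguousPathSketch`, the method `extractUnique`, the exception type, and the message text are hypothetical and are not taken from the PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Illustrative sketch only; names, exception type and message are hypothetical.
final class AmbiguousPathSketch {

    /** Returns the single value for the path, null if absent, or throws if two hierarchies match. */
    @SuppressWarnings("unchecked")
    static Object extractUnique(Map<String, Object> source, String[] path) {
        Deque<Map<String, Object>> maps = new ArrayDeque<>();
        Deque<Integer> starts = new ArrayDeque<>();
        maps.add(source);
        starts.add(0);
        Object found = null;

        while (!maps.isEmpty()) {
            Map<String, Object> map = maps.removeFirst();
            int start = starts.removeFirst();
            StringBuilder key = new StringBuilder();
            for (int i = start; i < path.length; i++) {
                if (key.length() > 0) {
                    key.append('.');
                }
                key.append(path[i]);
                Object node = map.get(key.toString());
                if (node == null) {
                    continue;
                }
                if (i == path.length - 1) {
                    // Do not stop at the first hit: a second hit means the path is ambiguous.
                    if (found != null) {
                        throw new IllegalStateException(
                            "Multiple values found for [" + String.join(".", path) + "]");
                    }
                    found = node;
                } else if (node instanceof Map) {
                    maps.add((Map<String, Object>) node);
                    starts.add(i + 1);
                }
            }
        }
        return found;
    }
}
```

With this sketch, document 2 above yields `Shane` for `name.parts.first`, while document 3 (which holds both `Shane I` and `Shane II` under the same logical path) raises the exception instead of silently picking one.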

matriv · 2019-01-14T14:46:26Z

run default distro tests

matriv · 2019-01-14T15:13:46Z

run default distro tests

matriv · 2019-01-14T15:23:11Z

run gradle build tests 1

imotov

LGTM. Thanks!

matriv · 2019-01-14T16:53:59Z

@imotov thanks for checking and catching this.

costin

Left some minor comments. Let's get this in.

costin · 2019-01-14T18:10:27Z

.../src/main/java/org/elasticsearch/xpack/sql/execution/search/extractor/FieldHitExtractor.java

+
+        // Used to avoid recursive method calls
+        // Holds the sub-maps in the document hierarchy that are pending to be inspected.
+        LinkedList<Map<String, Object>> queue = new LinkedList<>();


Minor nitpick: use Deque instead of LinkedList.

Since the index and the Map are associated, how about using only one Deque that holds a Tuple instead of two Deques:

Deque<Tuple<Map<String, Object>, Integer>> queue = ...
if (node instanceof Map) {
    queue.add(new Tuple<>(node, Integer.valueOf(i)));
}

Actually, I had those 2 before, while implementing. :-)

I chose LinkedList as we don't need any special queue operations (we don't care about the order of dequeuing). I've just read about the better performance of ArrayDeque over LinkedList, so I'll change it.

For the Tuple I had it as you said but I changed to a separate structure that holds integers to avoid the instantiation of Tuple objects on top of the Integer boxing. What do you think?

And after benchmarking: https://gist.github.com/matriv/07e5909a49bed9794f350f458a0b2c60
The results show:

Benchmark                      Mode  Cnt       Score      Error  Units
MyBenchmark.testArrayDequeue  thrpt   25  608384,000 ± 7893,014  ops/s
MyBenchmark.testLinkedList    thrpt   25  581316,884 ± 8028,331  ops/s

Autoboxing already happens, and I wouldn't worry too much about it considering the depth is not that big. The same goes for Linked vs Array (in general, arrays are faster except for inserting in the middle, as that requires resizing/copying, at which the linked structure excels).
I think the Tuple makes the code a bit more compact and safe (the queues cannot get out of sync) and more readable/simple code always trumps optimization (especially micro ones as here).

Just FYI regarding the Tuples:
https://gist.github.com/matriv/3d2557ae1621e99a3c9505b5e4e6998f
and the result:

Benchmark                  Mode  Cnt       Score      Error  Units
MyBenchmark.testNoTuples  thrpt   25  601543,170 ± 7859,704  ops/s
MyBenchmark.testTuples    thrpt   25  573953,828 ± 5987,749  ops/s
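
For readers comparing the two designs in this thread, the single-queue variant could look roughly like the sketch below. It assumes `java.util.AbstractMap.SimpleImmutableEntry` as a stand-in for the Elasticsearch `Tuple` class, looks up single-segment keys only to stay short, and uses hypothetical names (`PairedQueueSketch`, `enqueueSubMaps`).

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Illustrative sketch only; SimpleImmutableEntry stands in for the Tuple class
// mentioned in the review, and all names here are hypothetical.
final class PairedQueueSketch {

    /** Enqueues every sub-map found under a single path segment, paired with the next index. */
    @SuppressWarnings("unchecked")
    static Deque<Map.Entry<Map<String, Object>, Integer>> enqueueSubMaps(
            Map<String, Object> map, String[] path, int start) {
        Deque<Map.Entry<Map<String, Object>, Integer>> queue = new ArrayDeque<>();
        for (int i = start; i < path.length; i++) {
            Object node = map.get(path[i]);
            if (node instanceof Map) {
                // The map and its index travel together, so they cannot get out of sync.
                queue.add(new SimpleImmutableEntry<>((Map<String, Object>) node, Integer.valueOf(i + 1)));
            }
        }
        return queue;
    }
}
```

The trade-off is the one discussed above: one extra small object per enqueued sub-map in exchange for a single structure that keeps each map tied to its path index.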

costin · 2019-01-14T18:14:31Z

.../src/main/java/org/elasticsearch/xpack/sql/execution/search/extractor/FieldHitExtractor.java

+                if (node instanceof Map) {
+                    // Add the sub-map to the queue along with the current path index
+                    queue.add((Map<String, Object>) node);
+                    idxQueue.add(i);


Why not add i+1 to the queue since the next loop should start from the next position?

No reason; I just preferred to start the i index of the for loop from prevPosition + 1 because I personally find it more "visible". Would you prefer it your way?

matriv · 2019-01-15T08:06:58Z

Backported to 6.6 with 874b5ec

matriv · 2019-01-15T08:14:57Z

Backported to 6.5 with c72bd51

matriv · 2019-01-15T08:41:33Z

Backported to 6.x with fd3701b

* elastic/master:
  Docs be explicit on how to turn off deprecated auditing (elastic#37316)
  Fix line length for monitor and remove suppressions (elastic#37456)
  Fix IndexShardTestCase.recoverReplica(IndexShard, IndexShard, boolean) (elastic#37414)
  Update the Flush API documentation (elastic#33551)
  [TEST] Muted testDifferentRolesMaintainPathOnRestart
  Remove dead code from ShardSearchStats (elastic#37421)
  Simplify testSendSnapshotSendsOps (elastic#37445)
  SQL: Fix issue with field names containing "." (elastic#37364)
  Restore lost @Inject annotation (elastic#37452)

imotov · 2019-01-15T20:13:10Z

@matriv I am sorry. It looks like I missed another edge case during review and immediately got bitten by it in geosql, which actually can extract objects from source in some cases. I opened #37502 to fix it.

Throws an exception if hit extractor tries to retrieve unsupported object. For example, selecting "a" from `{"a": {"b": "c"}}` now throws an exception instead of returning null. Relates to #37364
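
The behavior described in that follow-up can be pictured with a small guard. This is a sketch under assumptions only: the class `ObjectGuardSketch`, the method `checkScalar`, the exception type, and the message wording are hypothetical and are not taken from #37502.

```java
import java.util.Map;

// Illustrative sketch only; names, exception type and message are hypothetical.
final class ObjectGuardSketch {

    /** Rejects object (map) values instead of letting them surface as null. */
    static Object checkScalar(String fieldName, Object value) {
        if (value instanceof Map) {
            throw new IllegalArgumentException(
                "Objects (returned by [" + fieldName + "]) are not supported");
        }
        return value;
    }
}
```

Selecting "a" from `{"a": {"b": "c"}}` would hit the guard, whereas selecting "a.b" would pass the string "c" through unchanged.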

matriv added >bug v7.0.0 :Analytics/SQL SQL querying v6.6.0 v6.7.0 v6.5.5 labels Jan 11, 2019

matriv requested a review from costin January 11, 2019 14:00

matriv requested a review from astefan January 11, 2019 14:00

matriv added 2 commits January 11, 2019 17:41

Added test

e75a4e8

Added test with randomness

266cd2b

matriv requested a review from imotov January 11, 2019 16:43

costin approved these changes Jan 11, 2019

Address comments

b38aa31

astefan approved these changes Jan 11, 2019

Added some comments to the algorithm

7c17a51

imotov reviewed Jan 11, 2019

Adjust algorithm for common prefixes

eec5897

Handle multiple values in different path hiearchy

00e5c09

Merge remote-tracking branch 'upstream/master' into mt/fix-37128

c39978f

imotov approved these changes Jan 14, 2019

costin approved these changes Jan 14, 2019

matriv added 2 commits January 14, 2019 22:45

Address comment

d8fdb93

Address comment

818e48d

matriv merged commit b594e81 into elastic:master Jan 15, 2019

matriv deleted the mt/fix-37128 branch January 15, 2019 07:41

imotov mentioned this pull request Jan 15, 2019

SQL: fix object extraction from sources #37502

Merged

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
