Can search script avoid hashtable lookups for field access? #25913
Comments
I think it is possible and probably dovetails into any work we would do to remove some of the hidden casts as well. I don't know the scheduling for something like that though. |
This can be explored more once we break apart the analysis step into a multi-pass compiler. |
I looked into this when debugging the performance of Our rally benchmarks include a comparison of a
Current performance
Only load doc values once
Use a custom script engine
|
Thanks for benchmarking this @jtibshirani, this is very interesting. @stu-elastic and @jdconrad might have more insights, but I wonder whether your second experiment with the HackyScriptEngine also gives type hints that your Painless script doesn't have? Would |
I caught up with @jdconrad earlier and he suggested trying the following Painless script:
Again the numbers represent a combined improvement with the hacky docvalues change. |
@jdconrad Is this a general recommendation to cast ScriptDocValues to an explicit type in a user's script? |
@jtibshirani Thanks for the performance numbers! It's nice to know where to put effort to improve things. @mayya-sharipova I don't feel comfortable recommending this generally because it's very fragile. For instance, if the user does a query across indices and the mappings don't match, they may end up with ClassCastExceptions. The user requires direct knowledge of the indices searched on and the mappings for those indices for each query. On the other hand, for a workaround with an advanced user, it'll definitely speed things up. |
A note about this script: casting to long here means that the rest of the math can be standard JVM ASM instructions written at compile time, because we promote all the values here to double (the 1.2 causes this). Without the explicit cast to a long, the runtime must call a method to figure out the type of the def value returned from ScriptDocValues and then multiply it by 1.2. |
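To make the difference concrete, here is a minimal Java analogy, not the bytecode Painless actually emits. `CastSketch`, `scaleDynamic` and `scaleStatic` are hypothetical names invented for illustration. The first method stands in for a `def` value whose type is only known at runtime; the second stands in for a value that has been explicitly cast to `long`, so the promotion to `double` is resolved at compile time.

```java
// Hypothetical illustration only: an analogy for dynamic vs. static typing,
// not a reproduction of how Painless compiles scripts.
final class CastSketch {
    // Analogue of a `def` value: the static type is unknown, so the multiply
    // has to go through a runtime type check before any arithmetic happens.
    static double scaleDynamic(Object value, double factor) {
        if (value instanceof Long) {
            return ((Long) value) * factor;
        }
        if (value instanceof Double) {
            return ((Double) value) * factor;
        }
        throw new ClassCastException("unsupported type: " + value);
    }

    // Analogue of the explicit cast: the compiler knows the operand is a long
    // and emits the long-to-double promotion and the multiply directly.
    static double scaleStatic(long value, double factor) {
        return value * factor;
    }
}
```

Both methods compute the same result for a `long` input; the only difference is where the type decision is made.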
@jdconrad This makes me wonder whether the suggestion of doing such hash lookups up-front on a per-segment basis would also enable optimizing this automatically, as Painless would know that doc['population'] is a ScriptDocValues.Longs? |
So I've been wondering a thing. Right now scripts operate on values. Is it
crazy to have them operate on typed field accessors instead? I dunno if
that'd end up fast, but it'd give you a data structure with math in it you
can reason about.
On Sat, Apr 18, 2020, 09:34 Adrien Grand wrote:
> @jdconrad This makes me wonder whether the suggestion of doing such hash lookups up-front on a per-segment basis would also enable optimizing this automatically, as Painless would know that doc['population'] is a ScriptDocValues.Longs?
|
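One way to picture the "typed field accessors" idea is the rough sketch below. It is only a guess at what such a design could look like; `LongFieldSketch`, `DoubleExprSketch` and `TypedAccessorSketch` are hypothetical names and do not exist in Elasticsearch. The script becomes a small expression object built once over typed accessors, and evaluating it for a document involves no name lookup and no runtime type discovery.

```java
// Hypothetical typed accessor for a long-valued field.
interface LongFieldSketch {
    long valueAt(int doc);
}

// Hypothetical expression that yields a double for a given document.
interface DoubleExprSketch {
    double eval(int doc);
}

final class TypedAccessorSketch {
    // Wraps a per-segment column of values in a typed accessor.
    static LongFieldSketch longField(long[] column) {
        return doc -> column[doc];
    }

    // Builds the expression `field * factor` once; the operand types are
    // known when the expression is built, not when it is evaluated.
    static DoubleExprSketch times(LongFieldSketch field, double factor) {
        return doc -> field.valueAt(doc) * factor;
    }
}
```

For example, `TypedAccessorSketch.times(TypedAccessorSketch.longField(column), 1.2)` would be built once per segment and then evaluated per document with `eval(doc)`.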
@jpountz My understanding is that this isn't the case, because we still don't know the type until runtime, for two reasons: (1) the caching would be done using a member variable on each script instance that doesn't get set until the first instance invocation, so we don't have the information when the class is actually written (maybe we could do some deferred writing of the script, like Java does with lambdas); and (2) we don't have access to mappings in scripting (something that would be nice for optimization), but even then you still have to contend with possibly different mappings across indices in a single query. @ywelsch suggested that we may be able to union all mappings for a specific query, though, to see if there's a common type. @nik9000 I'll need to give your suggestion some more thought on how that would work. I do wonder if it's easier to optimize the hashing of this after semantic validation. |
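The "union all mappings" suggestion could be sketched roughly as follows. This is an assumption-laden illustration: scripts do not have access to mappings today, and `MappingUnionSketch`, `commonType` and the representation of a mapping as a field-name-to-type-name map are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: each map stands in for one index's mapping,
// from field name to a type name such as "long" or "keyword".
final class MappingUnionSketch {
    // Returns the field's type if every index that maps the field agrees on
    // it; returns empty if the types conflict or no index maps the field.
    static Optional<String> commonType(List<Map<String, String>> mappingsPerIndex, String field) {
        String common = null;
        for (Map<String, String> mapping : mappingsPerIndex) {
            String type = mapping.get(field);
            if (type == null) {
                continue; // field is unmapped in this index
            }
            if (common == null) {
                common = type;
            } else if (!common.equals(type)) {
                return Optional.empty(); // conflicting mappings, no static type
            }
        }
        return Optional.ofNullable(common);
    }
}
```

Only when a common type is found could a compiler safely treat the field as statically typed; an empty result would mean falling back to the dynamic path.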
Pinging @elastic/es-core-infra (Team:Core/Infra) |
ESQL avoids this so maybe we should call it "solved by ESQL"? |
That doesn't use scripts, but its own syntax. But it does work. |
ES|QL does indeed help there, but I'm also thinking of all runtime field definitions that will not benefit from this. Let's close for now, we can still reopen later if interest in this issue comes back. |
Take this search script: `Math.abs(doc['field'].value)`. On every document, we will call `LeafDocLookup.get`, which mostly fetches script doc values for the `field` field in a hashtable and advances it to the current document. The hashtable lookup feels a bit unnecessary since we will always return the same `ScriptDocValues` object on a per-segment basis. Is there a way we could avoid it? A hashtable lookup may look lightweight, but when everything that you do in your script is reading a doc value and applying a cheap function like `Math.abs`, it might not be negligible.
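The difference between the two access patterns can be sketched as below. This is a simplified model and not the real Elasticsearch code: `FieldValuesSketch`, `SegmentLookupSketch` and `PerSegmentCacheSketch` are hypothetical stand-ins for `ScriptDocValues` and `LeafDocLookup`, and the `mapLookups` counter exists only to make the number of hashtable lookups visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for per-field doc values within one segment.
final class FieldValuesSketch {
    private final long[] values;
    private int docId;

    FieldValuesSketch(long[] values) {
        this.values = values;
    }

    void setNextDocId(int docId) {
        this.docId = docId;
    }

    long value() {
        return values[docId];
    }
}

// Hypothetical stand-in for the per-segment lookup described above.
final class SegmentLookupSketch {
    private final Map<String, FieldValuesSketch> fields = new HashMap<>();
    int mapLookups = 0; // counts hashtable lookups, for illustration only

    void put(String field, long[] values) {
        fields.put(field, new FieldValuesSketch(values));
    }

    // Looks the field up by name and advances it to the given document.
    FieldValuesSketch get(String field, int docId) {
        mapLookups++;
        FieldValuesSketch values = fields.get(field);
        values.setNextDocId(docId);
        return values;
    }
}

final class PerSegmentCacheSketch {
    // Shape described in the issue: one hashtable lookup per document.
    static long sumAbsPerDocLookup(SegmentLookupSketch segment, String field, int maxDoc) {
        long sum = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            sum += Math.abs(segment.get(field, doc).value());
        }
        return sum;
    }

    // Shape being asked for: resolve the field once per segment, then only
    // advance the cached object for each document.
    static long sumAbsCached(SegmentLookupSketch segment, String field, int maxDoc) {
        if (maxDoc == 0) {
            return 0;
        }
        FieldValuesSketch cached = segment.get(field, 0);
        long sum = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            cached.setNextDocId(doc);
            sum += Math.abs(cached.value());
        }
        return sum;
    }
}
```

Both methods return the same sum; the cached variant performs a single hashtable lookup per segment instead of one per document.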