Ingest: Enable Instantiating and Calling other Processors from ScriptProcessor #32043
Conversation
Pinging @elastic/es-core-infra
@original-brownbear Thanks for starting to think about this! I was envisioning something a bit different, though. In particular, this design suffers from still having to configure the processors outside of the actual painless script. My idea was more to expose each processor as a method in painless, at least for the ones that make sense. For set, I don't think it makes sense to have a method, since this is a simple one line in painless already. But imagine wanting to use the new bytes processor with something like this:
This is possible with the relatively recent whitelisting made available in Painless. Also, in the near-term future, we will be able to statically import methods in the whitelist, so that using the …
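A rough sketch of what the "processor as a Painless method" idea could look like on the Java side, with hypothetical class and method names (the actual whitelist entries and helper would likely differ): a single static method is whitelisted so a script can do what the bytes processor does without configuring a separate processor.

```java
import org.elasticsearch.common.unit.ByteSizeValue;

/**
 * Hypothetical helper illustrating the idea: whitelist a static method so an
 * ingest script can convert a human-readable size the same way the bytes
 * processor would, without declaring a processor in the pipeline definition.
 */
public final class IngestScriptMethods {

    private IngestScriptMethods() {}

    /** Parses a value such as "1kb" into its byte count. */
    public static long bytes(String value) {
        return ByteSizeValue.parseBytesSizeValue(value, "script [bytes]").getBytes();
    }
}
```

With a static import in the whitelist, a script could then read something like `ctx.file_size_bytes = bytes(ctx.file_size)` instead of adding a bytes processor to the pipeline.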
I actually found that to be a big advantage of my approach :D for a few reasons:
That said, I agree the API isn't so nice here. But couldn't we resolve that by adding some more logic to the compiler to e.g. just compile my example section of:

```json
"custom_set_proc": {
  "set": {
    "field": "field1",
    "value": 582.1
  }
}
```

into a method:

```json
PUT _ingest/pipeline/my-pipeline-id
{
  "description" : "describe pipeline",
  "processors" : [
    {
      "script": {
        "extra_processors": {
          "custom_set_proc": {
            "set": {
              "field": "field1",
              "value": 582.1
            }
          }
        },
        "lang": "painless",
        "source": """
          custom_set_proc(doc);
          ctx.field_a_plus_b_times_c = (ctx.field_a + ctx.field_b) * params.param_c
        """,
        "params": {
          "param_c": 10
        }
      }
    }
  ]
}
```

(then we could also preload stuff like your example of the bytes processor that has no configuration to it and only takes the data it processes as input, but also keep the ability to neatly handle things like Grok that need more compilation ... or nested script processors)? :)
@rjernst one thing I was thinking about here that wouldn't require much work and would mostly keep your API suggestion would be to just add a …
if we always pass that object to those processors that need a cache, then its lifecycle will be the same as that of its processor and we don't need any new fields in the script processor's DSL. Just an idea :) Not as perfect as not having the …
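For illustration only (every name here is hypothetical, not code from this PR), the cache idea might look roughly like this: the script processor owns one cache object and hands it to each nested processor that needs one, so the cache is created and discarded together with its owning processor and nothing new appears in the DSL.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache whose lifecycle is tied to the owning script processor. */
final class ProcessorCache {
    private final Map<String, Object> entries = new ConcurrentHashMap<>();

    Object computeIfAbsent(String key, Function<String, Object> loader) {
        return entries.computeIfAbsent(key, loader);
    }
}

/** Hypothetical nested processor that stores expensive per-value work in the shared cache. */
final class CachingProcessor {
    private final ProcessorCache cache;

    CachingProcessor(ProcessorCache cache) {
        this.cache = cache; // same lifecycle as the script processor that created it
    }

    Object process(String value) {
        // e.g. cache a compiled pattern or parsed configuration keyed by the input
        return cache.computeIfAbsent(value, v -> v.toUpperCase(Locale.ROOT));
    }
}
```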
@rjernst also, regarding the above suggestion. If we're going for a …
we could:
=> the …
@original-brownbear I am on board to explore (2).
This would be a simple prototype for how calling other processors from the `ScriptProcessor` could work.

Example

returns:

- `ctx` still works
- the `extra_processors` field could be invoked via `extraProcessors.invoke("custom_set_proc", doc);`
- `doc` is the actual `IngestDocument`, it's obviously redundant to `ctx` but `ctx` can't go anywhere because of BwC anyway

This is built on top of #32003
Motivation for doing it this way:

- `extraProcessors` does not require any tricky changes to Painless parsing and requires a minimal amount of extra whitelisting, since we don't have to whitelist all the processors separately
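To make the whitelisting point concrete, here is a minimal sketch of such a wrapper, not the actual code in this PR; the constructor wiring is illustrative and the `Processor#execute(IngestDocument)` signature is assumed from this era of the codebase. Only this one class and its `invoke` method would need to be exposed to Painless, while the concrete processors stay behind it.

```java
import java.util.Map;
import org.elasticsearch.ingest.IngestDocument;
import org.elasticsearch.ingest.Processor;

/**
 * Sketch of an "extraProcessors" wrapper: the named processors are built once
 * at pipeline creation time, and scripts call them by name through invoke.
 */
public final class ExtraProcessors {

    private final Map<String, Processor> processors;

    public ExtraProcessors(Map<String, Processor> processors) {
        this.processors = processors;
    }

    /** Runs the pre-configured processor registered under the given name on the document. */
    public void invoke(String name, IngestDocument doc) throws Exception {
        Processor processor = processors.get(name);
        if (processor == null) {
            throw new IllegalArgumentException("unknown extra processor [" + name + "]");
        }
        processor.execute(doc); // assumes the void execute(IngestDocument) signature of this era
    }
}
```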