Adding a simulate ingest API #99270
Hi @masseyke, I've created a changelog YAML for you.
…om:masseyke/elasticsearch into feature/simulate-ingest-with-pipeline-defs
I like the direction this is taking. It means that if we have some unmodified sample events, we can run them through the simulate API and see what the end result is and where these events end up. The pipeline substitution is key. Imagine that at some point templates and component templates can be substituted as well. @masseyke The focus of the output is on _source for the docs. What happens in synthetic source scenarios like TSDB?
I don't think I'm following. The source is maintained by the pipelines until indexing time, and that is what is displayed in the output. Indexing itself doesn't give us the source as output, and we're not querying the index to get the source / synthetic source.
Oversight on my end. Of course the _source is only removed during indexing 🤦♂️ All good.
Replaced by #101409
This is a draft PR that introduces a new `_ingest/_simulate` API. It runs every pipeline that would be executed for a given index on the given data, but instead of indexing the data into the index, it returns the transformed documents. The difference from the simulate pipeline API is that the simulate pipeline API only runs the single pipeline it is given. This new API could potentially run an unlimited number of pipelines: the given pipeline, the default pipeline for the index given, any default pipelines in indices that the reroute processor forwards the data to, and the final pipeline of the last index in the chain.
For example, if we have the following pipelines:
And then the following index:
Then calling `_ingest/_simulate` with this data:
might return
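As a rough illustration of the chain described above (the first pipeline, then the default pipelines of any indices reached through reroute, then the final pipeline of the last index), here is a toy Python model. It is not Elasticsearch code: the `Pipeline` and `Index` classes, the `simulate` function, the `executed_pipelines` field, and all index and pipeline names other than `my-pipeline-2` are hypothetical. It also assumes that a pipeline named on the request takes the place of the index's default pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Pipeline:
    # Toy pipeline: it only sets fields and optionally reroutes to another index.
    set_fields: Dict[str, object]
    reroute_to: Optional[str] = None


@dataclass
class Index:
    default_pipeline: Optional[str] = None
    final_pipeline: Optional[str] = None


def simulate(source, index, pipelines, indices, request_pipeline=None):
    """Run the pipeline chain on a copy of `source` without indexing anything."""
    doc = dict(source)
    executed = []
    current = index
    visited = {current}
    # Assumption: a pipeline given on the request replaces the index default.
    next_pipeline = request_pipeline or indices[current].default_pipeline
    while next_pipeline is not None:
        pipeline = pipelines[next_pipeline]
        doc.update(pipeline.set_fields)
        executed.append(next_pipeline)
        target = pipeline.reroute_to
        if target is None or target == current:
            break
        if target in visited:
            raise ValueError(f"reroute cycle detected at index {target!r}")
        visited.add(target)
        current = target
        # After a reroute, the default pipeline of the target index runs next.
        next_pipeline = indices[current].default_pipeline
    # Only the final pipeline of the last index in the chain is run.
    final = indices[current].final_pipeline
    if final is not None:
        doc.update(pipelines[final].set_fields)
        executed.append(final)
    return {"_index": current, "_source": doc, "executed_pipelines": executed}


PIPELINES = {
    "my-pipeline-1": Pipeline({"field1": "value1"}, reroute_to="my-index-2"),
    "my-pipeline-2": Pipeline({"field2": "value2"}),
    "my-final-pipeline": Pipeline({"field3": "value3"}),
}
INDICES = {
    "my-index": Index(default_pipeline="my-pipeline-1"),
    "my-index-2": Index(default_pipeline="my-pipeline-2",
                        final_pipeline="my-final-pipeline"),
}

result = simulate({"foo": "bar"}, "my-index", PIPELINES, INDICES)
print(result)
```

In this toy setup the document starts in `my-index`, is rerouted to `my-index-2`, and picks up fields from all three pipelines, while nothing is written anywhere.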
You can also specify substitute pipeline definitions so that you can try pipeline changes without actually having to change pipelines. For example, to substitute a new my-pipeline-2, you could do the following:
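The request syntax used in this draft is not visible here, so the Python sketch below only builds a plausible request body and models the substitution locally. The `docs` and `pipeline_substitutions` keys, the `build_simulate_body` and `effective_pipelines` helpers, and the stored pipeline bodies are assumptions made for illustration; only the `set` processor shape is standard ingest pipeline syntax.

```python
import copy
import json

# Hypothetical stand-in for the pipelines stored in the cluster.
STORED_PIPELINES = {
    "my-pipeline-1": {"processors": [{"set": {"field": "field1", "value": "value1"}}]},
    "my-pipeline-2": {"processors": [{"set": {"field": "field2", "value": "value2"}}]},
}


def build_simulate_body(docs, pipeline_substitutions=None):
    """Build a request body; the key names are assumptions, not confirmed API."""
    body = {"docs": docs}
    if pipeline_substitutions:
        body["pipeline_substitutions"] = pipeline_substitutions
    return body


def effective_pipelines(stored, substitutions):
    """Overlay substitutions on a copy, leaving the stored pipelines untouched."""
    merged = copy.deepcopy(stored)
    merged.update(copy.deepcopy(substitutions))
    return merged


substitute = {
    "my-pipeline-2": {
        "processors": [{"set": {"field": "field2", "value": "new-value2"}}]
    }
}
docs = [{"_index": "my-index", "_id": "1", "_source": {"foo": "bar"}}]

body = build_simulate_body(docs, substitute)
effective = effective_pipelines(STORED_PIPELINES, body["pipeline_substitutions"])

print(json.dumps(body, indent=2))
```

The overlay is applied to a deep copy, which mirrors the request-scoped behaviour: `STORED_PIPELINES` still holds the original `my-pipeline-2` afterwards.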
This substitutes the pipeline body given in the request for the `my-pipeline-2` stored in the cluster. The pipeline definition is only changed for this request, and does not impact anything else running on the cluster now or in the future.
As a side note, here were some of the guidelines I used (and why the code is a little odd):