Adding a simulate ingest api #101409
Hi @masseyke, I've created a changelog YAML for you.
…ticsearch into adding-simulate-ingest-api
Really like the |
@elasticmachine run elasticsearch-ci/part-1
This looks great, thanks Keith! I left one more comment which should be addressed (and maybe a yaml test for it), but nothing major
rest-api-spec/src/main/resources/rest-api-spec/api/simulate.ingest.json
<titleabbrev>Simulate ingest</titleabbrev>
++++

Executes ingest pipelines against a set of provided documents, optionally
In this and in the doc for the existing simulate, I'm surprised there is no mention of the simulation/testing aspect. In isolation, this reads as if it actually executes the pipeline (and thus indexes some data).
There's a line below that reads "No data is indexed into $Elasticsearch". I'll add a line about the intended use of this API at the top though.
Hi @masseyke, I was just looking at this new API to assess how we could use it in Kibana to extend the functionality of testing an ingest pipeline. I think we would probably mostly use the substitutions. When testing with substitutions we need an index that uses the pipeline that is being tested, so I'm curious if there is a way to get a list of those indices from ES?
You're saying that you would like a way to give Elasticsearch a pipeline name, and have it return the list of indices that have that pipeline as the default pipeline or final pipeline?
@masseyke yes, the context is as follows: the user is editing an existing pipeline and wants to test their changes. In the UI we would show them a list of indices that already use this pipeline and let them simulate an ingest. We would substitute the existing pipeline with the payload that the user has edited in the UI so far.
We had an offline discussion about this. We do not have an API right now to get a list of indices for a given pipeline. I think the expected use of this API is a little bit different though, and not something that we currently have a Kibana UI for. This API is meant for developing and troubleshooting the integration of a collection of pipelines all working together (as opposed to developing an individual pipeline -- the simulate pipeline API is more appropriate for that). For example, say ingestion into an index has been going fine in production, and has broken with some new piece of data. This API could be used to figure out which pipelines are running, figure out what the output of all those pipelines is, and experiment with modifications to one or more of the pipelines. Once a pipeline has been worked out, you'd still want to run it through whatever regression testing you have in order to make sure that the change doesn't break some other index.
Code LGTM. I don't see how this could affect bulk processing performance (nice, btw!), but I've been surprised before, so I think you should keep an eye on the benchmarks once this has been merged and be ready to do some digging if anything seems off.
@elasticmachine run elasticsearch-ci/part-1
This PR introduces a new _ingest/_simulate API that takes the given data and runs every pipeline that would be executed for a given index, but instead of indexing the data into the index, it returns the transformed documents. The difference from the simulate pipeline API is that the simulate pipeline API only runs the single pipeline it is given. This new API could potentially run an unlimited number of pipelines -- the given pipeline, the default pipeline for the index given, any default pipelines in indices that the reroute processor forwards the data to, and the final pipeline of the last index in the chain.
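As a rough mental model of that chain (not the code in this PR), the sketch below walks from an index's default pipeline through reroute targets to the final pipeline of the last index. `IndexSettings`, `pipelines_to_run`, the `reroutes` mapping, and every index and pipeline name other than my-pipeline-2 are hypothetical. It ignores a pipeline passed explicitly in the request and assumes each pipeline reroutes to at most one fixed index, which is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IndexSettings:
    """Hypothetical stand-in for an index's two pipeline-related settings."""
    default_pipeline: Optional[str] = None
    final_pipeline: Optional[str] = None


def pipelines_to_run(
    target_index: str,
    indices: Dict[str, IndexSettings],
    reroutes: Dict[str, str],
) -> List[str]:
    """Return the ordered pipeline names a document would pass through.

    `reroutes` maps a pipeline name to the index its reroute processor
    forwards to (simplified: one fixed target per pipeline).
    """
    executed: List[str] = []
    visited = set()
    index = target_index
    while True:
        if index in visited:
            # Sketch's own choice: refuse to follow a reroute loop.
            raise ValueError(f"reroute cycle detected at index {index!r}")
        visited.add(index)
        pipeline = indices[index].default_pipeline
        if pipeline is None:
            break
        executed.append(pipeline)
        next_index = reroutes.get(pipeline)
        if next_index is None:
            break  # no reroute, so this is the last index in the chain
        index = next_index
    # Only the final pipeline of the last index in the chain runs.
    final = indices[index].final_pipeline
    if final is not None:
        executed.append(final)
    return executed


indices = {
    "my-index": IndexSettings(default_pipeline="my-pipeline-1"),
    "my-other-index": IndexSettings(
        default_pipeline="my-pipeline-2",
        final_pipeline="my-final-pipeline",
    ),
}
reroutes = {"my-pipeline-1": "my-other-index"}
chain = pipelines_to_run("my-index", indices, reroutes)
print(chain)
```

With these made-up settings the chain is my-pipeline-1, then my-pipeline-2, then my-final-pipeline, which is the kind of multi-pipeline run the simulate pipeline API cannot reproduce on its own.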
For example, if we have the following pipelines:
And then the following index:
Then calling _ingest/_simulate with this data:
would return
You can also specify substitute pipeline definitions so that you can try pipeline changes without actually having to change pipelines. For example, to substitute a new my-pipeline-2, you could do the following:
This substitutes the pipeline body given in the request for the my-pipeline-2 stored in the cluster. The pipeline definition is only changed for this request, and does not impact anything else running on the cluster now or in the future.
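The request bodies from the original description are not reproduced here, so the following is only an illustrative sketch of what a substitution request might look like. It assumes the body uses a `docs` array and a `pipeline_substitutions` object keyed by pipeline name; check those field names against the API documentation. `build_simulate_request`, the index name, the document, and the `set` processor values are all hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def build_simulate_request(
    docs: List[Dict[str, Any]],
    pipeline_substitutions: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a request body for POST /_ingest/_simulate (assumed field names)."""
    body: Dict[str, Any] = {"docs": docs}
    if pipeline_substitutions:
        # Substitutions apply to this request only; nothing stored changes.
        body["pipeline_substitutions"] = pipeline_substitutions
    return body


# Hypothetical document targeting a hypothetical index.
docs = [{"_index": "my-index", "_id": "123", "_source": {"foo": "bar"}}]

# Hypothetical replacement body for the stored my-pipeline-2.
substitute = {
    "my-pipeline-2": {
        "processors": [
            {"set": {"field": "my-new-field", "value": "substituted"}}
        ]
    }
}

request_body = build_simulate_request(docs, substitute)
print("POST /_ingest/_simulate")
print(json.dumps(request_body, indent=2))
```

Calling `build_simulate_request(docs)` without the second argument produces a plain simulation request that uses the pipelines as they are stored in the cluster.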