Feature/simulate ingest with pipeline defs and mapping validation #99920
Hi @masseyke, I've created a changelog YAML for you.
The _simulate currently also cannot deal with the Because this:
results in this:
And the Also I somehow never use this:
I usually do this (through Kibana Dev Tools)
So I'll add the index name into the |
This will not impact the |
It sort of does. If you specify a pipeline processor it will read that one as well, so you can run multiple nested pipelines. Will the new Does this imply that the |
Yes I believe so, but I don't have a timeline for that and it definitely won't be in 8.13.0.
No, definitely not. The two APIs serve different purposes. The older API is more useful for developing individual pipelines. The new API is more useful for testing the integration of multiple pipelines and their configuration on indices.
OK, perfect. One thing I didn't see covered in the tests here: does it also work with dynamic mappings and runtime mappings when they are defined in the index?
Yes, it is using the exact same code as the |
This is a draft PR that introduces a new _ingest/_simulate API. It takes the given data and runs every pipeline that would be executed for a given index, but instead of indexing the data into the index, it returns the transformed documents. The difference from the simulate pipeline API is that the simulate pipeline API only runs the single pipeline it is given. This new API could potentially run any number of pipelines: the given pipeline, the default pipeline for the index given, any default pipelines in indices that the reroute processor forwards the data to, and the final pipeline of the last index in the chain.
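The ordering described above can be sketched as a small toy model. This is not the code in this PR. `IndexSettings`, `resolve_pipeline_chain`, the `reroutes` table, and the names `my-index-1`, `my-pipeline-1` and `my-final-pipeline` are all hypothetical. The sketch also assumes that a pipeline named on the request takes the place of the index's default pipeline, and that a reroute target can be looked up statically, whereas a real reroute processor decides per document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IndexSettings:
    """Hypothetical, simplified view of an index's ingest settings."""
    default_pipeline: Optional[str] = None
    final_pipeline: Optional[str] = None


def resolve_pipeline_chain(
    request_pipeline: Optional[str],
    start_index: str,
    indices: Dict[str, IndexSettings],
    reroutes: Dict[str, str],
) -> List[str]:
    """Return pipeline names in the order this toy model would run them.

    `reroutes` maps a pipeline name to the index that its reroute
    processor forwards documents to (an assumption of this sketch).
    """
    chain: List[str] = []
    index = start_index
    # Assumption: a request pipeline replaces the index default.
    current = request_pipeline or indices.get(index, IndexSettings()).default_pipeline
    seen = set()  # guards against a reroute cycle in the toy data
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        target = reroutes.get(current)
        if target is None or target == index:
            break
        # The document moved to another index, so its default pipeline runs next.
        index = target
        current = indices.get(index, IndexSettings()).default_pipeline
    # Only the final pipeline of the last index in the chain is run.
    final = indices.get(index, IndexSettings()).final_pipeline
    if final is not None:
        chain.append(final)
    return chain
```

With a hypothetical `my-index-1` whose default pipeline reroutes to `my-index-2`, the chain contains the first default pipeline, then the default pipeline of `my-index-2`, then the final pipeline of `my-index-2`.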
For example, if we have the following pipelines:
And then the following indexes:
Then calling _ingest/_simulate with this data:
might return
You can also specify substitute pipeline definitions so that you can try pipeline changes without actually having to change pipelines. For example, to substitute a new my-pipeline-2, you could do the following:
This substitutes the pipeline body given in the request for the my-pipeline-2 stored in the cluster. The pipeline definition is only changed for this request, and does not impact anything else running on the cluster now or in the future.
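As a rough illustration of the substitution semantics, the sketch below overlays request-scoped definitions on a copy of the stored pipelines and builds a request body. The helper names are hypothetical. The body field names `docs` and `pipeline_substitutions` are assumptions about the request shape and may not match this draft. The `set` processor body is only a placeholder.

```python
import copy
import json
from typing import Any, Dict, List

Pipeline = Dict[str, Any]


def effective_pipelines(
    stored: Dict[str, Pipeline],
    substitutions: Dict[str, Pipeline],
) -> Dict[str, Pipeline]:
    """Overlay substitutions on a deep copy, leaving `stored` untouched."""
    merged = copy.deepcopy(stored)
    merged.update(copy.deepcopy(substitutions))
    return merged


def build_simulate_request(
    docs: List[Dict[str, Any]],
    substitutions: Dict[str, Pipeline],
) -> str:
    """Serialize a request body (field names are assumed, see lead-in)."""
    body: Dict[str, Any] = {"docs": docs}
    if substitutions:
        body["pipeline_substitutions"] = substitutions
    return json.dumps(body, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder replacement for my-pipeline-2, used for this request only.
    substitute = {
        "my-pipeline-2": {
            "processors": [{"set": {"field": "my-boolean-field", "value": True}}]
        }
    }
    docs = [{"_index": "my-index-2", "_source": {"message": "hello"}}]
    print(build_simulate_request(docs, substitute))
```

Copying rather than mutating mirrors the property stated above: the substituted definition is visible to one request and nothing else.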
If the index that the data would land in (my-index-2 in the example above) exists, then the API will validate that the output of the pipelines is compatible with the index. For example, if we intentionally set my-boolean-field to the wrong type:
Then you would still get the output of the pipelines, but you would also get the validation error:
If the index where the data would land does not exist, then the result of the pipelines is displayed, along with an error message that the index does not exist (so the mappings could not be validated). For example after calling:
Then:
Regardless of the result of the call to the API, no data is actually indexed, and no mappings are actually updated.
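Because a result can carry both the transformed document and an error (a mapping validation failure or a missing index), a client may want to read the two side by side. The sketch below assumes a response shaped like `{"docs": [{"doc": {"_index": ..., "_source": ..., "error": {...}}}]}`. That shape and the helper name `summarize_simulation` are assumptions for illustration, not the format confirmed by this PR.

```python
from typing import Any, Dict, List


def summarize_simulation(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each simulated document with its error reason, if any.

    The response layout is an assumption of this sketch (see lead-in).
    """
    results: List[Dict[str, Any]] = []
    for entry in response.get("docs", []):
        doc = entry.get("doc", {})
        error = doc.get("error")
        results.append(
            {
                "index": doc.get("_index"),
                "source": doc.get("_source"),
                # The transformed source is kept even when an error is present.
                "error_reason": error.get("reason") if isinstance(error, dict) else None,
            }
        )
    return results
```

Each summary entry keeps the pipeline output even when an error is attached, matching the behavior described above.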
As a side note, here were some of the guidelines I used (and why the code is a little odd):
Make the API easy to use, and familiar to users of the simulate pipeline API.
Use as much of the existing bulk API logic as possible so that simulate does not diverge from real ingest behavior.
Do not impact bulk API performance
Modify the bulk API code as little as possible. This is very critical code, and any change is an opportunity to introduce bugs.