Adding a simulate ingest api #101409

Merged
merged 42 commits into from
Nov 15, 2023
Changes from 35 commits
Commits
be5f759
Adding a simulate ingest API
masseyke Oct 17, 2023
61aeb42
unit testing
masseyke Oct 18, 2023
f1fb903
cleaning up
masseyke Oct 18, 2023
c363a19
more testing
masseyke Oct 18, 2023
cb92a65
Changing response format
masseyke Oct 26, 2023
3aa9b66
Update docs/changelog/101409.yaml
masseyke Oct 26, 2023
be086d2
Merge branch 'main' into adding-simulate-ingest-api
masseyke Oct 26, 2023
32d77ba
Merge branch 'adding-simulate-ingest-api' of github.com:masseyke/elas…
masseyke Oct 26, 2023
7e33d21
Updating docs
masseyke Oct 27, 2023
ce0bbd6
Merge branch 'main' into adding-simulate-ingest-api
masseyke Oct 30, 2023
4ad2eb5
updating docs
masseyke Oct 30, 2023
8fc1e85
fixing merge error
masseyke Oct 30, 2023
896c3c3
fixing permissions for docs test
masseyke Oct 30, 2023
c8f5dd6
fixing docs
masseyke Oct 30, 2023
2312f12
adding yaml rest tests
masseyke Oct 31, 2023
91f6fa2
fixing action name
masseyke Oct 31, 2023
4195e17
spotlessApply
masseyke Oct 31, 2023
8556fc1
improving rest tests
masseyke Oct 31, 2023
33e094f
spotlessApply
masseyke Oct 31, 2023
2926257
fixing yaml test
masseyke Oct 31, 2023
fa9dd35
Merge branch 'main' into adding-simulate-ingest-api
masseyke Nov 3, 2023
39eefd3
adding comments, fixing transport handling
masseyke Nov 3, 2023
c960c3c
cleanup
masseyke Nov 6, 2023
3e3a426
fixing action name
masseyke Nov 6, 2023
0a92123
fixing security test
masseyke Nov 7, 2023
b9f402b
fixing response serialization
masseyke Nov 7, 2023
25083fc
Merge branch 'main' into adding-simulate-ingest-api
masseyke Nov 7, 2023
0fa2c56
Merge branch 'main' into adding-simulate-ingest-api
masseyke Nov 9, 2023
6be766e
code review changes
masseyke Nov 9, 2023
fd12236
code review feedback
masseyke Nov 9, 2023
f7d356f
attempting to remove compiler warning
masseyke Nov 9, 2023
de01166
adding a unit test for SimulateIngestRestToXContentListener
masseyke Nov 9, 2023
255a08b
making index a path parameter
masseyke Nov 9, 2023
c6e2216
renaming a method
masseyke Nov 9, 2023
83f5efe
updating rest api spec and adding more tests
masseyke Nov 10, 2023
bf082e4
code review feedback on docs
masseyke Nov 10, 2023
5cdc909
code review feedback on docs
masseyke Nov 10, 2023
e35c556
fixing docs
masseyke Nov 13, 2023
822b6c6
Merge branch 'main' into adding-simulate-ingest-api
masseyke Nov 13, 2023
0099cfa
code review feedback
masseyke Nov 15, 2023
3364ce8
Merge branch 'main' into adding-simulate-ingest-api
masseyke Nov 15, 2023
ac6ccab
comment cleanup
masseyke Nov 15, 2023
5 changes: 5 additions & 0 deletions docs/changelog/101409.yaml
@@ -0,0 +1,5 @@
pr: 101409
summary: Adding a simulate ingest api
area: Ingest Node
type: feature
issues: []
1 change: 1 addition & 0 deletions docs/reference/ingest/apis/index.asciidoc
@@ -29,3 +29,4 @@ include::delete-pipeline.asciidoc[]
include::geoip-stats-api.asciidoc[]
include::get-pipeline.asciidoc[]
include::simulate-pipeline.asciidoc[]
include::simulate-ingest.asciidoc[]
358 changes: 358 additions & 0 deletions docs/reference/ingest/apis/simulate-ingest.asciidoc
@@ -0,0 +1,358 @@

[[simulate-ingest-api]]
=== Simulate ingest API
++++
<titleabbrev>Simulate ingest</titleabbrev>
++++

Executes ingest pipelines against a set of provided documents, optionally
with substitute pipeline definitions.

Review comment: In this and in the doc for the existing simulate, I'm surprised there is no mention of the simulation/testing aspect. In isolation, this reads as if it actually executes the pipeline (and thus indexes some data).

Author reply: There's a line below that reads No data is indexed into $Elasticsearch. I'll add a line about the intended use of this API at the top though.

////
[source,console]
----
PUT /_ingest/pipeline/my-pipeline
{
  "description" : "example pipeline to simulate",
  "processors": [
    {
      "set" : {
        "field" : "field1",
        "value" : "value1"
      }
    }
  ]
}

PUT /_ingest/pipeline/my-final-pipeline
{
  "description" : "example final pipeline to simulate",
  "processors": [
    {
      "set" : {
        "field" : "field2",
        "value" : "value2"
      }
    }
  ]
}

PUT /index
{
  "settings": {
    "index": {
      "default_pipeline": "my-pipeline",
      "final_pipeline": "my-final-pipeline"
    }
  }
}
----
// TESTSETUP
////

[source,console]
----
POST /_ingest/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "rab"
      }
    }
  ],
  "pipeline_substitutions": { <1>
    "my-pipeline": {
      "processors": [
        {
          "set": {
            "field": "field3",
            "value": "value3"
          }
        }
      ]
    }
  }
}
----

<1> This replaces the existing `my-pipeline` pipeline with the contents given here for the duration of this request.

[[simulate-ingest-api-request]]
==== {api-request-title}

`POST /_ingest/_simulate`

`GET /_ingest/_simulate`

`POST /_ingest/<target>/_simulate`

`GET /_ingest/<target>/_simulate`

[[simulate-ingest-api-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have the
`index` or `create` <<privileges-list-indices,index privileges>>
to use this API.

[[simulate-ingest-api-desc]]
==== {api-description-title}

The simulate ingest API simulates ingesting data into an index. It
executes the default and final pipelines for that index against a set
of documents provided in the body of the request. If a pipeline
contains a reroute processor, the API follows that reroute processor to the
new index and executes that index's pipelines as well, the same way that
a non-simulated ingest would. No data is indexed into {es}. Instead,
the transformed document is returned, along with the list of pipelines
that have been executed and the name of the index where the document
would have been indexed if this were not a simulation. This differs from
the <<simulate-pipeline-api,simulate pipeline API>>, where you specify
a single pipeline and only that one pipeline runs. The
simulate pipeline API is more useful for developing a single pipeline,
while the simulate ingest API is more useful for troubleshooting the
interaction of the various pipelines that get applied when ingesting
into an index.
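
The reroute behavior described above can be sketched with a substitute
pipeline. This is an illustrative request only: it assumes an index named
`index` whose default pipeline is `my-pipeline`, as in the examples on this
page, and `other-index` is a hypothetical destination name. The `reroute`
processor and its `destination` option belong to the existing reroute
processor, not to this API.

[source,console]
----
POST /_ingest/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "123",
      "_source": {
        "foo": "bar"
      }
    }
  ],
  "pipeline_substitutions": {
    "my-pipeline": {
      "processors": [
        {
          "reroute": {
            "destination": "other-index"
          }
        }
      ]
    }
  }
}
----

Under these assumptions, the `_index` returned for the document would be
expected to name the reroute destination rather than `index`, and
`executed_pipelines` would list whichever pipelines ran along the way.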


By default, the pipeline definitions that are currently in the system
are used. However, you can supply substitute pipeline definitions in the
body of the request. These are used in place of the pipeline
definitions that are already in the system. Substitutions can replace
existing pipeline definitions or create new ones. The pipeline
substitutions are only used within this request.
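
As a rough sketch of creating a new pipeline through substitutions, the
request below defines a pipeline that does not exist in the system and calls
it from a substitute `my-pipeline`. The name `my-new-pipeline` and the field
`field4` are hypothetical, and the request assumes that a `pipeline` processor
resolves its target against the substituted definitions, which the text above
does not state explicitly.

[source,console]
----
POST /_ingest/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "123",
      "_source": {
        "foo": "bar"
      }
    }
  ],
  "pipeline_substitutions": {
    "my-pipeline": {
      "processors": [
        {
          "pipeline": {
            "name": "my-new-pipeline"
          }
        }
      ]
    },
    "my-new-pipeline": {
      "processors": [
        {
          "set": {
            "field": "field4",
            "value": "value4"
          }
        }
      ]
    }
  }
}
----

Because the substitutions apply only to this request, neither definition
would remain in the system afterwards.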

[[simulate-ingest-api-path-params]]
==== {api-path-parms-title}

`<target>`::
(Optional, string)
The index to simulate ingesting into. This can be overridden by specifying an index
on each document. If you provide a `<target>` in the request path, it is used for any
documents that don’t explicitly specify an `_index`.
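
The following sketch shows how a `<target>` in the path might combine with
per-document indices. It assumes the index `index` from the examples on this
page; `other-index` is a hypothetical name used only to show the override.

[source,console]
----
POST /_ingest/index/_simulate
{
  "docs": [
    {
      "_id": "123",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "other-index",
      "_id": "456",
      "_source": {
        "foo": "rab"
      }
    }
  ]
}
----

Here the first document has no `_index`, so it would be simulated against
`index` from the path, while the second document names its own index and
would be simulated against that one instead.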

[[simulate-ingest-api-query-params]]
==== {api-query-parms-title}

`pipeline`::
(Optional, string)
Pipeline to use as the default pipeline. This can be used to override the default pipeline
of the index being ingested into.
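
A minimal sketch of the `pipeline` query parameter follows. It assumes a
pipeline named `my-other-pipeline` already exists in the system; that name is
hypothetical and is not part of the setup used elsewhere on this page.

[source,console]
----
POST /_ingest/index/_simulate?pipeline=my-other-pipeline
{
  "docs": [
    {
      "_id": "123",
      "_source": {
        "foo": "bar"
      }
    }
  ]
}
----

With this request, `my-other-pipeline` would be used as the default pipeline
in place of the one configured on `index`.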


[role="child_attributes"]
[[simulate-ingest-api-request-body]]
==== {api-request-body-title}

`docs`::
(Required, array of objects)
Sample documents to test in the pipeline.
+
.Properties of `docs` objects
[%collapsible%open]
====
`_id`::
(Optional, string)
Unique identifier for the document.

`_index`::
(Optional, string)
Name of the index that the document will be ingested into.

`_source`::
(Required, object)
JSON body for the document.
====

`pipeline_substitutions`::
(Optional, map of strings to objects)
Map of pipeline IDs to substitute pipeline definition objects.
+
.Properties of pipeline definition objects
[%collapsible%open]
====
include::put-pipeline.asciidoc[tag=pipeline-object]
====

[[simulate-ingest-api-example]]
==== {api-examples-title}


[[simulate-ingest-api-pre-existing-pipelines-ex]]
===== Use pre-existing pipeline definitions
In this example the index `index` has a default pipeline called `my-pipeline` and a final
pipeline called `my-final-pipeline`. Since both documents are being ingested into `index`,
both pipelines are executed using the pipeline definitions that are already in the system.

[source,console]
----
POST /_ingest/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "123",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_id": "456",
      "_source": {
        "foo": "rab"
      }
    }
  ]
}
----

The API returns the following response:

[source,console-result]
----
{
  "docs": [
    {
      "doc": {
        "_id": "123",
        "_index": "index",
        "_version": -3,
        "_source": {
          "field1": "value1",
          "field2": "value2",
          "foo": "bar"
        },
        "executed_pipelines": [
          "my-pipeline",
          "my-final-pipeline"
        ]
      }
    },
    {
      "doc": {
        "_id": "456",
        "_index": "index",
        "_version": -3,
        "_source": {
          "field1": "value1",
          "field2": "value2",
          "foo": "rab"
        },
        "executed_pipelines": [
          "my-pipeline",
          "my-final-pipeline"
        ]
      }
    }
  ]
}
----

[[simulate-ingest-api-request-body-ex]]
===== Specify a pipeline substitution in the request body
In this example the index `index` has a default pipeline called `my-pipeline` and a final
pipeline called `my-final-pipeline`. But a substitute definition of `my-pipeline` is
provided in `pipeline_substitutions`. The substitute `my-pipeline` will be used in place of
the `my-pipeline` that is in the system, and then the `my-final-pipeline` that is already
defined in the system will be executed.

[source,console]
----
POST /_ingest/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "123",
      "_source": {
        "foo": "bar"
      }
    },
    {
      "_index": "index",
      "_id": "456",
      "_source": {
        "foo": "rab"
      }
    }
  ],
  "pipeline_substitutions": {
    "my-pipeline": {
      "processors": [
        {
          "uppercase": {
            "field": "foo"
          }
        }
      ]
    }
  }
}
----

The API returns the following response:

[source,console-result]
----
{
  "docs": [
    {
      "doc": {
        "_id": "123",
        "_index": "index",
        "_version": -3,
        "_source": {
          "field2": "value2",
          "foo": "BAR"
        },
        "executed_pipelines": [
          "my-pipeline",
          "my-final-pipeline"
        ]
      }
    },
    {
      "doc": {
        "_id": "456",
        "_index": "index",
        "_version": -3,
        "_source": {
          "field2": "value2",
          "foo": "RAB"
        },
        "executed_pipelines": [
          "my-pipeline",
          "my-final-pipeline"
        ]
      }
    }
  ]
}
----

////
[source,console]
----
DELETE /index

DELETE /_ingest/pipeline/*
----

[source,console-result]
----
{
"acknowledged": true
}
----
////