Introduce reroute method on IngestDocument #94000

felixbarny · 2023-02-22T07:31:32Z

- Overrides _index
- Skips current pipeline
- Invokes default pipeline of new index

Requirement for

Add reroute processor #76511

Combines #85932 and #85931

Fixes #83653

elasticsearchmachine · 2023-02-22T08:21:48Z

Pinging @elastic/es-data-management (Team:Data Management)

dakrone · 2023-02-22T21:39:25Z

server/src/main/java/org/elasticsearch/ingest/IngestService.java

+        return new Pipelines(pipelineId, finalPipelineId);
+    }
+
+    private static class Pipelines implements Iterable<String> {


I think this is a reasonable abstraction, but I don't like that we make it mutable, and that we don't encapsulate enough here.

The withoutDefaultPipeline method makes me uncomfortable as its name makes it sound like it would return a new object rather than making a mutable change.

I also think we don't need to have executePipelines(...) take a boolean. We're treating this internally as though we'll always have a list of pipelines, but we could probably get away with passing a proper object instead of an Iterator<String>. I think it'd be much clearer that way.

I abstracted the Iterator<String> and the boolean flag into a PipelineIterator; however, it's not immutable. As that's the nature of iterators, I think that's fine.

The executePipelines method needs to know three properties of the current pipeline: its name (to include in the exception if the pipeline itself can't be resolved), the pipeline itself, and whether it is the final pipeline (if so, overriding _index is disallowed).

The PipelineIterator encapsulates that state and I think that makes it cleaner than before. Thanks for the suggestion 👍
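The abstraction described above can be pictured with a minimal sketch. This is not the code from the PR: `PipelineSlot`, `PipelineIteratorSketch`, and the `Function<String, Object>` lookup that stands in for pipeline resolution are hypothetical names chosen for illustration, and the resolved pipeline is typed as `Object` only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical triple: pipeline id, resolved pipeline (null if it can't be resolved), final-pipeline flag. */
final class PipelineSlot {
    final String id;
    final Object pipeline;
    final boolean isFinal;

    PipelineSlot(String id, Object pipeline, boolean isFinal) {
        this.id = id;
        this.pipeline = pipeline;
        this.isFinal = isFinal;
    }
}

/** Illustrative only: yields the default pipeline slot (if any), then the final pipeline slot (if any). */
final class PipelineIteratorSketch implements Iterator<PipelineSlot> {
    private final Iterator<PipelineSlot> slots;

    PipelineIteratorSketch(String defaultPipelineId, String finalPipelineId, Function<String, Object> lookup) {
        List<PipelineSlot> list = new ArrayList<>(2);
        if (defaultPipelineId != null) {
            list.add(new PipelineSlot(defaultPipelineId, lookup.apply(defaultPipelineId), false));
        }
        if (finalPipelineId != null) {
            // The final slot is the only one flagged isFinal.
            list.add(new PipelineSlot(finalPipelineId, lookup.apply(finalPipelineId), true));
        }
        this.slots = list.iterator();
    }

    @Override
    public boolean hasNext() {
        return slots.hasNext();
    }

    @Override
    public PipelineSlot next() {
        return slots.next();
    }
}
```

With a shape like this, a caller only has to pass the iterator around; the name, the pipeline, and the final flag travel together instead of as separate parameters.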

dakrone · 2023-02-22T21:41:13Z

server/src/main/java/org/elasticsearch/ingest/IngestDocument.java

+    void resetPipelineSkipping() {
+        skipCurrentPipeline = false;
+    }


Why doesn't this also reset invokeDefaultPipelineOfDestination? Also, it's unclear from the code why we need this method to be invoked.

Good point, it needs to reset both. I've changed it to just be a single boolean.

dakrone · 2023-02-22T21:42:44Z

server/src/main/java/org/elasticsearch/ingest/IngestDocument.java

+    boolean isInvokeDefaultPipelineOfDestination() {
+        return invokeDefaultPipelineOfDestination;
+    }
+
+    boolean isSkipCurrentPipeline() {
+        return skipCurrentPipeline;
+    }


I don't think we need multiple boolean flags for this, right? If we do, then it's not clear enough why they have to be separated (needs documentation).

I've changed it to just be a single boolean and a single getter for it (isRedirect()). The downside is that isRedirect() is less expressive than isSkipCurrentPipeline() in the context of determining whether to skip the pipeline.
We could keep isSkipCurrentPipeline() and just return the redirect flag, but that feels a bit off as well.
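A minimal sketch of the single-flag design described in this reply, assuming a simplified stand-in for the document class. `RedirectFlagSketch`, `redirect(...)`, and `resetRedirect()` are hypothetical names for the example; only `isRedirect()` is named in the comment above, and the PR later renamed the concept to "reroute".

```java
/** Hypothetical stand-in for the relevant part of IngestDocument; not the real class. */
final class RedirectFlagSketch {
    private String index;
    // One flag covers both "skip the current pipeline" and
    // "invoke the default pipeline of the destination index".
    private boolean redirect;

    RedirectFlagSketch(String index) {
        this.index = index;
    }

    /** Overrides the target index and marks the document as redirected. */
    void redirect(String destIndex) {
        this.index = destIndex;
        this.redirect = true;
    }

    boolean isRedirect() {
        return redirect;
    }

    /** Clears the flag once the caller has acted on the redirect. */
    void resetRedirect() {
        redirect = false;
    }

    String getIndex() {
        return index;
    }
}
```

Because there is only one flag, a reset cannot leave the two behaviours out of sync, which was the concern raised about resetPipelineSkipping().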

joegallo · 2023-03-07T13:48:51Z

I'm pretty sure this is going to fail the test introduced by #94281, which means that this PR is a bugfix for #83653. /cc @HiDAl

edit: added a commit and updated the description

double edit: there might be a BWC issue with the test -- mixed mode clusters wouldn't always have the bugfix and all that. If that's the case, I'll take on the task of adding the right limitations to the test to make it version-specific. (edit: there are no BWC YAML tests in modules/ingest-common, at least as far as I can see.)

Since this PR fixes the linked bug

If a final pipeline changes the indices such that a cycle is created, it's more important to error that a final pipeline changed the indices than that a cycle was created.
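The ordering described in that commit message can be sketched as follows. This is an assumption-laden illustration, not the IngestService code: `IndexChangeChecks`, its `check(...)` signature, the use of `IllegalStateException`, and the message wording are all hypothetical.

```java
import java.util.Set;

/** Illustrative only: shows the order of the two checks, not the real implementation. */
final class IndexChangeChecks {
    static void check(String oldIndex, String newIndex, boolean isFinalPipeline, Set<String> visited) {
        if (oldIndex.equals(newIndex)) {
            return; // nothing changed, nothing to validate
        }
        // Checked first: a final pipeline must not change the target index,
        // even when the change would also close a cycle.
        if (isFinalPipeline) {
            throw new IllegalStateException(
                "final pipeline changed index from [" + oldIndex + "] to [" + newIndex + "]");
        }
        // Checked second: Set.add returns false if the index was already visited.
        if (!visited.add(newIndex)) {
            throw new IllegalStateException("index cycle detected while rerouting to [" + newIndex + "]");
        }
    }
}
```

Placing the final-pipeline check ahead of the cycle check means the user sees the more actionable error when both conditions hold.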

HiDAl · 2023-03-08T13:31:41Z

Yesterday I had to revert the test because it was failing due to warning headers. Today I pushed and merged PR #94388 which addresses the issue. Please rebase/merge on this PR @joegallo @felixbarny

felixbarny · 2023-03-08T18:04:06Z

I've updated to main and re-adjusted the test. I ran that test locally and I can confirm that it prevents the final pipeline from running twice.

felixbarny · 2023-03-09T09:36:34Z

server/src/main/java/org/elasticsearch/ingest/IngestDocument.java

@@ -903,6 +904,29 @@ public String toString() {
        return "IngestDocument{" + " sourceAndMetadata=" + ctxMap + ", ingestMetadata=" + ingestMetadata + '}';
    }

+    public void reroute(String destIndex) {
+        getMetadata().setIndex(destIndex);


@joegallo what do you think about adding the history of the _index field in an ingest metadata field? This wouldn't be indexed by default, but for debugging, users can use a set processor to add it to the documents:

{ "set": { "field": "reroute_history", "copy_from": "_ingest.reroute_history" } }

Suggested change

-        getMetadata().setIndex(destIndex);
+        getMetadata().setIndex(destIndex);
+        appendFieldValue("_ingest.reroute_history", getMetadata().getIndex());

I'm not opposed to adding a mechanism like that in a future PR, but I would like to keep the scope of this PR fixed.

When we do add that mechanism, though, I'd prefer that the list be an immutable reference to the collection we're tracking for index recursion purposes, rather than a new collection. Similarly, appendFieldValue is more for processors to use when the first argument is customer-provided -- we can just traverse the data structures ourselves in IngestService, there's no need for string parsing and evaluation there.
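One way to read the preference stated above is a read-only view over the same collection that is used for recursion detection, rather than a copy. The sketch below is only an illustration of that idea; `IndexHistorySketch`, `record(...)`, and `history()` are hypothetical names and do not come from the PR.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative only: one collection serves both cycle detection and history. */
final class IndexHistorySketch {
    // Insertion-ordered, so the history reads in the order the indices were visited.
    private final Set<String> visited = new LinkedHashSet<>();
    // A live read-only view of the same set, not a copy.
    private final Set<String> readOnlyView = Collections.unmodifiableSet(visited);

    IndexHistorySketch(String initialIndex) {
        visited.add(initialIndex);
    }

    /** Records a reroute target; returns false if the index was already visited (a cycle). */
    boolean record(String index) {
        return visited.add(index);
    }

    /** Callers can read the history but cannot modify it. */
    Set<String> history() {
        return readOnlyView;
    }
}
```

Since the view is backed by the tracked set, it reflects later reroutes without any string parsing or field-path evaluation.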

joegallo · 2023-03-21T10:51:18Z

@elasticmachine update branch

joegallo · 2023-03-21T14:33:26Z

I added some new commits this morning that address my biggest issues here -- but the failure we see from elasticsearch-ci/part-1 seems to be real (and so would have its root in my commits). I'll add another commit here and get that passing again. ☹️ (edit: fixed 2db2a19)

I was waiting for green CI before I added the following comment, but since CI is going to be red for a little while I'll just drop my message in now:

I'm not especially enamored with the way this tracks indexRecursionDetection -- there's a somewhat similar executedPipelines that's being tracked by IngestDocument itself. Having IngestDocument track the index changes and reroutes like that would simplify the parameters to executePipelines, and I think it would make later changes like adding the index history a bit simpler.

That said -- this is good enough as it is. I'll take on moving indexRecursionDetection over to IngestDocument as a refactoring in a separate PR. Let's merge this and start iterating on #76511.

But make the null handling explicit rather than implicit

felixbarny · 2023-03-22T06:23:43Z

I'm not especially enamored with the way this tracks indexRecursionDetection -- there's a somewhat similar executedPipelines that's being tracked by IngestDocument itself. Having IngestDocument track the index changes and reroutes like that would simplify the parameters to executePipelines, and I think it would make later changes like adding the index history a bit simpler.

I completely agree with that. Ideally, ingest pipelines and scripts should not be able to modify the history so that we can be certain whether a document is sent directly to a target or whether it has been rerouted.

I'll take on moving indexRecursionDetection over to IngestDocument as a refactoring in a separate PR.

Thanks!

Let's merge this and start iterating on #76511.

Thanks for the review and the approval! I'll wait for @dakrone's approval before merging.

dakrone

LGTM also, thanks for all the work on this Felix & Joe

…96958)

* ingest: expose reroute inquiry/reset via Elastic-internal API bridge

Logstash's Integration filter works directly with the processors, but cannot use the IngestService that is tightly-coupled with cluster state and must therefore emulate the behavior introduced in #94000. To do so, the additional methods for inquiring about and resetting the reroute state need to be externally-accessible. Exposing them through a clearly-named bridge allows us to avoid making these Elastic-internal bits a part of the public APIs that are subject to years-long stability and deprecation notice policies.

* Update docs/changelog/96958.yaml

* javadoc: rephrase to avoid use of @apiNote

Introduce redirect method on IngestDocument

a66a11e

- Overrides _index - Skips current pipeline - Invokes default pipeline of new index

felixbarny requested review from dakrone and joegallo February 22, 2023 07:31

elasticsearchmachine added the v8.8.0, external-contributor, and needs:triage labels Feb 22, 2023

felixbarny added the :Data Management/Ingest Node and >enhancement labels Feb 22, 2023

elasticsearchmachine added the Team:Data Management label and removed the needs:triage label Feb 22, 2023

felixbarny added 2 commits February 22, 2023 09:24

Add changelog

0a41d94

Skipp full pipeline even if invoked via pipeline processor

47581d5

felixbarny mentioned this pull request Feb 22, 2023

Add reroute processor #76511

Merged

dakrone reviewed Feb 22, 2023

felixbarny added 7 commits February 23, 2023 09:49

Encapsulate more state in PipelineIterator

fc34a28

Only one boolean flag in IngestDocument

b1c7b26

Reset redirect at the end of the handler

d766c62

Apply spotless suggestions

5dd4d27

Rename method and add javadoc

3b64727

Reroute to remain

1bfcf55

Add test that final pipeline can't reroute

98d7c94

felixbarny changed the title ~~Introduce redirect method on IngestDocument~~ Introduce reroute method on IngestDocument Mar 1, 2023

This was referenced Mar 1, 2023

Add ability to skip the rest of the processors in a pipeline #85932

Closed

Invoke default pipeline of new index #85931

Closed

Merge branch 'main' into ingest-document-redirect

9e5df7e

Update test

8ef09c0

Since this PR fixes the linked bug

joegallo added 3 commits March 7, 2023 16:27

Reorder these blocks

61e5617

If a final pipeline changes the indices such that a cycle is created, it's more important to error that a final pipeline changed the indices than that a cycle was created.

Add/tweak comments

35983d5

Add more context to error message

5dfebb6

felixbarny added 2 commits March 8, 2023 16:54

Merge remote-tracking branch 'origin/main' into ingest-document-redirect

5dac7c2

Adjust test to assert that final pipeline is not executed twice

2dc42e1

felixbarny commented Mar 9, 2023

elasticmachine and others added 4 commits March 21, 2023 06:51

Merge branch 'main' into ingest-document-redirect

f3a2ad6

Merge branch 'main' into ingest-document-redirect

25e4a78

Make PipelineIterator an iterator over a triple

8dbef75

Rename this method and add a docstring

53f0b24

joegallo added 2 commits March 21, 2023 10:42

The final pipeline slot should be isFinal, of course

2db2a19

Merge branch 'main' into ingest-document-redirect

83112f8

joegallo approved these changes Mar 21, 2023

joegallo requested a review from dakrone March 21, 2023 14:50

Actually, getPipeline is all we need here

e86579e

But make the null handling explicit rather than implicit

joegallo mentioned this pull request Mar 21, 2023

IngestService code cleanups #94593

Merged

dakrone approved these changes Mar 22, 2023

felixbarny merged commit cdf2522 into elastic:main Mar 22, 2023

joegallo mentioned this pull request Apr 18, 2023

Refactor IngestDocument reroute recursion detection #95350

Merged

yaauie mentioned this pull request Jun 19, 2023

ingest pipeline reroute: allow external inspection/reset of state #96934

Closed

yaauie mentioned this pull request Jun 20, 2023

ingest: expose reroute inquiry/reset via Elastic-internal API bridge #96958

Merged
