Introduce reroute method on IngestDocument #94000
- Overrides _index
- Skips current pipeline
- Invokes default pipeline of new index
Pinging @elastic/es-data-management (Team:Data Management)
        return new Pipelines(pipelineId, finalPipelineId);
    }

    private static class Pipelines implements Iterable<String> {
I think this is a reasonable abstraction, but I don't like that we make it mutable, and that we don't encapsulate enough here.
The withoutDefaultPipeline method makes me uncomfortable, as its name makes it sound like it would return a new object rather than making a mutable change.
I also think we don't need to have executePipelines(...) take a boolean. We're treating this internally as though we'll always have a list of pipelines, but we could probably get away with passing a proper object instead of an Iterator<String>. I think it'd be much clearer that way.
I abstracted the Iterator<String> and the boolean flag into a PipelineIterator, however it's not immutable. As that's the nature of iterators, I think that's fine.
The executePipelines method needs to know about three properties of the current pipeline: the name (to add the name in the exception in case the pipeline itself can't be resolved), the pipeline itself, and whether the current pipeline is the final pipeline (if true, it's disallowed to override _index).
The PipelineIterator encapsulates that state and I think that makes it cleaner than before. Thanks for the suggestion 👍
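To make the three properties concrete, here is a minimal sketch of an iterator that carries them together. It is not the code from this PR: PipelineStub, PipelineSlot, PipelineIteratorSketch, and the registry map are hypothetical names chosen for illustration, and the real PipelineIterator may be structured differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a resolved pipeline (not the Elasticsearch class).
final class PipelineStub {
    final String id;

    PipelineStub(String id) {
        this.id = id;
    }
}

// The three properties the executing code needs for the current pipeline.
final class PipelineSlot {
    final String id;             // kept even when the pipeline cannot be resolved
    final PipelineStub pipeline; // null when no pipeline with this id is registered
    final boolean isFinal;       // a final pipeline must not override _index

    PipelineSlot(String id, PipelineStub pipeline, boolean isFinal) {
        this.id = id;
        this.pipeline = pipeline;
        this.isFinal = isFinal;
    }
}

// Yields the default pipeline first (if any), then the final pipeline (if any).
final class PipelineIteratorSketch implements Iterator<PipelineSlot> {
    private final Iterator<PipelineSlot> slots;

    PipelineIteratorSketch(String defaultId, String finalId, Map<String, PipelineStub> registry) {
        List<PipelineSlot> list = new ArrayList<>();
        if (defaultId != null) {
            list.add(new PipelineSlot(defaultId, registry.get(defaultId), false));
        }
        if (finalId != null) {
            list.add(new PipelineSlot(finalId, registry.get(finalId), true));
        }
        this.slots = list.iterator();
    }

    @Override
    public boolean hasNext() {
        return slots.hasNext();
    }

    @Override
    public PipelineSlot next() {
        return slots.next();
    }
}
```

Keeping the id alongside a possibly-null pipeline lets the caller name the missing pipeline in an exception, which is the reason given above for tracking the name separately.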
    void resetPipelineSkipping() {
        skipCurrentPipeline = false;
    }
Why doesn't this also reset invokeDefaultPipelineOfDestination? Also, it's unclear from the code why we need this method to be invoked.
Good point, it needs to reset both. I've changed it to just be a single boolean.
    boolean isInvokeDefaultPipelineOfDestination() {
        return invokeDefaultPipelineOfDestination;
    }

    boolean isSkipCurrentPipeline() {
        return skipCurrentPipeline;
    }
I don't think we need multiple boolean flags for this, right? If we do, then it's not clear enough why they have to be separated (needs documentation).
I've changed it to just be a single boolean and a single getter for it (isRedirect()). The downside is that isRedirect() is less expressive than isSkipCurrentPipeline() in the context of determining whether to skip the pipeline.
We could keep isSkipCurrentPipeline() and just return the redirect flag. But that feels a bit off as well.
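As a rough illustration of the single-flag approach, the sketch below uses one boolean both to stop the current pipeline and to signal that the destination's default pipeline should run next. RedirectDocSketch, RedirectLoopSketch, resetRedirect(), and runPipeline() are hypothetical names; only reroute() and isRedirect() come from this thread, and the real IngestDocument and IngestService are considerably more involved.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified document (not the real IngestDocument).
final class RedirectDocSketch {
    private String index;
    private boolean redirect; // single flag replacing the two earlier booleans

    RedirectDocSketch(String index) {
        this.index = index;
    }

    // Overrides the index and marks the document as redirected.
    void reroute(String destIndex) {
        this.index = destIndex;
        this.redirect = true;
    }

    boolean isRedirect() {
        return redirect;
    }

    // Assumed name: cleared by the caller before running the destination's pipeline.
    void resetRedirect() {
        redirect = false;
    }

    String getIndex() {
        return index;
    }
}

final class RedirectLoopSketch {
    // Runs processors in order and stops as soon as one of them reroutes,
    // which skips the remainder of the current pipeline.
    static int runPipeline(RedirectDocSketch doc, List<Consumer<RedirectDocSketch>> processors) {
        int executed = 0;
        for (Consumer<RedirectDocSketch> processor : processors) {
            processor.accept(doc);
            executed++;
            if (doc.isRedirect()) {
                break;
            }
        }
        return executed;
    }
}
```

After runPipeline returns with the flag set, a caller would look up the default pipeline of doc.getIndex(), call resetRedirect(), and run that pipeline; since one flag drives both decisions, a single reset covers both.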
I'm pretty sure this is going to fail the test introduced by #94281, which means that this PR is a bugfix for #83653. /cc @HiDAl edit: added a commit and updated the description
Since this PR fixes the linked bug
If a final pipeline changes the indices such that a cycle is created, it's more important to error that a final pipeline changed the indices than that a cycle was created.
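The ordering of the two checks can be sketched as follows. This is an assumption-laden illustration, not the IngestService code: IndexChangeOutcome, IndexChangeCheckSketch, and check() are hypothetical names, and the history set stands in for whatever collection tracks visited indices.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical outcome type for illustration only.
enum IndexChangeOutcome { OK, FINAL_PIPELINE_CHANGED_INDEX, CYCLE }

final class IndexChangeCheckSketch {
    // Decides what to report after a pipeline has run and possibly changed the index.
    static IndexChangeOutcome check(String previousIndex, String newIndex,
                                    boolean isFinalPipeline, Set<String> indexHistory) {
        if (newIndex.equals(previousIndex)) {
            return IndexChangeOutcome.OK; // index untouched, nothing to verify
        }
        // Checked first: a final pipeline changing the index is the more
        // important error, even if the change would also close a cycle.
        if (isFinalPipeline) {
            return IndexChangeOutcome.FINAL_PIPELINE_CHANGED_INDEX;
        }
        // Set.add returns false when the index was already visited.
        if (!indexHistory.add(newIndex)) {
            return IndexChangeOutcome.CYCLE;
        }
        return IndexChangeOutcome.OK;
    }
}
```

With a history of [a, b], a final pipeline that moves the document from b back to a is reported as FINAL_PIPELINE_CHANGED_INDEX rather than CYCLE, while the same move from a non-final pipeline is reported as CYCLE.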
Yesterday I had to revert the test because it was failing due to warning headers. Today I pushed and merged PR #94388 which addresses the issue. Please rebase/merge on this PR @joegallo @felixbarny
I've updated to main and re-adjusted the test. I ran that test locally and I can confirm that it prevents the final pipeline from running twice.
@@ -903,6 +904,29 @@ public String toString() {
        return "IngestDocument{" + " sourceAndMetadata=" + ctxMap + ", ingestMetadata=" + ingestMetadata + '}';
    }

    public void reroute(String destIndex) {
        getMetadata().setIndex(destIndex);
@joegallo what do you think about adding the history of the _index field in an ingest metadata field? This wouldn't be indexed by default, but in order to debug, users can use a set processor to add this to the documents:
{
"set": {
"field": "reroute_history",
"copy_from": "_ingest.reroute_history"
}
}
getMetadata().setIndex(destIndex);
getMetadata().setIndex(destIndex);
appendFieldValue("_ingest.reroute_history", getMetadata().getIndex());
I'm not opposed to adding a mechanism like that in a future PR, but I would like to keep the scope of this PR fixed.
When we do add that mechanism, though, I'd prefer that the list be an immutable reference to the collection we're tracking for index recursion purposes, rather than a new collection. Similarly, appendFieldValue is more for processors to use when the first argument is customer-provided -- we can just traverse the data structures ourselves in IngestService, there's no need for string parsing and evaluation there.
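One way to read "an immutable reference to the collection we're tracking" is a read-only live view over the same set that cycle detection uses. The sketch below assumes that reading; IndexHistorySketch, record(), and history() are hypothetical names, not part of IngestDocument or IngestService.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical holder for the indices a document has been routed through.
final class IndexHistorySketch {
    // Insertion-ordered set used for cycle detection.
    private final Set<String> visited = new LinkedHashSet<>();
    // Read-only live view of the same set; no second collection is maintained.
    private final Collection<String> view = Collections.unmodifiableSet(visited);

    // Returns false when the index was already visited (a cycle).
    boolean record(String index) {
        return visited.add(index);
    }

    // What would be exposed to pipelines and scripts: readable, not writable.
    Collection<String> history() {
        return view;
    }
}
```

Because Collections.unmodifiableSet wraps rather than copies, later calls to record() show up in history(), while attempts to mutate the view throw UnsupportedOperationException.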
@elasticmachine update branch
I added some new commits this morning that address my biggest issues here -- but the failure we see from I was waiting for green CI before I added the following comment, but since CI is going to be red for a little while I'll just drop my message in now: I'm not especially enamored with the way this tracks That said -- this is good enough as it is. I'll take on moving |
But make the null handling explicit rather than implicit
I completely agree with that. Ideally, ingest pipelines and scripts should not be able to modify the history so that we can be certain whether a document is sent directly to a target or whether it has been rerouted.
Thanks!
Thanks for the review and the approval! I'll wait for @dakrone's approval before merging.
LGTM also, thanks for all the work on this Felix & Joe
Logstash's Integration filter works directly with the processors, but cannot use the IngestService that is tightly-coupled with cluster state and must therefore emulate the behavior introduced in elastic#94000. To do so, the additional methods for inquiring about and resetting the reroute state need to be externally-accessible. Exposing them through a clearly-named bridge allows us to avoid making these Elastic-internal bits a part of the public APIs that are subject to years-long stability and deprecation notice policies.
…96958) * ingest: expose reroute inquiry/reset via Elastic-internal API bridge Logstash's Integration filter works directly with the processors, but cannot use the IngestService that is tightly-coupled with cluster state and must therefore emulate the behavior introduced in #94000. To do so, the additional methods for inquiring about and resetting the reroute state need to be externally-accessible. Exposing them through a clearly-named bridge allows us to avoid making these Elastic-internal bits a part of the public APIs that are subject to years-long stability and deprecation notice policies. * Update docs/changelog/96958.yaml * javadoc: rephrase to avoid use of @APinote
Requirement for
Combines #85932 and #85931
Fixes #83653