[POC] Streaming Indexing API #7273
Gradle Check (Jenkins) Run Completed with:
Nice! I look forward to seeing more progress in this space; it seems like there could be some great performance boosts.
private class StreamingRequestConsumer<T extends HttpContent> implements Consumer<T>, Publisher<HttpContent> {
The security plugin works by intercepting requests at the transport layer; bulk requests are skipped since their individual index requests are fanned out [1]. I suspect this will work correctly for streaming requests, assuming the requests are still fanned out via transport actions; if they aren't, a new hook will be needed. If a new hook is needed, it could be added by using the IdentityService [2] to get the subject and then perform the permissions check.
After this is merged into main, but before it's backported, I would highly recommend adding a new test within the security plugin to exercise the Streaming API and ensure permission conventions are not sidestepped.
- [1] https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/privileges/PrivilegesEvaluator.java#L259-L262
- [2] https://github.com/opensearch-project/OpenSearch/pull/7246/files#diff-aac303dafc3ff8f2c94c4913c62dc5f57e3ee1c3dd2633188a956da68697fa52
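A rough way to picture the fan-out behaviour described above: the parent bulk (or streaming) request is not evaluated itself, and the permission check runs on each fanned-out item instead. The sketch below is a hypothetical, self-contained model (the `ItemRequest` record and `authorize` method are illustrative names, not security plugin APIs). If streamed items ever bypassed this per-item path, they would never reach such a check, which is why a dedicated hook or test would be needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model only: ItemRequest and authorize() are illustrative names,
// not part of OpenSearch or the security plugin.
final class FanOutAuthzSketch {
    // One fanned-out item, e.g. an index request extracted from a bulk or streaming request.
    record ItemRequest(String action, String index) {}

    // The parent request is not evaluated itself; each fanned-out item is checked
    // against the indices the subject is allowed to write to.
    static List<String> authorize(Set<String> writableIndices, List<ItemRequest> fannedOut) {
        List<String> decisions = new ArrayList<>();
        for (ItemRequest item : fannedOut) {
            boolean allowed = writableIndices.contains(item.index());
            decisions.add((allowed ? "allowed:" : "forbidden:") + item.index());
        }
        return decisions;
    }

    public static void main(String[] args) {
        List<ItemRequest> items = List.of(
            new ItemRequest("indices:data/write/index", "logs"),
            new ItemRequest("indices:data/write/index", "secrets"));
        System.out.println(authorize(Set.of("logs"), items)); // [allowed:logs, forbidden:secrets]
    }
}
```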
Thanks a lot for the early feedback, @peternied, highly appreciated!
Thanks @reta for starting this initiative.
Do we plan to support CompressionStream for compressing/decompressing streams of data using the gzip/deflate formats?
There might be other follow-up changes needed in the engine, for example to forward the stream directly to a single shard on the data node (an optimisation) and to refresh the engine on stream close. Further, we can evaluate how indexing performance could be improved by getting rid of translogs altogether.
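The two engine-side ideas above (sending a stream's documents to one shard and refreshing once when the stream closes) can be pictured with a small hypothetical sketch. `ShardStreamWriter` is an illustrative name, not an OpenSearch class; the `Runnable` stands in for an engine refresh, and `String.hashCode()` is a stand-in for OpenSearch's real Murmur3-based routing, used only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ShardStreamWriter is an illustrative name, not an OpenSearch class.
final class ShardStreamWriter implements AutoCloseable {
    private final int shardId;
    private final int numShards;
    private final Runnable refresh; // stand-in for an engine refresh
    private final List<String> indexed = new ArrayList<>();
    private boolean closed = false;

    ShardStreamWriter(int shardId, int numShards, Runnable refresh) {
        this.shardId = shardId;
        this.numShards = numShards;
        this.refresh = refresh;
    }

    // Stand-in routing function: OpenSearch hashes the routing value with Murmur3;
    // String.hashCode() is used here only to keep the sketch self-contained.
    static int shardFor(String routing, int numShards) {
        return Math.floorMod(routing.hashCode(), numShards);
    }

    // Accept a streamed document only if it routes to this writer's shard.
    void index(String docId) {
        if (closed) {
            throw new IllegalStateException("stream already closed");
        }
        if (shardFor(docId, numShards) != shardId) {
            throw new IllegalArgumentException(docId + " does not route to shard " + shardId);
        }
        indexed.add(docId);
    }

    // One refresh when the stream ends, rather than one per request; repeated close() calls are no-ops.
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            refresh.run();
        }
    }

    List<String> indexed() {
        return List.copyOf(indexed);
    }
}
```

Used with try-with-resources, the refresh happens exactly once when the stream's scope ends, even if indexing throws part-way through.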
Thanks @Bukhtawar
This is just a POC to understand the scope of changes needed to support streaming for both requests and responses; the POC won't go beyond that (but the issue it references describes it all).
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
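On the gzip/deflate question: whichever HTTP header ends up carrying the encoding, a server can decompress a gzip body incrementally rather than buffering it whole. The sketch below uses only the JDK's `java.util.zip` streams; the class name `GzipNdjsonSketch` and the idea of handing each newline-delimited JSON document to a callback as soon as its line is decoded are illustrative assumptions, not what the POC implements.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: class and method names are illustrative only.
final class GzipNdjsonSketch {
    // Client side: gzip-compress newline-delimited JSON documents.
    static byte[] compress(List<String> lines) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Server side: decompress incrementally and hand each document over as soon as
    // its line is complete, without materializing the whole decompressed body.
    static void decompressLines(InputStream compressed, Consumer<String> onDocument) {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                onDocument.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `GZIPInputStream` inflates on demand as the reader pulls bytes, memory use stays bounded by the reader's buffers rather than by the request size.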
Nice. We should talk about whether
I think this is a good project-wide upgrade. I would open an issue to track.
As always I'd start with feature flag/experimental, but that's more of a timeline problem.
We break downstream plugins all the time, so while this is painful, I think it's something we can manage.
👍
👍
This may be a dumb question as I don't know much about streaming APIs, but would supporting other streaming implementations/protocols like gRPC mitigate these kinds of problems?
Love it. ❤️
@reta I think these points all fall under the broader issue of the project-wide upgrade to Reactor Netty. I agree with @dblock that these are things we can manage, and provided the REST API is functionally identical, we should be able to work through dependency issues and get this released in 2.x. A couple of questions, though:
Overall, this looks great!
Thanks a lot @dblock and @andrross !
@dblock with gRPC, we would need HTTP/2 and would basically move away from REST actions, requiring support for a whole new client ecosystem (to deal with gRPC streams).
@andrross there used to be https://github.com/ReactiveX/RxNetty, which is sadly dead, and https://github.com/playframework/netty-reactive-streams, which is alive. The good thing about Reactor Netty is that it has very rich streaming capabilities (based on Project Reactor), and we have also already bundled it in
No, Netty could be updated separately at any time (the
Thanks @reta! I think I'm on board with the upgrade to Reactor Netty. A couple of questions about the API:
Thanks @andrross !
I believe it relies on standard TCP flow control: Netty / Reactor Netty buffers data, but if the consumer (the server in this case) is not able to consume it fast enough, at some point Netty / Reactor Netty stops reading data from the socket, the TCP receive window fills up, and the client is forced to stop sending.
This is not explored in the scope of the POC, as it is more of an implementation detail. At the moment, the client gets failures as it consumes the response stream and can stop at any time.
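The flow-control behaviour described above (the consumer falls behind, buffers fill, and the producer has to wait) has a direct in-process analogue in the demand-driven `request(n)` contract of Reactive Streams, which Project Reactor builds on. The sketch below is an illustration only: it uses the JDK's `java.util.concurrent.Flow` interfaces (which mirror the Reactive Streams interfaces) and `SubmissionPublisher`, not Netty or Reactor Netty, and the class and method names are hypothetical. With a per-subscriber buffer of 2 items, `submit()` blocks the producer until the slow subscriber requests more, much as a full TCP receive window stalls the sending client.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names are illustrative; JDK Flow stands in for Reactive Streams.
final class BackpressureSketch {
    // Subscriber that asks for one item at a time, modeling a server that consumes slower than the client sends.
    static final class OneAtATimeSubscriber implements Flow.Subscriber<String> {
        final List<String> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // initial demand: a single item
        }

        @Override
        public void onNext(String item) {
            received.add(item);
            subscription.request(1); // ask for the next item only after this one is processed
        }

        @Override
        public void onError(Throwable throwable) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    // Publishes `count` items through a publisher whose per-subscriber buffer holds at most 2 items;
    // submit() blocks the producing thread whenever that buffer is full.
    static List<String> run(int count) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>(executor, 2)) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < count; i++) {
                publisher.submit("doc-" + i);
            }
        } // close() issues onComplete, which is delivered after the already-submitted items
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return subscriber.received;
    }
}
```

Calling `BackpressureSketch.run(20)` delivers all 20 items in order even though the producer can never be more than two items ahead of the subscriber's demand.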
With the proposed approach I think we're committing to the reactive streams specification for the API, as that is what Project Reactor is built on. This makes sense to me as reactive streams is meant to be a standard for asynchronous stream processing, though I am not an expert in this space. Is there any more due diligence we need to do to ensure we can get the semantics we want for this API?
Thanks @andrross
👍 , I agree with your statement here
At this point, I don't see any issues (semantics-wise or feature-wise) we may run into with this API specifically; the most complicated part (for this POC) was exploring how we could get it in (streaming, essentially, from the network to the core) without disrupting everything else.
Thanks @reta, I'm on board with creating the meta issue and starting the incremental steps! @nknize @dblock @Bukhtawar Any additional thoughts or concerns?
Signed-off-by: Andriy Redko <[email protected]>
Closing the POC and moving towards implementation #9065
Description
Streaming Indexing API (work in progress)
Issues Resolved
Closes #5001
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.