[POC] Streaming Indexing API #5001

adnapibar · 2022-10-31T18:45:27Z

Problem

The current _bulk indexing API places a high configuration burden on users who want to avoid RejectedExecutionException caused by TOO_MANY_REQUESTS. Users are forced to "experiment" with bulk block sizes, multi-threading, refresh intervals, etc.

Using HTTP streaming for _bulk indexing would:

improve API usability: streams for request and response
improve resource utilization: the coordinators may funnel the streams from multiple clients
improve overall stability: the coordinators may use backpressure to slow down the clients and apply the optimal batching strategy taking into account resource availability (heap / CPU / ...)
improve durability: the coordinators may start processing as soon as the first bulk item is received (using translog / other means to deal with crashes / restarts / disconnects)
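
To make the backpressure and batching idea above more concrete, here is a minimal sketch in Java. It is not OpenSearch code: the class `BackpressureBatcher`, its method names, and the "in-flight batches" limit are all hypothetical and stand in for whatever resource signal (heap / CPU / ...) a coordinator would actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical coordinator-side batcher (illustration only, not OpenSearch code).
 * Items are buffered until a batch is full. While too many batches are still
 * being processed, tryAccept() returns false so the caller can stop reading
 * from the client stream, which is what slows the client down.
 */
final class BackpressureBatcher {
    private final int batchSize;
    private final int maxInFlightBatches;
    private final Consumer<List<String>> sink;
    private List<String> current = new ArrayList<>();
    private int inFlight = 0;

    BackpressureBatcher(int batchSize, int maxInFlightBatches, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.maxInFlightBatches = maxInFlightBatches;
        this.sink = sink;
    }

    /** Returns false when the item was NOT accepted and the client should be slowed down. */
    synchronized boolean tryAccept(String item) {
        if (inFlight >= maxInFlightBatches) {
            return false; // saturated: apply backpressure
        }
        current.add(item);
        if (current.size() == batchSize) {
            List<String> batch = current;
            current = new ArrayList<>();
            inFlight++;
            sink.accept(batch); // hand the full batch to the (assumed) indexing stage
        }
        return true;
    }

    /** Called when a previously dispatched batch has been fully processed. */
    synchronized void batchCompleted() {
        if (inFlight > 0) {
            inFlight--;
        }
    }
}
```

A caller would invoke `tryAccept` for every bulk item read from the stream, pause reading when it returns `false`, and resume after `batchCompleted` has been called. A real implementation would likely size batches dynamically rather than use a fixed `batchSize`.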

Please see [RFC] Streaming Index API

Implementation Options

With all the options available, _bulk should continue to use the HTTP protocol; however, there are a few options to consider.

Chunked Transfer Encoding

More details here #3000 (comment). This is more or less the only option available in the case of HTTP/1.1. The benefit of this implementation is that it would work for both the 2.x and 3.x releases.
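
For readers unfamiliar with the wire format, the following sketch shows how a body is framed under HTTP/1.1 chunked transfer encoding: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and the body ends with a zero-size chunk. The class name `ChunkedEncodingSketch` is hypothetical, and the sketch omits chunk extensions and trailers; it illustrates the encoding only, not how OpenSearch or Netty would implement it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Illustration of the HTTP/1.1 chunked body framing (no chunk extensions, no trailers). */
final class ChunkedEncodingSketch {
    private static final byte[] CRLF = {'\r', '\n'};

    /** Encodes each payload as one chunk, then appends the terminating zero-size chunk. */
    static byte[] encode(List<byte[]> payloads) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] payload : payloads) {
            if (payload.length == 0) {
                continue; // a zero-size chunk would terminate the body early
            }
            byte[] size = Integer.toHexString(payload.length).getBytes(StandardCharsets.US_ASCII);
            out.write(size, 0, size.length);       // chunk size in hex
            out.write(CRLF, 0, CRLF.length);
            out.write(payload, 0, payload.length); // chunk data
            out.write(CRLF, 0, CRLF.length);
        }
        out.write('0');                            // last chunk
        out.write(CRLF, 0, CRLF.length);
        out.write(CRLF, 0, CRLF.length);           // end of (empty) trailer section
        return out.toByteArray();
    }
}
```

With this framing a client could, for example, send each bulk action (or a small group of them) as its own chunk without knowing the total request size up front.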

HTTP/2

HTTP/2 offers an optimized transport for HTTP semantics, including superior streaming capabilities; please see Streams and Multiplexing for more details.

HTTP/2 uses DATA frames to carry message payloads. The "chunked"
transfer encoding defined in Section 4.1 of [RFC7230] MUST NOT be
used in HTTP/2.

HTTP/2 is only supported by the 3.x release line (both for clients and servers).
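
As an illustration of what "DATA frames carry message payloads" means on the wire, the sketch below builds a single unpadded HTTP/2 DATA frame: a 9-byte header (24-bit length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream identifier) followed by the payload. The class name `Http2DataFrameSketch` is hypothetical, padding is not handled, and the size check assumes the peer has not raised the initial maximum frame size of 16,384 bytes. In practice Netty would do this framing; the code is only meant to show the shape of the data.

```java
import java.nio.ByteBuffer;

/** Illustration of the HTTP/2 frame layout for an unpadded DATA frame. */
final class Http2DataFrameSketch {
    static final int TYPE_DATA = 0x0;
    static final int FLAG_END_STREAM = 0x1;
    static final int DEFAULT_MAX_FRAME_SIZE = 16_384; // initial SETTINGS_MAX_FRAME_SIZE

    static byte[] dataFrame(int streamId, byte[] payload, boolean endStream) {
        if (streamId <= 0) {
            throw new IllegalArgumentException("DATA frames must belong to a stream (id > 0)");
        }
        if (payload.length > DEFAULT_MAX_FRAME_SIZE) {
            throw new IllegalArgumentException("payload exceeds the assumed max frame size");
        }
        ByteBuffer buf = ByteBuffer.allocate(9 + payload.length); // big-endian by default
        buf.put((byte) (payload.length >>> 16));            // 24-bit payload length
        buf.put((byte) (payload.length >>> 8));
        buf.put((byte) payload.length);
        buf.put((byte) TYPE_DATA);                          // frame type
        buf.put((byte) (endStream ? FLAG_END_STREAM : 0));  // flags
        buf.putInt(streamId & 0x7FFFFFFF);                  // reserved bit cleared + stream id
        buf.put(payload);
        return buf.array();
    }
}
```

A streamed bulk request would then be a sequence of such frames on one stream, with `endStream` set only on the last one.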

Websockets

Websockets would offer a bidirectional stream, similarly to HTTP/2, but from an implementation perspective they would be easier to integrate (in theory): this is a new protocol that would not touch the existing OpenSearch HTTP layer.

Implementation Notes

OpenSearch supports both HTTP/1.1 and HTTP/2 (including H2C). However, the OpenSearch HTTP server model neither supports chunked transfer encoding nor exposes HTTP/2 streams (especially data frames):

the OpenSearch HTTP layer always expects complete requests (and sends complete responses)
the OpenSearch HTTP layer is based on Netty's HTTP/1.1 abstractions
the OpenSearch HTTP/2 support uses Netty's conversions (e.g. Http2StreamFrameToHttpObjectCodec, ...) to convert to HTTP/1.1 abstractions
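
One consequence of moving away from "complete requests" is that the server has to parse the newline-delimited _bulk body incrementally, because a network chunk can end in the middle of a line. The following is a minimal sketch of that idea, assuming chunks arrive as byte arrays; the class `NdjsonChunkSplitter` is hypothetical and not part of OpenSearch or Netty.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical incremental splitter for a newline-delimited body.
 * Bytes (not chars) are buffered so that a multi-byte UTF-8 character
 * split across two chunks is still decoded correctly.
 */
final class NdjsonChunkSplitter {
    private final ByteArrayOutputStream partial = new ByteArrayOutputStream();

    /** Feeds one network chunk and returns the lines that it completed. */
    List<String> feed(byte[] chunk) {
        List<String> lines = new ArrayList<>();
        for (byte b : chunk) {
            if (b == '\n') {
                lines.add(new String(partial.toByteArray(), StandardCharsets.UTF_8));
                partial.reset(); // start collecting the next line
            } else {
                partial.write(b);
            }
        }
        return lines; // any trailing bytes stay buffered until the next chunk
    }
}
```

Each returned line could be handed to processing immediately, instead of waiting for the whole request to be aggregated first.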

The suggested direction to proceed towards the POC:

prototype streaming within OpenSearch HTTP layer (Chunked Transfer Encoding first), both client and server
understand Netty's conversions between HTTP/1.1 and HTTP/2 when chunked transfer encoding is used (if any)
understand whether to support streaming with explicit handling of HTTP/2 data streams, if required
conclude with the implementation to move forward with

At this moment, the POC focuses only on the first step: understanding the scope of changes needed to support HTTP streaming on the OpenSearch server and client sides.

reta · 2023-08-02T14:52:27Z

Closing the POC: the proof of concept has been developed and the implementation path has been cleared.

adnapibar added the enhancement (Enhancement or improvement to existing feature or request) label Oct 31, 2022

tlfeng assigned adnapibar Oct 31, 2022

VachaShah added the Indexing & Search label Nov 1, 2022

adnapibar removed their assignment Mar 11, 2023

anasalkouz added Migration:In Progress and removed Migration:In Progress labels Mar 17, 2023

anasalkouz added this to Streaming Index API Apr 4, 2023

github-project-automation bot moved this to Todo in Streaming Index API Apr 4, 2023

reta self-assigned this Apr 13, 2023

reta mentioned this issue May 1, 2023

[POC] Streaming Indexing API #7273

Closed

6 tasks

This was referenced Aug 2, 2023

[META] Streaming Indexing API #9065

Open

[RFC] Streaming Index API #3000

Open

reta closed this as completed Aug 2, 2023

github-project-automation bot moved this from Todo to Done in Streaming Index API Aug 2, 2023

