Proposal: Streaming results #12188
Not sure how blocking chunked transfer can solve challenges like back pressure, but it should be possible to write RxJava-based code (https://github.com/ReactiveX/RxJava) that implements Reactive Streams (http://www.reactive-streams.org/), similar to http://mongodb.github.io/mongo-java-driver-reactivestreams/
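The core of the Reactive Streams idea mentioned above is that the consumer signals demand before the producer emits anything, so the producer can never run ahead of the client. A minimal pull-based sketch in plain Python (not RxJava; all names here are illustrative):

```python
class HitSource:
    """Illustrative producer: emits search hits only when demand exists."""

    def __init__(self, total_hits):
        self.remaining = total_hits
        self.produced = 0

    def request(self, n):
        """Consumer signals demand for n more items (the request(n) of Reactive Streams)."""
        batch = []
        while n > 0 and self.remaining > 0:
            batch.append({"_id": self.produced})
            self.produced += 1
            self.remaining -= 1
            n -= 1
        return batch


# The consumer controls the pace; of 1000 potential hits, only 15 are ever materialized.
source = HitSource(total_hits=1000)
first = source.request(10)   # take 10 hits
second = source.request(5)   # then 5 more
```

This is back pressure by construction: the producer does no work between `request` calls, which is exactly the property a blocking chunked transfer has to approximate with buffers.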
It kind of can. There isn't anything that I know of like the triple-ack of TCP, but there are buffers, and you could in theory check how full they are and only try to fill them when they drop below a certain point.
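The buffer idea in the comment above can be sketched as a bounded queue with a low watermark: the producer is only asked for more data when the fill level drops below the threshold. A toy model, not Elasticsearch code:

```python
from collections import deque


class WatermarkBuffer:
    """Toy bounded buffer: refill only when the fill level drops below low_watermark."""

    def __init__(self, capacity, low_watermark):
        self.capacity = capacity
        self.low_watermark = low_watermark
        self.items = deque()
        self.refills = 0

    def maybe_refill(self, produce):
        # Only ask the producer for more when the buffer is running low.
        if len(self.items) < self.low_watermark:
            self.refills += 1
            while len(self.items) < self.capacity:
                self.items.append(produce())

    def take(self):
        return self.items.popleft()


buf = WatermarkBuffer(capacity=8, low_watermark=2)
counter = iter(range(1000))
buf.maybe_refill(lambda: next(counter))   # empty buffer is below watermark: fills to 8
buf.take()                                # 7 items left, still above watermark
buf.maybe_refill(lambda: next(counter))   # no-op: not below watermark yet
```

The slow consumer drains the buffer, and only then does the producer do more work; that is the crude back-pressure signal the comment describes.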
@nik9000 is this still something you want to investigate?
I think it's a neat idea and might be useful for something someday, but it just doesn't have the crazy +1 train that some other proposals have accumulated. I'm going to close it. Maybe someone can revive it when they have a super awesome use case.

Honestly, the flip side might be more useful: implement bulk indexing using chunked uploads. That has really simple back pressure on the uploading thread and would be simpler to implement. @mikemccand and I talked about it many months ago. The neat thing about it is that Elasticsearch can better manage its memory if the user is uploading using chunks: they can continue sending chunks until they want to make sure the translog has fsynced, then they send the last chunk, and we consider the bulk request complete and run the fsync. Rather than having to load the whole bulk request, we get to rely on TCP's back pressure to slow the client down, so we can have as much of the bulk request "in flight" as we think is appropriate. It's a neat idea, but I don't know if it's actually worth implementing.
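Under the scheme sketched above, a client would stream a bulk request as a sequence of NDJSON chunks, and the server would run the fsync once the last chunk arrives. A rough sketch of the client-side chunk framing (this is a hypothetical protocol, not an existing Elasticsearch API):

```python
import json


def chunk_bulk_body(actions, chunk_size):
    """Split a stream of (action, doc) pairs into NDJSON chunks of chunk_size actions.

    Each bulk action is two NDJSON lines: the action metadata and the document.
    """
    chunk = []
    for action, doc in actions:
        chunk.append(json.dumps(action))
        chunk.append(json.dumps(doc))
        if len(chunk) >= 2 * chunk_size:
            yield "\n".join(chunk) + "\n"
            chunk = []
    if chunk:
        yield "\n".join(chunk) + "\n"


actions = (({"index": {"_id": i}}, {"field": i}) for i in range(10))
chunks = list(chunk_bulk_body(actions, chunk_size=4))
# 10 actions at 4 per chunk produce 3 chunks; under the proposal the server would
# consider the bulk complete, and fsync, only after the final chunk.
```

Because the `actions` input is itself a generator, the whole bulk body never has to exist in memory on either side; TCP slows the client down if the server falls behind.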
+1
Manage its memory, and also manage the appropriate concurrency to bring to bear. Plus the client gets much simpler: no more playing games with the proper item count per bulk request, how many client threads to use, dealing with rejection exceptions, etc. @honzakral recently added some nice sugar to the ES Python client APIs that does some of this for the user, so the user feels like they're using a single streaming bulk indexing API, and under the hood the Python ES client breaks it into chunks using N threads ...
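What such client-side sugar does, roughly: split the document stream into fixed-size chunks and fan them out across N worker threads, so the caller sees a single streaming API. A simplified sketch with a stand-in send function (this is not the real elasticsearch-py implementation):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items from an arbitrary iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fake_bulk_send(chunk):
    """Stand-in for one bulk request; returns how many docs were 'indexed'."""
    return len(chunk)


docs = ({"_id": i} for i in range(1000))
with ThreadPoolExecutor(max_workers=4) as pool:
    indexed = sum(pool.map(fake_bulk_send, chunked(docs, 250)))
# The caller just handed over one generator; chunking and threading happen underneath.
```

The point of the comment is that a server-side chunked upload would make even this machinery unnecessary: the client could simply write one stream and let TCP pace it.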
+1 on chunked uploads. This would be a huge benefit to the CSV upload functionality I'm building into Kibana. Right now I have to make educated guesses about what bulk size will be the best for the largest number of users, and it just won't be a good experience for some people. If ES supported chunked uploads, the entire thing could be implemented as one big stream from the user's browser, to Kibana's node backend, to ES and back.
Maybe this is a crazy idea, but I've spent some time working with another system (Blazegraph) that supports sending large result sets over its API using HTTP 1.1's chunked encoding. The results stream back to the client, and the client can close the TCP connection when it has enough results, at which point the server stops producing them. I was wondering if it might make sense to do something similar in Elasticsearch. The advantage it'd have over scan/scroll is that it's simpler to reason about when server-side resources are in use: only as long as the TCP connection to the client is open.

I don't know enough about the overhead of scan/scroll to know if it's worth doing. It doesn't solve the infinite-scroll problem either; for that you need an efficient way for clients to poll deeply, and this just isn't it.
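The key property of the chunked-response approach described above is that server-side work stops as soon as the client hangs up. In Python terms, closing a generator plays the role of closing the TCP connection; a toy model of the proposal, not real Elasticsearch behavior:

```python
def stream_results(total, stats):
    """Illustrative server: produce results lazily; stop when the 'connection' closes."""
    for i in range(total):
        stats["produced"] += 1
        yield {"hit": i}


stats = {"produced": 0}
results = stream_results(1_000_000, stats)

# The client reads five results off the stream, then closes the connection.
wanted = [next(results) for _ in range(5)]
results.close()   # the generator (the 'server') never resumes; no more work is done
```

Of a potential million results, only the five the client actually consumed were ever produced, and no scroll context has to be kept alive or timed out afterwards.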