Proposal: Streaming results #12188
Not sure how blocking chunked transfer can solve challenges like back pressure, but it should be possible to write RxJava-based code (https://github.com/ReactiveX/RxJava) that implements Reactive Streams (http://www.reactive-streams.org/), similar to http://mongodb.github.io/mongo-java-driver-reactivestreams/
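The core of the Reactive Streams idea mentioned above is that the consumer signals demand before the producer emits anything, so the producer can never run ahead of the client. A minimal pull-based sketch in plain Python (not RxJava; all names here are illustrative):

```python
class HitSource:
    """Illustrative producer: emits search hits only when demand exists."""

    def __init__(self, total_hits):
        self.remaining = total_hits
        self.produced = 0

    def request(self, n):
        """Consumer signals demand for n more items (the request(n) of Reactive Streams)."""
        batch = []
        while n > 0 and self.remaining > 0:
            batch.append({"_id": self.produced})
            self.produced += 1
            self.remaining -= 1
            n -= 1
        return batch


# The consumer controls the pace; of 1000 potential hits, only 15 are ever materialized.
source = HitSource(total_hits=1000)
first = source.request(10)   # take 10 hits
second = source.request(5)   # then 5 more
```

This is back pressure by construction: the producer does no work between `request` calls, which is exactly the property a blocking chunked transfer has to approximate with buffers.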
It kind of can. There isn't anything that I know of like the triple-ack of TCP, but there are buffers, and you could in theory check how full they are and only try to fill them when they drop below a certain point.
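The buffer idea in the comment above can be sketched as a bounded queue with a low watermark: the producer is only asked for more data when the fill level drops below the threshold. A toy model, not Elasticsearch code:

```python
from collections import deque


class WatermarkBuffer:
    """Toy bounded buffer: refill only when the fill level drops below low_watermark."""

    def __init__(self, capacity, low_watermark):
        self.capacity = capacity
        self.low_watermark = low_watermark
        self.items = deque()
        self.refills = 0

    def maybe_refill(self, produce):
        # Only ask the producer for more when the buffer is running low.
        if len(self.items) < self.low_watermark:
            self.refills += 1
            while len(self.items) < self.capacity:
                self.items.append(produce())

    def take(self):
        return self.items.popleft()


buf = WatermarkBuffer(capacity=8, low_watermark=2)
counter = iter(range(1000))
buf.maybe_refill(lambda: next(counter))   # empty buffer is below watermark: fills to 8
buf.take()                                # 7 items left, still above watermark
buf.maybe_refill(lambda: next(counter))   # no-op: not below watermark yet
```

The slow consumer drains the buffer, and only then does the producer do more work; that is the crude back-pressure signal the comment describes.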
@nik9000 is this still something you want to investigate?
I think it's a neat idea and might be useful for something someday, but it just doesn't have the crazy +1 train that some other proposals have accumulated. I'm going to close it. Maybe someone can revive it when they have a super awesome use case.

Honestly, the flip side might be more useful: implement bulk indexing using chunked uploads. That has really simple back pressure on the uploading thread and would be simpler to implement. @mikemccand and I talked about it many months ago. The neat thing about it is that Elasticsearch can better manage its memory if the user is uploading using chunks: they can continue sending chunks until they want to make sure the translog has fsynced, then they send the last chunk, and we consider the bulk request complete and run the fsync. Rather than having to load the whole bulk request, we get to rely on TCP's back pressure to slow the client down, so we can have as much of the bulk request "in flight" as we think is appropriate. It's a neat idea, but I don't know if it's actually worth implementing.
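Under the scheme sketched above, a client would stream a bulk request as a sequence of NDJSON chunks, and the server would run the fsync once the last chunk arrives. A rough sketch of the client-side chunk framing (this is a hypothetical protocol, not an existing Elasticsearch API):

```python
import json


def chunk_bulk_body(actions, chunk_size):
    """Split a stream of (action, doc) pairs into NDJSON chunks of chunk_size actions.

    Each bulk action is two NDJSON lines: the action metadata and the document.
    """
    chunk = []
    for action, doc in actions:
        chunk.append(json.dumps(action))
        chunk.append(json.dumps(doc))
        if len(chunk) >= 2 * chunk_size:
            yield "\n".join(chunk) + "\n"
            chunk = []
    if chunk:
        yield "\n".join(chunk) + "\n"


actions = (({"index": {"_id": i}}, {"field": i}) for i in range(10))
chunks = list(chunk_bulk_body(actions, chunk_size=4))
# 10 actions at 4 per chunk produce 3 chunks; under the proposal the server would
# consider the bulk complete, and fsync, only after the final chunk.
```

Because the `actions` input is itself a generator, the whole bulk body never has to exist in memory on either side; TCP slows the client down if the server falls behind.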
+1
Manage its memory, and also manage the appropriate concurrency to bring to bear. Plus the client gets much simpler: no more playing games with the proper item count per bulk request, how many client threads to use, dealing with rejection exceptions, etc. @honzakral recently added some nice sugar to the ES Python client APIs that does some of this for the user, so the user feels like they're using a single streaming bulk indexing API, and under the hood the Python ES client breaks it into chunks using N threads ...
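What such client-side sugar does, roughly: split the document stream into fixed-size chunks and fan them out across N worker threads, so the caller sees a single streaming API. A simplified sketch with a stand-in send function (this is not the real elasticsearch-py implementation):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items from an arbitrary iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fake_bulk_send(chunk):
    """Stand-in for one bulk request; returns how many docs were 'indexed'."""
    return len(chunk)


docs = ({"_id": i} for i in range(1000))
with ThreadPoolExecutor(max_workers=4) as pool:
    indexed = sum(pool.map(fake_bulk_send, chunked(docs, 250)))
# The caller just handed over one generator; chunking and threading happen underneath.
```

The point of the comment is that a server-side chunked upload would make even this machinery unnecessary: the client could simply write one stream and let TCP pace it.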
+1 on chunked uploads. This would be a huge benefit to the CSV upload functionality I'm building into Kibana. Right now I have to make educated guesses about what bulk size will be the best for the largest number of users, and it just won't be a good experience for some people. If ES supported chunked uploads, the entire thing could be implemented as one big stream from the user's browser, to Kibana's node backend, to ES and back.
Maybe this is a crazy idea, but I've spent some time working with another system (Blazegraph) that supports sending large result sets over its API using HTTP 1.1's chunked encoding. The results stream back to the client, and the client can close the TCP connection when it has enough results, at which point the server stops producing them. I was wondering if it might make sense to do something similar in Elasticsearch. The advantage it'd have over scan/scroll is that it's simpler to reason about when server-side resources are in use: only as long as the TCP connection to the client is open.

I don't know enough about the overhead of scan/scroll to know if it's worth doing. It doesn't solve the infinite-scroll problem either; for that you need an efficient way for clients to poll deeply, and this just isn't it.
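The key property of the chunked-response approach described above is that server-side work stops as soon as the client hangs up. In Python terms, closing a generator plays the role of closing the TCP connection; a toy model of the proposal, not real Elasticsearch behavior:

```python
def stream_results(total, stats):
    """Illustrative server: produce results lazily; stop when the 'connection' closes."""
    for i in range(total):
        stats["produced"] += 1
        yield {"hit": i}


stats = {"produced": 0}
results = stream_results(1_000_000, stats)

# The client reads five results off the stream, then closes the connection.
wanted = [next(results) for _ in range(5)]
results.close()   # the generator (the 'server') never resumes; no more work is done
```

Of a potential million results, only the five the client actually consumed were ever produced, and no scroll context has to be kept alive or timed out afterwards.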