Improvement: reuse the TCP connection when planning files #522
Comments
I think the buffering is handled by OpenDAL? cc @Xuanwo
Hi, OpenDAL handles those connection-related tasks (via reqwest). Currently, FileIO builds a new operator every time (see iceberg-rust/crates/iceberg/src/io/storage.rs, lines 104 to 119 at commit 4083f81).
I think we can improve this by reusing the same HTTP client instead.
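A minimal std-only sketch of the "same HTTP client" idea follows. `HttpClient`, `Operator`, `build_operator`, and `global_client` are hypothetical stand-ins, not the iceberg-rust or OpenDAL API. The point is that the client is built once (lazily, via `OnceLock`) and every operator receives a handle to it instead of constructing its own. With a real `reqwest::Client` this matters because the client owns the connection pool, and cloning it shares that pool rather than creating a new one.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

// Hypothetical stand-in for an HTTP client such as reqwest::Client.
// A real client owns a connection pool, so building one per operator
// discards any warm TCP connections from earlier requests.
#[derive(Debug)]
pub struct HttpClient {
    pub id: usize,
}

// Counts how many clients were ever constructed.
static CLIENTS_BUILT: AtomicUsize = AtomicUsize::new(0);

impl HttpClient {
    fn new() -> Self {
        let id = CLIENTS_BUILT.fetch_add(1, Ordering::SeqCst);
        HttpClient { id }
    }
}

// Process-wide client: initialized on first use, shared afterwards.
static GLOBAL_CLIENT: OnceLock<Arc<HttpClient>> = OnceLock::new();

pub fn global_client() -> Arc<HttpClient> {
    GLOBAL_CLIENT
        .get_or_init(|| Arc::new(HttpClient::new()))
        .clone()
}

// Hypothetical operator: holds a handle to the shared client
// instead of constructing a fresh one.
pub struct Operator {
    pub client: Arc<HttpClient>,
}

pub fn build_operator() -> Operator {
    Operator {
        client: global_client(),
    }
}

pub fn clients_built() -> usize {
    CLIENTS_BUILT.load(Ordering::SeqCst)
}
```

Building operators per file stays cheap under this design; only the expensive, stateful part (the client and its pool) is shared.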
Great, looking forward to it!
Thanks for sharing. I believe this can be addressed by reusing the same HTTP client.
We will need apache/opendal#4967 for this. I'm working on it now.
Which OpenDAL version do we need to bump to?
I'm guessing it will be included in our next release.
I have already verified that using a global HTTP client works for this issue, and the performance improvement is impressive (about 10 times faster, from 1500+ ms to 150+ ms).
Also, cc @sdd, who is focusing on the iceberg benchmark now. |
I also have some local code that reuses the same OpenDAL operator rather than creating a new one each time. I haven't submitted it yet because I wasn't sure it is valid in every possible scenario, but it has been working well for me.
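Reusing the operator itself could look roughly like the cache below. This is a hypothetical sketch, not the code referred to above. It assumes one operator per (scheme, bucket) pair is sufficient, which is exactly the kind of assumption that may not hold everywhere: tables that need different credentials or endpoints for the same bucket would need a richer cache key. `Operator`, `OperatorCache`, and `scheme_and_bucket` are illustrative names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical operator; in practice this would wrap an OpenDAL
// Operator configured for a single bucket.
#[derive(Debug)]
pub struct Operator {
    pub scheme: String,
    pub bucket: String,
}

// Caches one operator per (scheme, bucket) so repeated lookups reuse it
// instead of rebuilding it (and its HTTP client) for every file.
#[derive(Default)]
pub struct OperatorCache {
    inner: Mutex<HashMap<(String, String), Arc<Operator>>>,
}

impl OperatorCache {
    pub fn get_or_build(&self, scheme: &str, bucket: &str) -> Arc<Operator> {
        let mut map = self.inner.lock().unwrap();
        let op = map
            .entry((scheme.to_string(), bucket.to_string()))
            .or_insert_with(|| {
                Arc::new(Operator {
                    scheme: scheme.to_string(),
                    bucket: bucket.to_string(),
                })
            })
            .clone();
        op
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}

// Splits a location such as "s3://bucket/path/file" into (scheme, bucket).
pub fn scheme_and_bucket(location: &str) -> Option<(&str, &str)> {
    let (scheme, rest) = location.split_once("://")?;
    let bucket = rest.split('/').next()?;
    Some((scheme, bucket))
}
```

Compared with sharing only the HTTP client, this also skips per-file operator construction, at the cost of having to decide what makes two operators interchangeable.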
Great! I will address apache/opendal#4967 first, then consider the best approach for iceberg-rust.
Consider reading an Iceberg table from S3. Currently, plan_files() seems unable to reuse TCP connections across HTTP requests, which leads to relatively high latency. I am not sure whether enabling Connection: keep-alive by default is good practice. What do you think?
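HTTP/1.1 connections are persistent by default (RFC 9112), so what matters is not the Connection: keep-alive header but whether the client keeps an idle pool between requests. The toy model below uses hypothetical names throughout and counts handshakes for sequential requests to one host. Returning connections to a long-lived pool costs a single handshake. Dropping them, or building a fresh client (and therefore a fresh pool) per request, pays for a handshake every time.

```rust
use std::collections::HashMap;

// Toy model of a client-side connection pool. A "connection" is just an
// id; opening one stands in for the TCP (and TLS) handshake that dominates
// latency when planning issues many small requests.
pub struct Pool {
    idle: HashMap<String, Vec<u32>>, // host -> idle connection ids
    next_id: u32,
    pub handshakes: u32,
}

impl Pool {
    pub fn new() -> Self {
        Pool { idle: HashMap::new(), next_id: 0, handshakes: 0 }
    }

    // Takes an idle connection to `host` if one exists, otherwise "opens" one.
    fn checkout(&mut self, host: &str) -> u32 {
        if let Some(id) = self.idle.get_mut(host).and_then(|v| v.pop()) {
            return id;
        }
        self.handshakes += 1;
        self.next_id += 1;
        self.next_id
    }

    // With keep-alive the connection returns to the pool;
    // without it, the connection is dropped after the response.
    fn checkin(&mut self, host: &str, id: u32, keep_alive: bool) {
        if keep_alive {
            self.idle.entry(host.to_string()).or_default().push(id);
        }
    }

    // Issues `n` sequential requests to `host`; returns total handshakes so far.
    pub fn run(&mut self, host: &str, n: u32, keep_alive: bool) -> u32 {
        for _ in 0..n {
            let id = self.checkout(host);
            self.checkin(host, id, keep_alive);
        }
        self.handshakes
    }
}
```

Creating a new `Pool` per request behaves like `keep_alive = false` even though the server would happily keep the connection open, which matches the behaviour described above when a new operator (and client) is built for each file.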