feat(functions): Add support for REST based remote functions #10911
base: main
✅ Deploy Preview for meta-velox canceled.
Pretty cool! I see the PR is still a draft, but I can help review when it's ready. It would also be nice to add some documentation on how to use it, the config parameters, etc.
set(cpr_SOURCE BUNDLED)
velox_resolve_dependency(cpr)
// Because location_ is a variant, we must get the string:
const auto& url = boost::get<std::string>(location_);
const std::string fullUrl = fmt::format(
    "{}/v1/functions/{}/{}/{}/{}",
This REST API makes an assumption about the endpoint. Is the backend side specified somewhere? Do we need to document what it is, since it must be specific to the targeted service?
The REST server follows this Presto REST function server. Can you guide me on where to document this?
This remote function is tied to a Presto REST function server, so it isn't really generic from a Velox perspective. Velox could be used with Gluten, among other engines.
Can you make the fullUrl a configuration parameter of the Remote function?
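One way to act on this suggestion, sketched under assumptions: keep the endpoint layout in a configuration string with named placeholders and expand it at runtime, so a non-Presto REST server can supply its own path. The `RestFunctionConfig` struct, the `expandUrlTemplate` helper, and the placeholder names (`{base}`, `{schema}`, `{function}`, `{id}`, `{version}`) are hypothetical; the default template only mirrors the shape of the `"{}/v1/functions/{}/{}/{}/{}"` format string in the diff, and what the four trailing segments mean on the Presto side is an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical config: the endpoint layout comes from configuration instead
// of being hard-coded in RemoteFunction.
struct RestFunctionConfig {
  // Assumed default that mirrors the layout of the format string in the PR;
  // the placeholder names are illustrative only.
  std::string urlTemplate =
      "{base}/v1/functions/{schema}/{function}/{id}/{version}";
};

// Replaces every "{key}" in tmpl with vars.at(key).
// Throws std::out_of_range for an unknown key and std::invalid_argument
// for a "{" without a matching "}".
std::string expandUrlTemplate(
    const std::string& tmpl,
    const std::map<std::string, std::string>& vars) {
  std::string out;
  out.reserve(tmpl.size());
  std::size_t pos = 0;
  while (pos < tmpl.size()) {
    const std::size_t open = tmpl.find('{', pos);
    if (open == std::string::npos) {
      out.append(tmpl, pos, std::string::npos); // Copy the literal tail.
      break;
    }
    const std::size_t close = tmpl.find('}', open);
    if (close == std::string::npos) {
      throw std::invalid_argument("unterminated placeholder in URL template");
    }
    out.append(tmpl, pos, open - pos); // Literal text before the placeholder.
    out += vars.at(tmpl.substr(open + 1, close - open - 1));
    pos = close + 1;
  }
  return out;
}
```

An engine integration other than Presto could then override `urlTemplate` without touching the code that issues the request.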
#include <folly/init/Init.h>
#include <gmock/gmock.h>
#include <gtest/gtest.h>
#include <cstdio>
As a system include, it should come before boost. This does not seem to have changed yet.
std::unique_ptr<std::thread> serverThread_;
boost::asio::io_context ioc_{1};

std::string location_{("http://127.0.0.1:8321")};
Instead of hard-coding the port, we can get free ports for use in the test. See the function getFreePorts in the test utils.
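The port-0 technique that helpers like this typically rely on can be sketched with plain POSIX sockets: bind to port 0 on loopback, let the kernel assign an unused port, and read it back with `getsockname`. This is an illustrative stand-in, not Velox's `getFreePorts` (its exact signature is not shown here), and `pickFreeLoopbackPort` is a hypothetical name. There is a small race: another process could take the port between `close` and the test server's own bind.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: asks the kernel for an unused TCP port on 127.0.0.1.
// The port is only likely, not guaranteed, to stay free after close().
uint16_t pickFreeLoopbackPort() {
  const int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    throw std::runtime_error("socket() failed");
  }
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(0); // Port 0 lets the kernel choose.
  socklen_t len = sizeof(addr);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    ::close(fd);
    throw std::runtime_error("bind() or getsockname() failed");
  }
  ::close(fd);
  return ntohs(addr.sin_port); // Convert from network byte order.
}
```

The test could then build its location from the result, for example `"http://127.0.0.1:" + std::to_string(pickFreeLoopbackPort())`, instead of hard-coding 8321.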
Thanks for the feedback; I updated the code accordingly.
  velox_memory
  velox_functions_prestosql)

add_executable(velox_functions_remote_server_rest_main
Do we need this executable?
The test uses a separate thread to create the listener. Am I missing something?
Yes, you are correct: there is no actual need for this executable unless someone wants to run a local REST server with the built-in Velox functions registered. I have removed the executable. I was just trying to keep the REST-based remote functions as similar as possible to the Thrift implementation.
DEFINE_string(service_host, "127.0.0.1", "Host to bind the service to");

DEFINE_int32(service_port, 8321, "Port to bind the service to");
If we need this, we can pick a free port using the port utils.
I have removed this code.
@@ -71,6 +112,50 @@ class RemoteFunction : public exec::VectorFunction {
  }

 private:
  void applyRestRemote(
      const SelectivityVector& rows,
      std::vector<VectorPtr>& args,
Use const& for args.
Thanks for the feedback; I updated the code accordingly.
// Because location_ is a variant, we must get the string:
const auto& url = boost::get<std::string>(location_);
const std::string fullUrl = fmt::format(
Can you set up fullUrl once during construction instead of on each apply?
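A minimal sketch of this suggestion, assuming every URL component is known when the function object is created: the URL is assembled once in the constructor and stored in a `const` member, so the per-batch path only reads it. `RestEndpointHolder` and its constructor parameters are hypothetical, and the path layout is assumed from the format string in the diff.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the REST path of RemoteFunction.
// The endpoint is built once; apply()-time code only calls endpoint().
class RestEndpointHolder {
 public:
  RestEndpointHolder(
      std::string baseUrl,
      std::string schema,
      std::string functionName,
      std::string functionId,
      std::string version)
      : fullUrl_(
            std::move(baseUrl) + "/v1/functions/" + schema + "/" +
            functionName + "/" + functionId + "/" + version) {}

  // Called per batch: returns the cached string, no formatting involved.
  const std::string& endpoint() const {
    return fullUrl_;
  }

 private:
  const std::string fullUrl_;
};
```

This relies on the point made in the replies to this thread: each `RemoteFunction` is constructed for one function in one expression, so the URL does not change during the object's lifetime.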
Under the current design, it isn’t feasible to construct the fullUrl once in the constructor and reuse it. The fullUrl encapsulates not just the base path, but also function-specific details that can change over time. By hard-coding it in the constructor, we lose the ability to dynamically add or modify functions. Keeping the fullUrl generation flexible ensures it can adapt to new or updated functions without requiring changes to the constructor logic.
Ah... So that means RemoteFunction isn't really tied to a particular function and can be used for any function? Please document that. Then there is no need for the functionName_ member variable.
I don't think RemoteFunction is designed this way. The Thrift path assumes that the function name is fixed in the member variable functionName_.
The RemoteFunction is a VectorFunction meaning that it is invoked in the context of an expression. This is limited to a single fragment execution. If the user wants to use another function, the query expression would be different and so another RemoteFunction object would be constructed. It will not be this same one.
Co-authored-by: Wills Feng <[email protected]>
velox_add_library(velox_functions_remote Remote.cpp)
velox_link_libraries(
  velox_functions_remote
  PUBLIC velox_expression
         velox_memory
It's a bit strange that these Velox exec dependencies came in because of REST. Can you remove them and check what happens?
  velox_functions_remote_get_serde
  velox_functions_remote_utils
  velox_type_fbhive
  velox_memory)

add_executable(velox_functions_remote_server_main RemoteFunctionServiceMain.cpp)
Can you add an executable for RestRemoteFunctionService as well, or enhance the code in this main for the REST service?
@@ -62,7 +99,11 @@ class RemoteFunction : public exec::VectorFunction {
      exec::EvalCtx& context,
      VectorPtr& result) const override {
    try {
      applyRemote(rows, args, outputType, context, result);
      if ((metadata_.location.type() == typeid(SocketAddress))) {
Then there's no need to do this type check at this point. You can write a condition based on whether you have a thriftClient_ or a restClient_.
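A sketch of the suggested dispatch, assuming exactly one client is created at construction time: `apply()` checks which client pointer is set instead of comparing `metadata_.location.type()` against `typeid(SocketAddress)`. `ThriftClient`, `RestClient`, and `RemoteDispatcher` are hypothetical placeholders, not the Velox or fbthrift types.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical client types standing in for the real Thrift and REST clients.
struct ThriftClient {
  std::string call() const { return "thrift"; }
};
struct RestClient {
  std::string call() const { return "rest"; }
};

// The constructor decides which transport exists; apply() just branches on it.
class RemoteDispatcher {
 public:
  explicit RemoteDispatcher(std::unique_ptr<ThriftClient> thrift)
      : thriftClient_(std::move(thrift)) {}
  explicit RemoteDispatcher(std::unique_ptr<RestClient> rest)
      : restClient_(std::move(rest)) {}

  std::string apply() const {
    if (thriftClient_) {
      return thriftClient_->call();
    }
    if (restClient_) {
      return restClient_->call();
    }
    throw std::logic_error("no remote client configured");
  }

 private:
  std::unique_ptr<ThriftClient> thriftClient_;
  std::unique_ptr<RestClient> restClient_;
};
```

Keeping the transport choice in the constructor means the hot evaluation path never inspects the location variant.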
@@ -12,7 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

add_library(velox_functions_remote_utils RemoteFunctionServiceProvider.cpp)
add_library(velox_functions_remote_utils RemoteFunctionHelper.h
.h files are not compiled by add_library. It's better to split this into a .h and a .cpp file.
@@ -0,0 +1,92 @@
/*
Can you also add a program with a main that starts a REST function server (similar to the one for Thrift in this directory)?
/// @brief Listens for incoming TCP connections and creates sessions.
/// Sets up a TCP acceptor to listen for client connections,
/// creating a new session for each accepted connection.
class listener : public std::enable_shared_from_this<listener> {
I found https://medium.com/@AlexanderObregon/building-restful-apis-with-c-4c8ac63fe8a7, which has similar code.
Can we provide that as an example of implementing a REST server with the boost::beast library?
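For readers unfamiliar with the `enable_shared_from_this` idiom used by this listener, here is a self-contained sketch of why it matters: each pending asynchronous accept captures `shared_from_this()`, so the listener stays alive after the code that created it drops its `shared_ptr`, and it is destroyed once no handler refers to it. The `std::queue` of callbacks is a stand-in for `boost::asio::io_context`, and `Listener`, `EventLoop`, and `runListenerDemo` are hypothetical names; no real sockets or Beast calls are involved.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Stand-in for boost::asio::io_context: a FIFO of pending completion handlers.
using EventLoop = std::queue<std::function<void()>>;

// Simplified listener: every pending "async accept" holds a shared_ptr to it.
class Listener : public std::enable_shared_from_this<Listener> {
 public:
  Listener(EventLoop& loop, int maxAccepts, int& acceptedCount)
      : loop_(loop), remaining_(maxAccepts), accepted_(acceptedCount) {}

  void run() { doAccept(); } // Must be called on a shared_ptr-owned object.

 private:
  void doAccept() {
    if (remaining_ <= 0) {
      return; // Stop re-arming; the last handler's reference will be dropped.
    }
    --remaining_;
    loop_.push([self = shared_from_this()] { self->onAccept(); });
  }

  void onAccept() {
    ++accepted_; // A real server would create a session for the socket here.
    doAccept();  // Re-arm the accept, as Beast's listener examples do.
  }

  EventLoop& loop_;
  int remaining_;
  int& accepted_;
};

struct DemoResult {
  bool aliveAfterCallerReleased;
  bool destroyedAfterLoop;
  int accepted;
};

DemoResult runListenerDemo(int maxAccepts) {
  EventLoop loop;
  int accepted = 0;
  std::weak_ptr<Listener> watch;
  {
    auto listener = std::make_shared<Listener>(loop, maxAccepts, accepted);
    watch = listener;
    listener->run();
  } // The caller's shared_ptr is gone; only pending handlers own the listener.
  const bool alive = !watch.expired();
  while (!loop.empty()) {
    std::function<void()> handler = std::move(loop.front());
    loop.pop();
    handler(); // May queue the next handler before this one is destroyed.
  }
  return {alive, watch.expired(), accepted};
}
```

With `runListenerDemo(3)`, the listener is still alive after the creating scope ends, handles three simulated accepts, and is destroyed when the loop drains.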
  std::string returnType;
};

std::map<std::string, InternalFunctionSignature> internalFunctionSignatureMap =
I don't think we should be doing such manual registration. The Thrift service registers all PrestoSQL scalar functions. You can do the same for this REST service for a start, but we should enhance this to use dynamic function loading via #11439.
const auto& functionSignature =
    internalFunctionSignatureMap.at(functionName);

auto inputType = deserializeArgTypes(functionSignature.argumentTypes);
Both the Thrift and REST servers would have such an implementation. Can you abstract a common function for this?
    {remotePrefix_ + ".remote_substr"});
}

void initializeServer(uint16_t servicePort) {
Might be better to abstract a RestFunctionServiceRunner class for this.
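One possible shape for such a runner, sketched under assumptions: an RAII class that owns the server thread, waits until that thread has started before the constructor returns, and sets a stop flag and joins in its destructor so a test cannot leak a running server. The `ServeFn` callback stands in for the boost::asio accept loop; with a real `io_context`, the destructor would call `ioc_.stop()` rather than set a flag. The class name is taken from the comment, but the interface shown here is hypothetical, and a real runner would signal readiness only after the acceptor is bound rather than just before entering the serve loop.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical RAII runner for a test server thread.
class RestFunctionServiceRunner {
 public:
  // serve(port, stopFlag) must return promptly once stopFlag becomes true.
  using ServeFn = std::function<void(uint16_t, const std::atomic<bool>&)>;

  RestFunctionServiceRunner(uint16_t port, ServeFn serve) : port_(port) {
    thread_ = std::thread([this, serve = std::move(serve)] {
      {
        std::lock_guard<std::mutex> lock(mutex_);
        ready_ = true;
      }
      readyCv_.notify_all();
      serve(port_, stop_); // Blocks until stop_ is set.
    });
    // Do not return until the server thread is running.
    std::unique_lock<std::mutex> lock(mutex_);
    readyCv_.wait(lock, [this] { return ready_; });
  }

  ~RestFunctionServiceRunner() {
    stop_ = true; // Ask the serve loop to exit, then wait for it.
    if (thread_.joinable()) {
      thread_.join();
    }
  }

  RestFunctionServiceRunner(const RestFunctionServiceRunner&) = delete;
  RestFunctionServiceRunner& operator=(const RestFunctionServiceRunner&) =
      delete;

  uint16_t port() const { return port_; }

 private:
  uint16_t port_;
  std::atomic<bool> stop_{false};
  std::mutex mutex_;
  std::condition_variable readyCv_;
  bool ready_ = false;
  std::thread thread_;
};
```

A test fixture could hold one runner as a member, which would replace the manual `serverThread_` and `initializeServer` bookkeeping with construction and destruction.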
Fixes - #11036