
feat(functions): Add support for REST based remote functions #10911

Open
wants to merge 3 commits into
base: main

Joe-Abraham (Contributor) commented Sep 2, 2024

Fixes - #11036

@facebook-github-bot facebook-github-bot added the CLA Signed label Sep 2, 2024
@Joe-Abraham Joe-Abraham changed the title from "Add support for REST based remote functions" to "[WIP] Add support for REST based remote functions" Sep 2, 2024

netlify bot commented Sep 2, 2024

Deploy Preview for meta-velox canceled.

Latest commit: 305b0a1
Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/67a44f92a4ff1b00086ffbea

@Joe-Abraham Joe-Abraham force-pushed the udf branch 12 times, most recently from b88d136 to b85e0e6 Compare September 4, 2024 09:32
@Yuhta Yuhta requested review from pedroerp and mbasmanova September 4, 2024 18:28
@Joe-Abraham Joe-Abraham force-pushed the udf branch 6 times, most recently from 0cd4510 to 74023dc Compare September 9, 2024 08:10
pedroerp (Contributor) commented Sep 9, 2024

Pretty cool! I see the PR is still a draft, but I can help review when it's ready. It would also be nice to add some documentation on how to use it, the config parameters, etc.

@Joe-Abraham Joe-Abraham force-pushed the udf branch 3 times, most recently from abe87e1 to 6c1606e Compare September 13, 2024 05:06
@Joe-Abraham Joe-Abraham force-pushed the udf branch 3 times, most recently from 05115f4 to 2ffec26 Compare September 20, 2024 05:18
Comment on lines +524 to +527
set(cpr_SOURCE BUNDLED)
velox_resolve_dependency(cpr)

@Joe-Abraham Joe-Abraham force-pushed the udf branch 4 times, most recently from f69837a to 11c7ad2 Compare January 29, 2025 05:58
@Joe-Abraham Joe-Abraham force-pushed the udf branch 4 times, most recently from 32b272b to 322d042 Compare January 30, 2025 07:35
@Joe-Abraham Joe-Abraham force-pushed the udf branch 3 times, most recently from a312e97 to c8e8424 Compare February 3, 2025 12:15
// Because location_ is a variant, we must get the string:
const auto& url = boost::get<std::string>(location_);
const std::string fullUrl = fmt::format(
"{}/v1/functions/{}/{}/{}/{}",
Collaborator

This REST API makes an assumption about the endpoint. Is the backend side specified somewhere? We would need to document what it is, since this URL layout must be specific to the targeted service.

Contributor Author

The REST server follows this Presto REST function server. Can you guide me on where to document this?

Collaborator

This remote function is tied to a Presto rest function server. So this isn't really generic from a Velox perspective. Velox could be used with Gluten among other engines.

Can you make the fullUrl a configuration parameter of the Remote function ?
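
One way to read the request above is to let the embedding engine supply the URL layout instead of hard-coding it in the function. The following is a minimal sketch under that assumption, not the PR's implementation: `expandUrlTemplate` is a hypothetical helper, and it uses positional `{}` placeholders only because the snippet under review formats its URL that way.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Velox): replaces each "{}" in urlTemplate
// with the next entry of parts, left to right. Placeholders that remain after
// parts is exhausted are copied through unchanged.
std::string expandUrlTemplate(
    std::string_view urlTemplate,
    const std::vector<std::string>& parts) {
  std::string out;
  std::size_t next = 0;
  std::size_t pos = 0;
  while (pos < urlTemplate.size()) {
    if (next < parts.size() && urlTemplate.compare(pos, 2, "{}") == 0) {
      out += parts[next++];
      pos += 2;
    } else {
      out += urlTemplate[pos++];
    }
  }
  return out;
}
```

With this shape, a Presto deployment could pass the template from the snippet under review as its configuration value, while another engine could pass a different layout without any change to the function code.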

#include <folly/init/Init.h>
#include <gmock/gmock.h>
#include <gtest/gtest.h>
#include <cstdio>
Collaborator

As a system include it should come before boost. This seems to not have changed yet.

std::unique_ptr<std::thread> serverThread_;
boost::asio::io_context ioc_{1};

std::string location_{("http://127.0.0.1:8321")};
Collaborator

Instead of hard coding the port we can get free ports for use in the test.
See function getFreePorts in the test utils.
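
For readers unfamiliar with the technique, a free port is usually found by binding to port 0 and asking the OS which port it assigned. The sketch below shows that idea with plain POSIX sockets. `pickFreeLoopbackPort` is a hypothetical name and is not the `getFreePorts` utility mentioned above, whose signature is not shown in this thread.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <stdexcept>

// Hypothetical helper: asks the OS for an unused TCP port on 127.0.0.1 by
// binding to port 0, reading back the assigned port, and closing the socket.
uint16_t pickFreeLoopbackPort() {
  const int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) {
    throw std::runtime_error("socket() failed");
  }
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0; // 0 lets the kernel choose a free port.
  socklen_t len = sizeof(addr);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    ::close(fd);
    throw std::runtime_error("bind() or getsockname() failed");
  }
  ::close(fd);
  return ntohs(addr.sin_port);
}
```

The port is released before the caller uses it, so another process could claim it in between. For a test fixture that starts its listener right away, this is usually acceptable.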

Contributor Author

Thanks for the feedback, updated the code accordingly

velox_memory
velox_functions_prestosql)

add_executable(velox_functions_remote_server_rest_main
Collaborator

Do we need this executable?
The test uses a separate thread to create the listener. Am I missing something?

Contributor Author

Yes, you are correct: there is no actual need for this executable unless someone wants to run a local REST server with the built-in Velox functions registered. I have removed the executable. I was just trying to keep the REST-based remote functions as similar as possible to the Thrift implementation.


DEFINE_string(service_host, "127.0.0.1", "Host to bind the service to");

DEFINE_int32(service_port, 8321, "Port to bind the service to");
Collaborator

If we need this we can pick a free port to use based on the port utils.

Contributor Author

I have removed this code.

@@ -71,6 +112,50 @@ class RemoteFunction : public exec::VectorFunction {
}

private:
void applyRestRemote(
const SelectivityVector& rows,
std::vector<VectorPtr>& args,
Collaborator

const& for args.

Contributor Author

Thanks for the feedback, updated the code accordingly


// Because location_ is a variant, we must get the string:
const auto& url = boost::get<std::string>(location_);
const std::string fullUrl = fmt::format(
Collaborator

Can you set up fullUrl once during construction instead of on each apply?

Contributor Author

Under the current design, it isn’t feasible to construct the fullUrl once in the constructor and reuse it. The fullUrl encapsulates not just the base path, but also function-specific details that can change over time. By hard-coding it in the constructor, we lose the ability to dynamically add or modify functions. Keeping the fullUrl generation flexible ensures it can adapt to new or updated functions without requiring changes to the constructor logic.

Collaborator

Ah... So then it means that the RemoteFunction isn't really tied to a particular function but can be used for any function? Please document that. Then there isn't a need for the functionName_ member variable.

Collaborator

I don't think the RemoteFunction is designed this way. The Thrift path assumes the function name is fixed in the member variable functionName_.

The RemoteFunction is a VectorFunction meaning that it is invoked in the context of an expression. This is limited to a single fragment execution. If the user wants to use another function, the query expression would be different and so another RemoteFunction object would be constructed. It will not be this same one.
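
If each object is bound to exactly one function for its lifetime, as described above, the URL can be computed once at construction and reused on every call. The sketch below illustrates that under stated assumptions: `RestBoundFunction` is a hypothetical stand-in and not the PR's `RemoteFunction`, and the path layout is simplified from the one in the snippet under review.

```cpp
#include <string>

// Hypothetical stand-in for a remote function object that serves exactly one
// function. The URL layout here is illustrative only.
class RestBoundFunction {
 public:
  RestBoundFunction(const std::string& baseUrl, const std::string& functionName)
      : functionName_(functionName),
        fullUrl_(baseUrl + "/v1/functions/" + functionName) {}

  // Every call reuses the string built in the constructor.
  const std::string& fullUrl() const {
    return fullUrl_;
  }

  const std::string& functionName() const {
    return functionName_;
  }

 private:
  const std::string functionName_;
  const std::string fullUrl_;
};
```

A query that calls a different function would construct a different object, so nothing is lost by fixing the URL up front.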

velox_add_library(velox_functions_remote Remote.cpp)
velox_link_libraries(
velox_functions_remote
PUBLIC velox_expression
velox_memory
Collaborator

It's a bit strange that these Velox exec dependencies came in because of REST. Can you remove them and check what happens?

velox_functions_remote_get_serde
velox_functions_remote_utils
velox_type_fbhive
velox_memory)

add_executable(velox_functions_remote_server_main RemoteFunctionServiceMain.cpp)
Collaborator

Can you add an executable for RestRemoteFunctionService as well, or enhance the code in this main to cover the REST service?



@@ -62,7 +99,11 @@ class RemoteFunction : public exec::VectorFunction {
exec::EvalCtx& context,
VectorPtr& result) const override {
try {
applyRemote(rows, args, outputType, context, result);
if ((metadata_.location.type() == typeid(SocketAddress))) {
Collaborator

There isn't a need to do this typing at this point then. You can write a condition based on whether you have a thriftClient_ or restClient_.
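
The suggestion above can be sketched as follows: decide the transport once at construction, then branch on which client pointer is set. This is a minimal sketch, not the PR's code. The member names `thriftClient_` and `restClient_` come from the comment, while the stub client types and `RemoteDispatcher` are hypothetical.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stubs standing in for the real Thrift and REST clients.
struct ThriftClientStub {
  std::string call() const { return "thrift"; }
};
struct RestClientStub {
  std::string call() const { return "rest"; }
};

// Exactly one client is set at construction; apply() checks which one exists
// instead of inspecting the type held by a location variant on every call.
class RemoteDispatcher {
 public:
  explicit RemoteDispatcher(std::unique_ptr<ThriftClientStub> client)
      : thriftClient_(std::move(client)) {}
  explicit RemoteDispatcher(std::unique_ptr<RestClientStub> client)
      : restClient_(std::move(client)) {}

  std::string apply() const {
    if (thriftClient_) {
      return thriftClient_->call();
    }
    if (restClient_) {
      return restClient_->call();
    }
    throw std::logic_error("no remote client configured");
  }

 private:
  std::unique_ptr<ThriftClientStub> thriftClient_;
  std::unique_ptr<RestClientStub> restClient_;
};
```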


@@ -12,7 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

add_library(velox_functions_remote_utils RemoteFunctionServiceProvider.cpp)
add_library(velox_functions_remote_utils RemoteFunctionHelper.h
Collaborator

.h files are not compiled in add_library. It's better to split this into a .h and a .cpp file.

@@ -0,0 +1,92 @@
/*
Collaborator

Can you also add a program with a main function that starts a REST function server (similar to the one for Thrift in this directory)?

/// @brief Listens for incoming TCP connections and creates sessions.
/// Sets up a TCP acceptor to listen for client connections,
/// creating a new session for each accepted connection.
class listener : public std::enable_shared_from_this<listener> {
Collaborator

I found https://medium.com/@AlexanderObregon/building-restful-apis-with-c-4c8ac63fe8a7 that has similar code.

Can we provide that as an example of implementing a Rest server with boost::beast library ?

std::string returnType;
};

std::map<std::string, InternalFunctionSignature> internalFunctionSignatureMap =
Collaborator

I don't think we should be doing such manual registration. The Thrift service registers all PrestoSQL scalar functions. You can do the same for this REST service for a start, but we should enhance this to use the dynamic function loading from #11439.

const auto& functionSignature =
internalFunctionSignatureMap.at(functionName);

auto inputType = deserializeArgTypes(functionSignature.argumentTypes);
Collaborator

The Thrift and Rest servers would have such an implementation. Can you abstract a common function for this ?

{remotePrefix_ + ".remote_substr"});
}

void initializeServer(uint16_t servicePort) {
Collaborator

Might be better to abstract a RestFunctionServiceRunner class for this.

Labels: CLA Signed
6 participants