feat: Dynamically Linked Library in CPP #11439

Open
wants to merge 1 commit into main

@soumiiow soumiiow commented Nov 5, 2024

Related to prestodb/presto#23634 in the Prestissimo space,
and based on the following PR: https://github.com/facebookincubator/velox/pull/1005/files

These changes will allow users to dynamically load functions in Prestissimo using C++. The Presto server will use this library to dynamically load user-defined functions (UDFs), connectors, or types.

An example of dynamically registering a function is also provided for reference, along with a unit test.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 5, 2024

netlify bot commented Nov 5, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 21dae50
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/67a403bb4132430008afcd5c

@Yuhta Yuhta requested a review from pedroerp November 5, 2024 15:40
Contributor

pedroerp commented Nov 5, 2024

@soumiiow thanks for looking into this. Out of curiosity, why doesn't this work on macOS?

@@ -15,6 +15,7 @@ add_subdirectory(base)
add_subdirectory(caching)
add_subdirectory(compression)
add_subdirectory(config)
add_subdirectory(dynamicRegistry)
Contributor

We use snake case for directory names: "dynamic_registry".

Contributor

@pedroerp pedroerp left a comment

Very cool! A few small comments, but overall it looks good.

#include <dlfcn.h>
#include <iostream>
#include "velox/common/base/Exceptions.h"
namespace facebook::velox {
Contributor

nit: new line before namespace definition.

VELOX_USER_FAIL("Couldn't find Velox registry symbol: {}", error);
}
registryItem();
std::cout << "LOADED DYLLIB 1" << std::endl;
Contributor

for consistency, could you use LOG(INFO) and print the file name / path of the library loaded?


static constexpr const char* kSymbolName = "registry";

void loadDynamicLibraryFunctions(const char* fileName) {
Contributor

We can probably omit "Functions" from the name; this can be used to load anything, as long as you provide the registration functions. Let's name it loadDynamicLibrary().
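For reference, a minimal sketch of the loader as discussed in this thread: renamed to loadDynamicLibrary() and logging the library path once it is loaded. It is an illustration under assumptions, not the PR's code: std::runtime_error and std::clog stand in for VELOX_USER_FAIL and LOG(INFO) so it builds without Velox, and the library handle is intentionally never closed after a successful load because registered components may still point into the library's code. On glibc older than 2.34, link with -ldl.

```cpp
#include <dlfcn.h>

#include <iostream>
#include <stdexcept>
#include <string>

namespace {

// Name of the entry point each extension library is expected to export with
// C linkage (mirrors kSymbolName in the PR).
constexpr const char* kSymbolName = "registry";

} // namespace

// Opens a shared library with dlopen(), looks up the "registry" symbol, and
// invokes it. std::runtime_error and std::clog stand in for VELOX_USER_FAIL
// and LOG(INFO) in this sketch.
void loadDynamicLibrary(const char* fileName) {
  // RTLD_NOW resolves undefined symbols immediately, so a missing dependency
  // is reported here instead of on the first call into the library.
  void* handle = dlopen(fileName, RTLD_NOW);
  if (handle == nullptr) {
    throw std::runtime_error(
        std::string("Error while loading shared library: ") + dlerror());
  }

  // Clear any stale error first; dlerror() is the reliable way to detect a
  // failed dlsym() lookup.
  dlerror();
  void* registrySymbol = dlsym(handle, kSymbolName);
  if (const char* error = dlerror(); error != nullptr) {
    std::string message =
        std::string("Couldn't find Velox registry symbol: ") + error;
    dlclose(handle);
    throw std::runtime_error(message);
  }

  // Invoke the registry function. The handle is deliberately kept open.
  auto registryFunction = reinterpret_cast<void (*)()>(registrySymbol);
  registryFunction();

  std::clog << "Loaded dynamic library: " << fileName << std::endl;
}
```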

### 1. Create a cpp file for your dynamic library
For dynamically loaded function registration, the format mirrors that of built-in function registration, with some noted differences. Using [MyDynamicTestFunction.cpp](tests/MyDynamicTestFunction.cpp) as an example, the function uses the extern "C" keyword to protect against name mangling. A registry() function must also be defined here.

### 2. Register functions dynamically by creating .dylib or .so shared libraries and dropping them in a plugin directory
Contributor

nit: the titles are too long; maybe just add the docs as a regular numbered list?

Author

I pushed it out without the title formatting but does this look a bit cluttered now?
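To make step 1 concrete, a minimal sketch of what such an extension source file can look like. hostRegistry() and dynamic123() are hypothetical stand-ins so the snippet compiles on its own; in a real setup the registry lives in the host process, and the extension would include velox/functions/Udf.h and call Velox's registration APIs inside registry() instead.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for the host's function registry, so this sketch
// compiles on its own. Not a Velox API.
std::map<std::string, std::function<long long()>>& hostRegistry() {
  static std::map<std::string, std::function<long long()>> registry;
  return registry;
}

// Hypothetical function implementation exposed by the extension.
long long dynamic123() {
  return 123;
}

// extern "C" gives the entry point C linkage, so its symbol name is not
// mangled and dlsym(handle, "registry") can find it by its plain name.
extern "C" void registry() {
  hostRegistry()["dynamic_123"] = dynamic123;
}
```

The file is then compiled into a shared library, as in the add_library(... SHARED ...) snippet from the README diff.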

auto signaturesBefore = getFunctionSignatures().size();

// Function does not exist yet.
EXPECT_THROW(dynamicFunction(0), VeloxUserError);
Contributor

can you use VELOX_ASSERT_THROW() instead to validate the right exception is being thrown?

# `MyDynamicFunction.cpp` as a small .so library, and use the
# MY_DYNAMIC_FUNCTION_LIBRARY_PATH macro to locate the .so binary.
add_compile_definitions(
MY_DYNAMIC_FUNCTION_LIBRARY_PATH="${CMAKE_CURRENT_BINARY_DIR}")
Contributor

please vendor the macro. Maybe something like VELOX_TEST_DYNAMIC_LIBRARY_PATH

* limitations under the License.
*/

#include "velox/common/dynamicRegistry/DynamicLibraryLoader.h"
Collaborator

@majetideepak majetideepak Nov 5, 2024

this header include is not required.


// Dynamically load the library.
std::string libraryPath = MY_DYNAMIC_FUNCTION_LIBRARY_PATH;
libraryPath += "/libvelox_function_my_dynamic.so";
Collaborator

@majetideepak majetideepak Nov 5, 2024

Collaborator

What else is an issue for MacOS?

Comment on lines 22 to 23
${GMock}
${GTEST_BOTH_LIBRARIES})
Collaborator

please use the GTest:: targets

# To test functions being added by dynamically linked libraries, we compile
# `MyDynamicFunction.cpp` as a small .so library, and use the
# VELOX_TEST_DYNAMIC_LIBRARY_PATH macro to locate the .so binary.
add_compile_definitions(
Collaborator

Please use target_compile_definitions( on the relevant target instead.

if(${VELOX_BUILD_TESTING})
add_subdirectory(tests)
endif()
velox_add_library(velox_dynamic_function_loader DynamicLibraryLoader.cpp)
Collaborator

Suggested change
velox_add_library(velox_dynamic_function_loader DynamicLibraryLoader.cpp)
velox_add_library(velox_dynamic_function_loader DynamicLibraryLoader.cpp)
velox_link_libraries(velox_dynamic_function_loader PRIVATE velox_exception)

Collaborator

Thanks for adding docs!

# `MyDynamicFunction.cpp` as a small .so library, and use the
# VELOX_TEST_DYNAMIC_LIBRARY_PATH macro to locate the .so binary.
add_compile_definitions(
VELOX_TEST_DYNAMIC_LIBRARY_PATH="${CMAKE_CURRENT_BINARY_DIR}")
Collaborator

MacOS support is still missing. You can create the full library path here based on the CMake options I shared earlier.

velox/common/dynamic_registry/tests/MyDynamicFunction.cpp Outdated
@@ -0,0 +1,22 @@
# Velox: Dynamically Loading Registry Libraries in C++
Collaborator

"Dynamic Loading of Velox Extensions" is probably a better title.

@@ -0,0 +1,22 @@
# Velox: Dynamically Loading Registry Libraries in C++

This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
Collaborator

Prestissimo -> Velox. Remaining paragraph as well.

target_link_libraries(name_of_dynamic_fn PRIVATE xsimd fmt::fmt velox_expression)
```

3. In the Prestissimo worker's config.properties file, set the plugin.dir property
Collaborator

Move this to Prestissimo.

```
plugin.dir="User\Test\Path\plugin"
```
4. When the worker or the sidecar process starts, it will scan the plugin directory and attempt to dynamically load all shared libraries
Collaborator

Move this to Prestissimo.
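Although the reviewers asked to move the plugin.dir steps to Prestissimo, the scan described in step 4 is easy to sketch. This is a minimal, hypothetical version using C++17 std::filesystem; isSharedLibrary() and findSharedLibraries() are illustrative names, not Velox or Prestissimo APIs. Versioned names such as libfoo.so.1 are not matched, because their extension is ".1".

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical filter: treat .so (Linux) and .dylib (macOS) files as
// candidate extension libraries.
bool isSharedLibrary(const fs::path& path) {
  const auto extension = path.extension();
  return extension == ".so" || extension == ".dylib";
}

// Hypothetical scan: collect matching regular files directly inside
// pluginDir (no recursion), sorted so that the load order is deterministic.
std::vector<std::string> findSharedLibraries(const fs::path& pluginDir) {
  std::vector<std::string> libraries;
  for (const auto& entry : fs::directory_iterator(pluginDir)) {
    if (entry.is_regular_file() && isSharedLibrary(entry.path())) {
      libraries.push_back(entry.path().string());
    }
  }
  std::sort(libraries.begin(), libraries.end());
  return libraries;
}
```

Each returned path would then be passed to the loader, one library at a time.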


namespace facebook::velox {

/// Dynamically opens and registers functions defined in a shared library (.so)
Collaborator

Remove (.so)
Add fullstop.


/// Dynamically opens and registers functions defined in a shared library (.so)
///
/// Given a shared library name (.so), this function will open it using dlopen,
Collaborator

Opens a shared library using dlopen, looks for the symbol registry, and invokes it.

velox/common/dynamic_registry/DynamicLibraryLoader.cpp Outdated
Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @soumiiow. I had a bunch of minor comments, plus a bigger one around testing.

velox/common/dynamic_registry/CMakeLists.txt

// Lookup the symbol.
void* registrySymbol = dlsym(handler, kSymbolName);
auto registryItem = reinterpret_cast<void (*)()>(registrySymbol);
Collaborator

Nit: rename to registryFunction.

Author

Hey! So I got some previous feedback to stay away from "registryFunction" in the naming, so as not to make it seem like this library is to be used exclusively for functions, and to move away from our initial design, which was made with only function loading in mind. Would there be a better name for this variable than the word "item"? I can only really think of registryItem or registryPtr, but would love to hear your suggestions too.

Collaborator

@soumiiow: To me this is almost like the "main" function in an executable program. How about "loadLibrary", "loadUserLibrary", or "enterUserLibrary"? There could be code beyond registration here as well.

if (error != nullptr) {
VELOX_USER_FAIL("Couldn't find Velox registry symbol: {}", error);
}
registryItem();
Collaborator

Add a comment "Invoke the registry function"

velox/common/dynamic_registry/tests/DynamicLinkTest.cpp Outdated
velox/common/dynamic_registry/tests/MyDynamicFunction.cpp Outdated
target_link_libraries(name_of_dynamic_fn PRIVATE xsimd fmt::fmt velox_expression)
```

3. In the Prestissimo worker's config.properties file, set the plugin.dir property
Collaborator

This is not relevant in Velox. Also, since it's not used anywhere in the current code, it's hard to put this in context.

@@ -0,0 +1,22 @@
# Velox: Dynamically Loading Registry Libraries in C++

This library adds the ability to load User Defined Functions (UDFs), connectors, or types without having to fork and build Prestissimo, through the use of shared libraries that a Prestissimo worker can access. These are to be loaded on launch of the Presto server. The Presto server searches for any .so or .dylib files and loads them using this library.
Collaborator

It might be good to not talk about Prestissimo in this README.

This is a generic utility for dynamically loading a "registry" function from a library. It's sufficient to just say that this is for "Extensibility" features that add custom user code, which could include new Velox types, functions, operators, and connectors.

@soumiiow soumiiow force-pushed the velox-dylib branch 2 times, most recently from 45cf7af to 4e82b71 Compare December 4, 2024 07:04
@soumiiow soumiiow force-pushed the velox-dylib branch 2 times, most recently from b6660ad to eb20996 Compare January 7, 2025 16:55
Collaborator

@aditi-pandit aditi-pandit left a comment

@soumiiow : Thanks for updating your tests. Have a bunch of questions.

///
/// void registry();
///
/// The registration function needs to be defined in the top-level namespace,
Collaborator

What do you mean by top-level namespace? Is that common terminology?

velox/common/dynamic_registry/CMakeLists.txt Outdated
velox/common/dynamic_registry/CMakeLists.txt Outdated
dynamicFunction(0), "Scalar function doesn't exist: dynamic_2.");

// Dynamically load the library.
std::string libraryPath = VELOX_TEST_DYNAMIC_LIBRARY_PATH;
Collaborator

Write in a single line

std::string libraryPath = std::string(VELOX_TEST_DYNAMIC_LIBRARY_PATH).append("/libvelox_function_same_twice_my_dynamic").append(VELOX_TEST_DYNAMIC_LIBRARY_PATH_SUFFIX);

or use the fmt::format function for the same.

const auto dynamicFunction = [&](std::optional<double> a) {
return evaluateOnce<int64_t>("dynamic_2()", a);
};
auto signaturesBefore = getFunctionSignatures().size();
Collaborator

Curious, what is the value of signaturesBefore? I presume that, since you have derived from FunctionsTestBase, this is non-zero?

It might be good to have a CHECK where you are evaluating an expression that involves an existing function along with the new function you have registered.

Collaborator

@soumiiow: This comment is not addressed. Let's ensure that we can execute an expression that involves the function you have registered with something that was registered already.

dynamicFunction(0), "Scalar function doesn't exist: dynamic_1.");

// Dynamically load the library.
std::string libraryPath = VELOX_TEST_DYNAMIC_LIBRARY_PATH;
Collaborator

Since this pattern of appending VELOX_TEST_DYNAMIC_LIBRARY_PATH (and SUFFIX) is repeated, we could abstract a function for it.
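Following the suggestions in this thread (a single helper instead of repeated appends, later wrapped in an anonymous namespace), one possible shape for getLibraryPath(). It is a sketch under assumptions: the fallback definition of VELOX_TEST_DYNAMIC_LIBRARY_PATH only exists so the snippet compiles standalone, and the suffix is chosen here from the predefined __APPLE__ macro rather than from the CMake-provided VELOX_TEST_DYNAMIC_LIBRARY_PATH_SUFFIX mentioned above.

```cpp
#include <string>

// Assumed to be injected by the build (e.g. via target_compile_definitions);
// the fallback only keeps this sketch compilable on its own.
#ifndef VELOX_TEST_DYNAMIC_LIBRARY_PATH
#define VELOX_TEST_DYNAMIC_LIBRARY_PATH "."
#endif

namespace {

// Shared-library suffix for the current platform, derived from the
// predefined __APPLE__ macro (an assumption of this sketch).
#ifdef __APPLE__
constexpr const char* kLibrarySuffix = ".dylib";
#else
constexpr const char* kLibrarySuffix = ".so";
#endif

// Builds "<directory>/<name><suffix>" for a test library.
std::string getLibraryPath(const std::string& name) {
  return std::string(VELOX_TEST_DYNAMIC_LIBRARY_PATH) + "/" + name +
      kLibrarySuffix;
}

} // namespace
```

A test would then call, for example, loadDynamicLibrary(getLibraryPath("libvelox_function_my_dynamic").data()).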

"Expression evaluation result is not of expected type: dynamic_3() -> CONSTANT vector of type VARCHAR");

// Dynamically load the library.
std::string libraryPathInt = VELOX_TEST_DYNAMIC_LIBRARY_PATH;
Collaborator

It's interesting that the first function got overridden. Is that because we used structs with the same name, "Dynamic123Function", for the function implementations? If you used different names for the structs, are both functions retained? From a system perspective, we wouldn't want the overriding to happen.

Contributor

mohsaka commented Jan 23, 2025

@soumiiow Addressed all of the unaddressed comments and made all of the changes for compilation on mac in this PR
#12111

Specifically this commit,
ac18115

@soumiiow soumiiow changed the title Dynamically Linked Library in CPP feat: Dynamically Linked Library in CPP Jan 27, 2025
Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @soumiiow. Have another round of comments.


namespace facebook::velox::common {
template <template <class> class T, typename TReturn, typename... TArgs>
void registerFunctionWrapper(
Collaborator

It's probably better to add this wrapper at the Prestissimo layer. We needn't have it in Velox.

const auto dynamicFunction = [&](std::optional<double> a) {
return evaluateOnce<int64_t>("dynamic_2()", a);
};
auto signaturesBefore = getFunctionSignatures().size();
Collaborator

@soumiiow: This comment is not addressed. Let's ensure that we can execute an expression that involves the function you have registered with something that was registered already.

@@ -0,0 +1,45 @@
/*
Collaborator

This file is not needed. You could use MyDynamicFunction twice for the same effect.

dynamicFunction(0), "Scalar function doesn't exist: dynamic_2.");

std::string libraryPath =
getLibraryPath("libvelox_function_same_twice_my_dynamic");
Collaborator

You can use the same function as the previous test. A new one is not needed.

Contributor

mohsaka commented Jan 31, 2025

@aditi-pandit PR opened here to address the comments: removal of the duplicate dynamic library, removal of the wrapper, and renaming of the dynamic libraries to represent what they are testing.

soumiiow#1

Should be merged into this branch soon.

else()
set(CMAKE_DYLIB_TEST_LINK_LIBRARIES fmt::fmt xsimd)

target_link_libraries(velox_function_my_dynamic
Collaborator

target_link_libraries is the same for APPLE and Linux? Can we make this common?

Collaborator

I see, it's because fmt wasn't needed on Mac, since it is part of folly.

Contributor

@soumiiow Can you check if we can use folly:folly instead of fmt::fmt on mac so we can standardize it?

Author

@soumiiow soumiiow Feb 1, 2025

@majetideepak looks like adding folly to linux gets us this error:

ERROR: something wrong with flag 'folly_hazptr_use_executor' in file '/root/dylib/presto/presto-native-execution/scripts/deps-download/folly/folly/synchronization/Hazptr.cpp'.  One possibility: file '/root/dylib/presto/presto-native-execution/scripts/deps-download/folly/folly/synchronization/Hazptr.cpp' is being linked both statically and dynamically into this executable.

Collaborator

@majetideepak majetideepak Feb 1, 2025

Can we include only fmt in both?

Contributor

Unfortunately that leads to

[build] /Users/michaelohsaka/michael/velox/./velox/common/base/VeloxException.h:26:10: fatal error: 'glog/logging.h' file not found
[build]    26 | #include <glog/logging.h>
[build]       |          ^~~~~~~~~~~~~~~~
[build] 1 error generated.

I can include glog::glog though and that works fine. Maybe we can try that on linux? @soumiiow

Collaborator

Let's include glog::glog. Folly is known to cause this "is being linked both statically and dynamically into this executable" error when built dynamically, because of gflags.

Author

@majetideepak @aditi-pandit adding glog::glog works. pushed up the latest changes without the if(apple) bit. please take a look

Collaborator

glog header can be removed. I opened a PR here
#12248

"-Wl,-undefined,dynamic_lookup")

else()
set(CMAKE_DYLIB_TEST_LINK_LIBRARIES fmt::fmt xsimd)
Collaborator

Why is this different for APPLE?

velox/common/dynamic_registry/README.md Outdated
Collaborator

majetideepak commented Feb 3, 2025

@soumiiow, @mohsaka Let's unify APPLE and LINUX as much as possible to simplify the user experience.

Collaborator

@soumiiow, @mohsaka Let's unify APPLE and LINUX as much as possible to simplify the user experience.

Second that. Let's keep dependencies to a minimum.

@soumiiow soumiiow force-pushed the velox-dylib branch 2 times, most recently from ff3c929 to a9b49ab Compare February 3, 2025 17:34
getLibraryPath("libvelox_overload_int_function_my_dynamic");
loadDynamicLibrary(libraryPathInt.data());

// The first function loaded should NOT be rewritten.
Collaborator

Nit: "overwritten" instead of "rewritten"? And we end up with two functions named dynamic_overload, having varchar and bigint respectively?

Contributor

Yep. The function should be overloaded.

Author

changed the comment @czentgr!

Collaborator

@aditi-pandit aditi-pandit left a comment

This is the last round from me. Thanks @soumiiow and @mohsaka

@@ -0,0 +1,25 @@
/*
Collaborator

Any reason this is added here instead of the velox/common folder?

Contributor

The main reason is because Udf.h is there. I can move it though.

Contributor

Moved to common/dynamic_registry


#include "velox/functions/Udf.h"

extern "C" {
Collaborator

Please add a header comment to this file explaining why it is needed.

Contributor

Done


std::string libraryPath = getLibraryPath("libvelox_function_my_dynamic");

loadDynamicLibrary(libraryPath.data());
Collaborator

@aditi-pandit aditi-pandit Feb 4, 2025

This presumes that the first test has executed before this one. gtest doesn't give this guarantee. Each test should be self-contained. We can load the same library twice within the test itself for the same effect.

Contributor

@mohsaka mohsaka Feb 4, 2025

Fixed by loading once before with no checks except for the function existing.

# See the License for the specific language governing permissions and
# limitations under the License.

velox_add_library(velox_dynamic_function_loader DynamicLibraryLoader.cpp)
Collaborator

Should we name the library similar to naming everywhere?
velox_dynamic_function_loader -> velox_dynamic_library_loader

/// locating the function it executes the registration bringing the UDFs in the
/// scope of the Velox runtime.
///
/// If the same library is loaded twice then a no-op scenario will happen.
Collaborator

Can you explain why this is a no-op? Isn't registering a Velox component twice an error?

Contributor

Added explanation. Velox functions overwrite by default. So it will overwrite its function with itself.

Collaborator

@majetideepak majetideepak Feb 4, 2025

This behavior is not true for all Velox modules. Connectors will fail for example. Let's say something like
Loading a library twice can cause a module to be registered twice. This can fail for certain Velox modules.

Contributor

Fixed and addressed all comments. Waiting for @soumiiow to merge in the branch.

https://github.com/soumiiow/velox/pull/4/files

Collaborator

Thanks. Let's use component instead of module since we used component on top.

Author

merged the branch and pushed the component vs module change out
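To illustrate the function-vs-connector difference discussed above, a simplified model that is not Velox's actual registry code: addFunction() overwrites an existing entry, as Velox function registration does by default according to this thread, while addConnector() rejects a duplicate id, as connectors do. Because the loader invokes registry() on every call, loading the same library twice runs registry() twice, which succeeds for the function and fails for the connector. All names here are hypothetical.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical function registry: re-registering a name silently replaces
// the existing entry.
std::map<std::string, std::string>& functionRegistry() {
  static std::map<std::string, std::string> registry;
  return registry;
}

void addFunction(const std::string& name, const std::string& impl) {
  functionRegistry()[name] = impl;
}

// Hypothetical connector registry: re-registering an id is an error.
std::map<std::string, std::string>& connectorRegistry() {
  static std::map<std::string, std::string> registry;
  return registry;
}

void addConnector(const std::string& id, const std::string& impl) {
  // emplace() never overwrites; .second is false if the id already exists.
  if (!connectorRegistry().emplace(id, impl).second) {
    throw std::runtime_error("Connector already registered: " + id);
  }
}

// An extension entry point that registers one component of each kind.
extern "C" void registry() {
  addFunction("dynamic_1", "DynamicFunction");
  addConnector("my_connector", "MyConnectorFactory");
}
```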


/// The function uses dlopen to load the shared library.
/// It then searches for the "void registry()" C function which typically
/// contains all the registration code for the UDFs defined in library. After
Collaborator

@majetideepak majetideepak Feb 4, 2025

Maybe replace UDFs -> user-defined Velox components such as functions

Collaborator

Highlighting this because it hasn't been changed yet.


extern "C" {
void registry();
void check() {
Collaborator

Let's comment here that the "registry()" declaration and the "check()" function ensure that the user defines the registry function with the correct signature.
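A sketch of why the declaration plus check() pattern helps: if the extension defines registry() with a different signature in the same translation unit, the compiler reports a conflicting C-linkage declaration. The static_assert is an additional, explicit variant of the same idea and is an assumption of this sketch, not something the PR does; registrationCount is a hypothetical stand-in for real registration work.

```cpp
#include <type_traits>

extern "C" {
// Declaration the loader relies on: dlsym() looks up the plain name
// "registry" and the loader calls it through a void (*)() pointer.
void registry();

// Mirrors the PR's check() helper: this only compiles if registry() can be
// called with no arguments.
void check() {
  registry();
}
}

// Explicit variant (not in the PR): a readable build error if the return
// type of registry() ever changes.
static_assert(
    std::is_void_v<decltype(registry())>,
    "registry() must take no arguments and return void");

// Counts invocations, standing in for real registration work.
int registrationCount = 0;

// The extension author's definition. Declaring it as `int registry()` or
// `void registry(int)` instead would conflict with the C-linkage
// declaration above and fail to compile.
extern "C" void registry() {
  ++registrationCount;
}
```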

add_library(name_of_dynamic_fn SHARED TestFunction.cpp)
target_link_libraries(name_of_dynamic_fn PRIVATE xsimd fmt::fmt)
```
Above, the xsimd and fmt::fmt libraries are required for all necessary symbols to be defined when loading the TestFunction.cpp dynamically
Collaborator

This has to be updated.

Contributor

Updated with only one example, and a short reasoning on what can be skipped on linux.

facebook-github-bot pushed a commit that referenced this pull request Feb 4, 2025
Summary:
This is causing the xsimd header leak.
See #11439 (comment)

Pull Request resolved: #12231

Reviewed By: bikramSingh91

Differential Revision: D68982274

Pulled By: kgpai

fbshipit-source-id: 55624bc0161460fc8eff317329af6f009aeb0f94
@soumiiow soumiiow force-pushed the velox-dylib branch 2 times, most recently from bda5649 to 020aeda Compare February 5, 2025 16:14
PRIVATE ${CMAKE_DYLIB_TEST_LINK_LIBRARIES})

target_link_options(velox_function_my_dynamic PRIVATE
"-Wl,-undefined,dynamic_lookup")
Collaborator

I'm thinking this could be a variable like

set(COMMON_LIBRARY_LINK_OPTIONS "-Wl,-undefined,dynamic_lookup")

and then use

target_link_options(velox_function_my_dynamic PRIVATE
                    ${COMMON_LIBRARY_LINK_OPTIONS})

Contributor

Done

}

TEST_F(DynamicLinkTest, dynamicLoadOverwriteFunc) {
const auto dynamicIntFunction = [&](std::optional<double> a) {
Collaborator

This should have no arguments at all.
evaluateOnce should allow this:

return evaluateOnce<int64_t>("dynamic_overwrite()");

The overwrite only occurs because the signature (aka no input args) are the same for both functions. The difference is the return type. Am I correct?
Same for the varchar return type function.

Contributor

Removed the arguments, but had to pass something to evaluateOnce for the row, or else the result would be nullptr. Followed the example from pi() and e().

Changed to

  const auto dynamicIntFunction = [&]() {
    return evaluateOnce<int64_t>("dynamic_overwrite()", makeRowVector(ROW({}), 1));
  };

Contributor

Correct, the only difference is the return type. The rest of the signature is the same.
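To illustrate the overwrite vs. overload distinction in this thread, here is a minimal C++ sketch. It assumes a hypothetical ToyRegistry (not Velox's actual registry) that keys entries by function name and argument types only, storing the return type as the value. Re-registering a zero-argument function under the same name with a different return type therefore replaces the earlier entry, while a different argument list adds a separate overload.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a function registry (not Velox's data structure):
// the key is (name, argument types); the return type is NOT part of the key.
using Key = std::pair<std::string, std::vector<std::string>>;

class ToyRegistry {
 public:
  // Same name + same argument types replaces the old entry ("overwrite").
  // Same name + different argument types adds a new entry ("overload").
  void registerFunction(const std::string& name,
                        const std::vector<std::string>& argTypes,
                        const std::string& returnType) {
    entries_[{name, argTypes}] = returnType;
  }

  // Returns the return type registered for this exact signature, or "".
  std::string resolve(const std::string& name,
                      const std::vector<std::string>& argTypes) const {
    auto it = entries_.find({name, argTypes});
    return it == entries_.end() ? "" : it->second;
  }

  std::size_t size() const { return entries_.size(); }

 private:
  std::map<Key, std::string> entries_;
};
```

Because the return type is not part of the key in this model, the last registration for a given name and argument list wins; this mirrors the behavior described here, not how Velox stores signatures internally.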

@@ -0,0 +1,26 @@
# Dynamic Loading of Velox Extensions

Collaborator

Can we please add a note on overwriting vs. overloading at the bottom to explain to users when each case occurs?

Contributor

Done

Collaborator

@aditi-pandit left a comment

Thanks @mohsaka and @soumilw

///
/// void registry();
///
/// The registration function needs to be defined in the root-level namespace,

Collaborator

Is the root-level namespace the global namespace?

Contributor

They should be the same thing (root meaning the base level). I can rename it to global as well.
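A minimal sketch of the library side, assuming the entry point is declared with C linkage, as the "void registry()" C function described in this PR suggests. Helper code can live in any namespace; the entry point itself is defined at global scope with extern "C", so its exported symbol is exactly registry and a lookup by that name can find it. hostRegistry() and my_extension are hypothetical stand-ins for Velox's real registration calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the host's function registry (not a Velox API).
std::vector<std::string>& hostRegistry() {
  static std::vector<std::string> names;
  return names;
}

namespace my_extension {
// Library-internal registration helpers may live in any namespace.
void registerAll() {
  hostRegistry().push_back("dynamic");
}
} // namespace my_extension

// Entry point looked up by name at load time. C linkage disables C++ name
// mangling, so the exported symbol is exactly "registry".
extern "C" void registry() {
  my_extension::registerAll();
}
```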


class DynamicLinkTest : public FunctionBaseTest {};

std::string getLibraryPath(const std::string& filename) {

Collaborator

Wrap this in an anonymous namespace.
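A minimal sketch of the suggested change: placing the test helper in an anonymous namespace gives it internal linkage, so it cannot collide with a getLibraryPath defined in another translation unit. The body here is hypothetical (the original body is not shown in this excerpt); it only appends a platform-specific shared-library suffix.

```cpp
#include <cassert>
#include <string>

namespace {

// Hypothetical body: maps a base name such as "libvelox_function_my_dynamic"
// to the platform's shared-library file name.
std::string getLibraryPath(const std::string& filename) {
#ifdef __APPLE__
  return filename + ".dylib";
#else
  return filename + ".so";
#endif
}

} // namespace
```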

}

TEST_F(DynamicLinkTest, dynamicLoadSameFuncTwice) {
const auto dynamicFunction = [&](std::optional<double> a) {

Collaborator

As with @czentgr's comment, is the argument "a" needed here as well?

Contributor

Removed the other ones as well.


/// The function uses dlopen to load the shared library.
/// It then searches for the "void registry()" C function which typically
/// contains all the registration code for the UDFs defined in library. After

Collaborator

Highlighting this because it hasn't been changed yet.

/// Loading a library twice can cause a components to be registered twice.
/// This can fail for certain Velox components.

void loadDynamicLibrary(const char* fileName);

Collaborator

Why are we using char* here instead of const std::string&? Is it because we don't want to include <string>?

This function is only used within Velox itself, so using std::string wouldn't be a problem if the concern is about exposing it externally.
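A minimal C++ sketch of a loader that takes const std::string& as suggested and passes c_str() to dlopen. It follows the behavior described in this PR's doc comments (dlopen the library, then find and call the "void registry()" C function), but loadDynamicLibrarySketch is a hypothetical name and this is not the PR's implementation. It assumes a POSIX system; on older glibc versions you may need to link with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <stdexcept>
#include <string>

// Hypothetical loader sketch (not the PR's implementation): opens a shared
// library and calls its extern "C" void registry() entry point.
void loadDynamicLibrarySketch(const std::string& fileName) {
  // c_str() yields the null-terminated path that dlopen expects.
  // RTLD_NOW resolves undefined symbols immediately, so a missing host
  // symbol fails here instead of on first call.
  void* handle = dlopen(fileName.c_str(), RTLD_NOW);
  if (handle == nullptr) {
    const char* error = dlerror();
    throw std::runtime_error(
        "Failed to load " + fileName + ": " +
        (error != nullptr ? error : "unknown error"));
  }

  // Look up the registration entry point by its unmangled C name.
  void* symbol = dlsym(handle, "registry");
  if (symbol == nullptr) {
    dlclose(handle);
    throw std::runtime_error("No registry() symbol in " + fileName);
  }

  // Call it. The handle is intentionally left open: the functions it
  // registered live in the library's code and must stay loaded.
  auto registryFn = reinterpret_cast<void (*)()>(symbol);
  registryFn();
}
```

On macOS, the shared library itself must be linked with -Wl,-undefined,dynamic_lookup (as in the CMake snippet in this PR) so that its references to host symbols are resolved at load time; Linux linkers allow undefined symbols in shared libraries by default.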

dynamicFunction(0), "Scalar function doesn't exist: dynamic.");

std::string libraryPath = getLibraryPath("libvelox_function_my_dynamic");
loadDynamicLibrary(libraryPath.data());

Collaborator

If we use a const char* as input, then we need to use libraryPath.c_str() and not data(). There is no guarantee that data() is null-terminated.

This occurs multiple times.

Contributor

switched to str
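For reference on the data() vs. c_str() point: since C++11, std::string::data() and c_str() return the same pointer to a null-terminated buffer, so either is safe to pass to a const char* parameter, although c_str() signals the intent more clearly. A small C++ check, where cStringLength is a hypothetical stand-in for a C API such as dlopen:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Since C++11, data() and c_str() both return a pointer p such that
// p[size()] == '\0', and they point at the same buffer.
bool dataIsNullTerminated(const std::string& s) {
  return s.data() == s.c_str() && s.data()[s.size()] == '\0';
}

// Hypothetical const char* consumer standing in for a C API such as dlopen.
std::size_t cStringLength(const char* p) {
  return std::strlen(p);
}
```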

namespace facebook::velox::common::dynamicRegistry {

template <typename T>
struct Dynamic123Function {

Collaborator

@mohsaka, @soumiiow: Nit: it's better to avoid notation like 123 in the function name. Use something like DynamicTestFunction or just DynamicFunction for the struct names everywhere.

Also remove "My" from the filenames. They can be DynamicTestFunction or just DynamicFunction, to be consistent with the struct names.

8 participants