feat: Add spatial support #12053
First initial thoughts.
include_guard(GLOBAL)

# GEOS Configuration
set(VELOX_GEOS_BUILD_VERSION 3.11.1)
Can we add this as a non-bundled dependency as well to the setup scripts?
Also please update the README in this folder.
void registerGeometryType();

} // namespace facebook::velox
Please add a new line.
namespace facebook::velox {

/// Represents Geometry as a string.
class GeometryType : public VarcharType {
I suppose we use varchar here because we serialize the GEOS object into a string via the StringWriter, right?
This would then be the current internal representation and handled like a regular varchar in exchanges and such.
From what I understand, I can update this to VarbinaryType because both types are serialized into variable-width blocks, and in Velox, we can use StringWriter for handling both.
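To make the VARCHAR versus VARBINARY question concrete, here is a minimal sketch of a binary geometry encoding. It assumes WKB (the OGC well-known binary layout) for a 2D point purely for illustration; the PR itself mentions Shapefile-style serialization, and the helper names `appendUint`, `appendDouble`, and `encodePointWkb` are hypothetical, not Velox or GEOS APIs. The takeaway is that the payload contains NUL bytes and is not valid text, which is one reason a binary-typed column may be the more natural fit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: append the low `numBytes` bytes of `value` in
// little-endian order, independent of host endianness.
void appendUint(std::string& out, uint64_t value, int numBytes) {
  for (int i = 0; i < numBytes; ++i) {
    out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
  }
}

// Hypothetical helper: append a double as its 8-byte bit pattern
// (IEEE 754 representation assumed).
void appendDouble(std::string& out, double value) {
  static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit double");
  uint64_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));
  appendUint(out, bits, 8);
}

// Encode a 2D point as little-endian WKB:
// 1 byte order flag + 4 byte geometry type + 2 * 8 byte coordinates.
std::string encodePointWkb(double x, double y) {
  std::string out;
  out.push_back('\x01');  // byte order: 1 = little-endian
  appendUint(out, 1, 4);  // geometry type: 1 = Point
  appendDouble(out, x);
  appendDouble(out, y);
  assert(out.size() == 21);
  return out;
}
```

A `std::string` holds the bytes here only as a byte container, which mirrors how a variable-width buffer can back either a VARCHAR or a VARBINARY value.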
Sorry for the delay, @wraymo! I'll get to this by the end of the week. Thanks for your patience.
Thanks @wraymo for this work! Since this is just a draft, I'll keep most of my comments high-level.
I want to emphasize this is a great draft, and you don't have to do everything yourself! I'm happy to take on this work if you don't have the time for any or all of it, but if you wanted to start finalizing parts of the PR we can start shipping initial bits.
Thanks @jagill for the thoughtful feedback! That all makes sense. I'll use

Although this is a draft PR, we've deployed it in one of our clusters alongside a basic local spatial join (using a nested loop) and tested it on several queries. So far, the results match those from the Java workers, which is promising.

Regarding splitting this PR, my plan aligns closely with this issue; seems like we're on the same page! For the first PR, do you prefer introducing the Geometry type (modeled like HyperLogLogType and adding a test in

For code organization, would it make sense to place serialization and related utility functions in

I have plenty of time to work on this and am happy to contribute however I can. Let me know how you'd like to proceed, and I'll push updates accordingly. Thanks again for your guidance and for offering to help!
This is great to hear!
For steps:
For what's needed for the GEOS dependency, let's make sure:
Yes, this sounds good!
Great to have you contributing! Let's start with the Geos import. We'll have some people more familiar with Velox build give feedback.
@jagill Thanks for your comment! I just submitted a PR to add GEOS as an optional dependency. My current PR serializes data in Shapefile format. Should we use Shapefile first, like the Java worker, to avoid compatibility issues (e.g. if the coordinator sends Shapefile-serialized data to a Velox worker)?
Summary:
This draft PR introduces spatial data support (#11814) to Velox, including the following key updates:
VARCHAR (WKT) or VARBINARY (WKB) and geometry types.
ST_Point and ST_Contains.
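As a rough illustration of what functions such as ST_Point and ST_Contains compute, the sketch below builds a point and tests whether it lies inside a polygon using ray casting. This is a simplification under stated assumptions, not the PR's implementation: the PR relies on GEOS, the names `Point` and `containsPoint` are hypothetical, the polygon is a single simple ring without holes, and points exactly on the boundary are not handled according to the OGC definition of containment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the value an ST_Point-like function would produce.
struct Point {
  double x;
  double y;
};

// Ray-casting test: cast a horizontal ray from `p` towards +x and count how
// many polygon edges it crosses. An odd count means `p` is inside.
// `ring` lists the vertices of a simple polygon; the closing edge from the
// last vertex back to the first is implied.
bool containsPoint(const std::vector<Point>& ring, const Point& p) {
  const std::size_t n = ring.size();
  if (n < 3) {
    return false;  // not a polygon
  }
  bool inside = false;
  for (std::size_t i = 0, j = n - 1; i < n; j = i++) {
    const Point& a = ring[i];
    const Point& b = ring[j];
    // Only edges whose endpoints lie on opposite sides of the ray can cross
    // it; this also guarantees a.y != b.y in the division below.
    const bool straddles = (a.y > p.y) != (b.y > p.y);
    if (straddles) {
      const double xCross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
      if (p.x < xCross) {
        inside = !inside;
      }
    }
  }
  return inside;
}
```

For example, with the square ring `{{0, 0}, {4, 0}, {4, 4}, {0, 4}}`, the point `{2, 2}` is reported as contained and `{5, 2}` is not. A production implementation would delegate to a geometry library to get holes, multi-geometries, and boundary semantics right.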