-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-17966: [C++] Adjust to new format for Substrait optional arguments #14415
ARROW-17966: [C++] Adjust to new format for Substrait optional arguments #14415
Conversation
This PR currently depends on a non-official (PR fork) version of Substrait. We probably should not merge until the corresponding Substrait PR has merged. |
|
std::optional<std::vector<std::string> const*> SubstraitCall::GetOption( | ||
std::string_view option_name) const { | ||
auto opt = options_.find(std::string(option_name)); | ||
if (opt == options_.end()) { | ||
return std::nullopt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this could be simpler
std::optional<std::vector<std::string> const*> SubstraitCall::GetOption( | |
std::string_view option_name) const { | |
auto opt = options_.find(std::string(option_name)); | |
if (opt == options_.end()) { | |
return std::nullopt; | |
const std::vector<std::string>* SubstraitCall::GetOption( | |
std::string_view option_name) const { | |
auto opt = options_.find(std::string(option_name)); | |
if (opt == options_.end()) { | |
return nullptr; |
@@ -132,10 +133,29 @@ Result<ExtensionSet> GetExtensionSetFromPlan(const substrait::Plan& plan, | |||
conversion_options, registry); | |||
} | |||
|
|||
namespace { | |||
|
|||
// FIXME Is there some way to get these from the cmake files? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There does not appear to be a way to inspect a substrait clone and get the version numbers apart from parsing CHANGELOG.md. You could add #cmakedefine ARROW_SUBSTRAIT_BUILD_VERSION
to config.h.cmake then parse that here, or parse it in cmake and add #cmakedefine ARROW_SUBSTRAIT_BUILD_VERSION_MAJOR
etc to config.h.cmake. This is for example how ThirdpartyToolchain.cmake provides a #define
to indicate that jemalloc is vendored https://github.com/drin/arrow/blob/dea465396f981f9bd9862e501dc6750ca4fee6d9/cpp/cmake_modules/ThirdpartyToolchain.cmake#L1864-L1865 -> https://github.com/drin/arrow/blob/dea465396f981f9bd9862e501dc6750ca4fee6d9/cpp/src/arrow/util/config.h.cmake#L48
@@ -645,22 +663,19 @@ struct ExtensionIdRegistryImpl : ExtensionIdRegistry { | |||
}; | |||
|
|||
template <typename Enum> | |||
using EnumParser = std::function<Result<Enum>(std::optional<std::string_view>)>; | |||
using EnumParser = std::function<Result<Enum>(std::string_view)>; | |||
|
|||
template <typename Enum> | |||
EnumParser<Enum> GetEnumParser(const std::vector<std::string>& options) { | |||
std::unordered_map<std::string, Enum> parse_map; | |||
for (std::size_t i = 0; i < options.size(); i++) { | |||
parse_map[options[i]] = static_cast<Enum>(i + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably not necessary to reify "UNSPECIFIED" in all the enumerations. We will always either coerce to a default or error because a real value was required. That could be expressed in EnumParser as:
template <typename Enum>
using EnumParser = std::function<Result<Enum>(std::string_view repr, std::optional<Enum> default)>;
... and it would save you some one-based indexing headache.
Probably out of scope for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed to leave out of scope for this PR. I also agree it would probably be a good idea. The "unspecified" thing is a relic from the old behavior where "unspecified" was something you had to explicitly specify.
@bkietz Thanks for the review. I still need to take a look at the cmake changes but I'll try that out later. |
26fd452
to
f03fcd9
Compare
Deferring the cmake changes to https://issues.apache.org/jira/browse/ARROW-18145 |
@bkietz I'm moving this to ready-for-review. @rtpsw has been trying to get Ibis / Arrow compatibility working and this PR enables that, even though it temporarily breaks us from the main branch of Substrait while we wait on substrait-io/substrait#342 to merge. However, there is some work that is stacking up on top of this and branch management has started to cause some headaches. Curious to get your thoughts on moving forward as-is with this PR, to make it easier for people working on integration, and addressing any remaining discrepancies (e.g. Enum vs string) in follow-up PRs. |
@@ -657,6 +657,13 @@ else() | |||
"${THIRDPARTY_MIRROR_URL}/snappy-${ARROW_SNAPPY_BUILD_VERSION}.tar.gz") | |||
endif() | |||
|
|||
# Remove these two lines once https://github.com/substrait-io/substrait/pull/342 merges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving forward sounds fine to me; this is why we're pulling this dependency from a URL after all. I've added https://issues.apache.org/jira/browse/ARROW-18201 to track the reversion. In the meantime, please move this modification to versions.txt, annotated with // TODO(ARROW-18201)
0b929f2
to
3858293
Compare
The feature has now merged into Substrait. Unfortunately, it seems releases are automated on Sundays so this probably can't merge until after Sunday at which time I will change the version numbers to be 0.20.0. |
a94979a
to
e158f2d
Compare
Can I help moving this along somehow? If not please ignore. |
…nal enum args to proper options. Added check for minimum Substrait version
…ndling to check major version and not just minor
Co-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
… substrait option that Acero doesn't support
…nces for an option
e158f2d
to
be691d4
Compare
The mingw failures appear to be unrelated |
Benchmark runs are scheduled for baseline = 7276c35 and contender = 63f013c. 63f013c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
No description provided.