Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-17966: [C++] Adjust to new format for Substrait optional arguments #14415

Conversation

westonpace
Copy link
Member

No description provided.

@westonpace
Copy link
Member Author

westonpace commented Oct 14, 2022

This PR currently depends on a non-official (PR fork) version of Substrait. We probably should not merge until the corresponding Substrait PR has merged.

substrait-io/substrait#342

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

Comment on lines +168 to +173
std::optional<std::vector<std::string> const*> SubstraitCall::GetOption(
std::string_view option_name) const {
auto opt = options_.find(std::string(option_name));
if (opt == options_.end()) {
return std::nullopt;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this could be simpler

Suggested change
std::optional<std::vector<std::string> const*> SubstraitCall::GetOption(
std::string_view option_name) const {
auto opt = options_.find(std::string(option_name));
if (opt == options_.end()) {
return std::nullopt;
const std::vector<std::string>* SubstraitCall::GetOption(
std::string_view option_name) const {
auto opt = options_.find(std::string(option_name));
if (opt == options_.end()) {
return nullptr;

cpp/src/arrow/engine/substrait/extension_set.cc Outdated Show resolved Hide resolved
cpp/src/arrow/engine/substrait/extension_set.cc Outdated Show resolved Hide resolved
cpp/src/arrow/engine/substrait/extension_set.cc Outdated Show resolved Hide resolved
cpp/src/arrow/engine/substrait/extension_set.cc Outdated Show resolved Hide resolved
cpp/src/arrow/engine/substrait/extension_set.cc Outdated Show resolved Hide resolved
cpp/src/arrow/engine/substrait/extension_set.cc Outdated Show resolved Hide resolved
cpp/src/arrow/engine/substrait/expression_internal.cc Outdated Show resolved Hide resolved
@@ -132,10 +133,29 @@ Result<ExtensionSet> GetExtensionSetFromPlan(const substrait::Plan& plan,
conversion_options, registry);
}

namespace {

// FIXME Is there some way to get these from the cmake files?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There does not appear to be a way to inspect a substrait clone and get the version numbers apart from parsing CHANGELOG.md. You could add #cmakedefine ARROW_SUBSTRAIT_BUILD_VERSION to config.h.cmake then parse that here, or parse it in cmake and add #cmakedefine ARROW_SUBSTRAIT_BUILD_VERSION_MAJOR etc to config.h.cmake. This is for example how ThirdpartyToolchain.cmake provides a #define to indicate that jemalloc is vendored https://github.com/drin/arrow/blob/dea465396f981f9bd9862e501dc6750ca4fee6d9/cpp/cmake_modules/ThirdpartyToolchain.cmake#L1864-L1865 -> https://github.com/drin/arrow/blob/dea465396f981f9bd9862e501dc6750ca4fee6d9/cpp/src/arrow/util/config.h.cmake#L48

@@ -645,22 +663,19 @@ struct ExtensionIdRegistryImpl : ExtensionIdRegistry {
};

template <typename Enum>
using EnumParser = std::function<Result<Enum>(std::optional<std::string_view>)>;
using EnumParser = std::function<Result<Enum>(std::string_view)>;

template <typename Enum>
EnumParser<Enum> GetEnumParser(const std::vector<std::string>& options) {
std::unordered_map<std::string, Enum> parse_map;
for (std::size_t i = 0; i < options.size(); i++) {
parse_map[options[i]] = static_cast<Enum>(i + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not necessary to reify "UNSPECIFIED" in all the enumerations. We will always either coerce to a default or error because a real value was required. That could be expressed in EnumParser as:

template <typename Enum>
using EnumParser = std::function<Result<Enum>(std::string_view repr, std::optional<Enum> default)>;

... and it would save you some one-based indexing headache.

Probably out of scope for this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed to leave out of scope for this PR. I also agree it would probably be a good idea. The "unspecified" thing is a relic from the old behavior where "unspecified" was something you had to explicitly specify.

@westonpace
Copy link
Member Author

@bkietz Thanks for the review. I still need to take a look at the cmake changes but I'll try that out later.

@westonpace westonpace force-pushed the feature/ARROW-17966--dont-require-optional-args branch from 26fd452 to f03fcd9 Compare October 20, 2022 00:57
@westonpace
Copy link
Member Author

Deferring the cmake changes to https://issues.apache.org/jira/browse/ARROW-18145

@westonpace westonpace marked this pull request as ready for review October 24, 2022 16:54
@westonpace
Copy link
Member Author

@bkietz I'm moving this to ready-for-review. @rtpsw has been trying to get Ibis / Arrow compatibility working and this PR enables that, even though it temporarily breaks us from the main branch of Substrait while we wait on substrait-io/substrait#342 to merge. However, there is some work that is stacking up on top of this and branch management has started to cause some headaches.

Curious to get your thoughts on moving forward as-is with this PR, to make it easier for people working on integration, and addressing any remaining discrepancies (e.g. Enum vs string) in follow-up PRs.

@@ -657,6 +657,13 @@ else()
"${THIRDPARTY_MIRROR_URL}/snappy-${ARROW_SNAPPY_BUILD_VERSION}.tar.gz")
endif()

# Remove these two lines once https://github.com/substrait-io/substrait/pull/342 merges
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving forward sounds fine to me; this is why we're pulling this dependency from a URL after all. I've added https://issues.apache.org/jira/browse/ARROW-18201 to track the reversion. In the meantime, please move this modification to versions.txt, annotated with // TODO(ARROW-18201)

@westonpace westonpace force-pushed the feature/ARROW-17966--dont-require-optional-args branch from 0b929f2 to 3858293 Compare November 14, 2022 19:25
@westonpace
Copy link
Member Author

The feature has now merged into Substrait. Unfortunately, it seems releases are automated on Sundays so this probably can't merge until after Sunday at which time I will change the version numbers to be 0.20.0.

@westonpace westonpace force-pushed the feature/ARROW-17966--dont-require-optional-args branch from a94979a to e158f2d Compare November 22, 2022 03:27
@rok
Copy link
Member

rok commented Nov 22, 2022

Can I help moving this along somehow? If not please ignore.

@westonpace westonpace force-pushed the feature/ARROW-17966--dont-require-optional-args branch from e158f2d to be691d4 Compare November 22, 2022 17:52
@westonpace
Copy link
Member Author

@rok @bkietz Now that substrait version 0.20.0 is out, with this change, I think this is good to go.

@westonpace westonpace requested a review from bkietz November 22, 2022 17:53
@westonpace
Copy link
Member Author

The mingw failures appear to be unrelated

@westonpace westonpace merged commit 63f013c into apache:master Nov 25, 2022
@ursabot
Copy link

ursabot commented Nov 26, 2022

Benchmark runs are scheduled for baseline = 7276c35 and contender = 63f013c. 63f013c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.2% ⬆️0.07%] test-mac-arm
[Finished ⬇️0.54% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.38% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 63f013cd ec2-t3-xlarge-us-east-2
[Finished] 63f013cd test-mac-arm
[Finished] 63f013cd ursa-i9-9960x
[Finished] 63f013cd ursa-thinkcentre-m75q
[Finished] 7276c359 ec2-t3-xlarge-us-east-2
[Finished] 7276c359 test-mac-arm
[Finished] 7276c359 ursa-i9-9960x
[Finished] 7276c359 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants