Skip to content

Commit

Permalink
GH-39357: [C++] Reduce function.h includes (#39312)
Browse files Browse the repository at this point in the history
### Rationale for this change

As proposed in #36246 , by splitting function option structs from `function.h`, we can reduce the including of `function.h`. So that the total build time could be reduced.

The total parser time could be reduced from 722.3s to 709.7s. And the `function.h` along with its transitive inclusion of `kernel.h` don't show up in expensive headers any more.

The detailed analysis result before and after this PR are attached: 
[analyze-before.txt](https://github.com/apache/arrow/files/13756923/analyze-before.txt)
[analyze-after.txt](https://github.com/apache/arrow/files/13756924/analyze-after.txt)

Disclaimer (quote from #36246 (comment)):
> Note that the time diff is not absolute. The ClangBuildAnalyzer result differs from time to time. I guess it depends on the idle-ness of the building machine when doing the experiment. But the time reduction is almost certain, though sometimes more sometimes less. And the inclusion times of the questioning headers are reduced for sure, as shown in the attachments in my other comment.

### What changes are included in this PR?

Move function option structs into own `compute/options.h`, and change including `function.h` to including `options.h` wherever fits.

### Are these changes tested?

Build is testing.

### Are there any user-facing changes?

There could be potential build failures for user code (quote from #36246 (comment)):
> The header function.h remains in compute/api.h, with and without this PR. The proposed PR removes function.h from api_xxx.h (then includes options.h instead), as proposed in the initial description of this issue. This results in compile failures for user code which includes only compute/api_xxx.h but not compute/api.h, and meanwhile uses CallFunction which is declared in function.h.

But I think it's OK as described in #36246 (comment).

* Closes: #39357

Authored-by: zanmato <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
  • Loading branch information
zanmato1984 authored Dec 26, 2023
1 parent b32f71a commit cf44793
Show file tree
Hide file tree
Showing 19 changed files with 111 additions and 58 deletions.
2 changes: 1 addition & 1 deletion cpp/examples/arrow/compute_and_write_csv_example.cc
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
// under the License.

#include <arrow/api.h>
#include <arrow/compute/api_aggregate.h>
#include <arrow/compute/api.h>
#include <arrow/csv/api.h>
#include <arrow/csv/writer.h>
#include <arrow/io/api.h>
Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/acero/aggregate_internal.cc
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include "arrow/acero/exec_plan.h"
#include "arrow/acero/options.h"
#include "arrow/compute/exec.h"
#include "arrow/compute/function.h"
#include "arrow/compute/registry.h"
#include "arrow/compute/row/grouper.h"
#include "arrow/datum.h"
Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/acero/scalar_aggregate_node.cc
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include "arrow/acero/options.h"
#include "arrow/acero/util.h"
#include "arrow/compute/exec.h"
#include "arrow/compute/function.h"
#include "arrow/compute/registry.h"
#include "arrow/compute/row/grouper.h"
#include "arrow/datum.h"
Expand Down
21 changes: 13 additions & 8 deletions cpp/src/arrow/compute/api.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,18 +20,23 @@

#pragma once

/// \defgroup compute-functions Abstract compute function API
/// @{
/// @}

/// \defgroup compute-concrete-options Concrete option classes for compute functions
/// @{
/// @}

#include "arrow/compute/api_aggregate.h" // IWYU pragma: export
#include "arrow/compute/api_scalar.h" // IWYU pragma: export
#include "arrow/compute/api_vector.h" // IWYU pragma: export
#include "arrow/compute/cast.h" // IWYU pragma: export
#include "arrow/compute/function.h" // IWYU pragma: export
#include "arrow/compute/kernel.h" // IWYU pragma: export
#include "arrow/compute/registry.h" // IWYU pragma: export
#include "arrow/datum.h" // IWYU pragma: export
#include "arrow/compute/api_aggregate.h" // IWYU pragma: export
#include "arrow/compute/api_scalar.h" // IWYU pragma: export
#include "arrow/compute/api_vector.h" // IWYU pragma: export
#include "arrow/compute/cast.h" // IWYU pragma: export
#include "arrow/compute/function.h" // IWYU pragma: export
#include "arrow/compute/function_options.h" // IWYU pragma: export
#include "arrow/compute/kernel.h" // IWYU pragma: export
#include "arrow/compute/registry.h" // IWYU pragma: export
#include "arrow/datum.h" // IWYU pragma: export

#include "arrow/compute/expression.h" // IWYU pragma: export

Expand Down
2 changes: 1 addition & 1 deletion cpp/src/arrow/compute/api_aggregate.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@

#include <vector>

#include "arrow/compute/function.h"
#include "arrow/compute/function_options.h"
#include "arrow/datum.h"
#include "arrow/result.h"
#include "arrow/util/macros.h"
Expand Down
2 changes: 1 addition & 1 deletion cpp/src/arrow/compute/api_scalar.h
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@
#include <string>
#include <utility>

#include "arrow/compute/function.h"
#include "arrow/compute/function_options.h"
#include "arrow/compute/type_fwd.h"
#include "arrow/datum.h"
#include "arrow/result.h"
Expand Down
3 changes: 1 addition & 2 deletions cpp/src/arrow/compute/api_vector.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,8 @@
#include <memory>
#include <utility>

#include "arrow/compute/function.h"
#include "arrow/compute/function_options.h"
#include "arrow/compute/ordering.h"
#include "arrow/datum.h"
#include "arrow/result.h"
#include "arrow/type_fwd.h"

Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/cast.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
#include <vector>

#include "arrow/compute/function.h"
#include "arrow/compute/function_options.h"
#include "arrow/compute/type_fwd.h"
#include "arrow/result.h"
#include "arrow/status.h"
Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/function.cc
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
#include "arrow/compute/exec.h"
#include "arrow/compute/exec_internal.h"
#include "arrow/compute/function_internal.h"
#include "arrow/compute/function_options.h"
#include "arrow/compute/kernels/common_internal.h"
#include "arrow/compute/registry.h"
#include "arrow/datum.h"
Expand Down
46 changes: 1 addition & 45 deletions cpp/src/arrow/compute/function.h
Original file line number Diff line number Diff line change
Expand Up @@ -36,53 +36,9 @@
namespace arrow {
namespace compute {

/// \defgroup compute-functions Abstract compute function API
///
/// \addtogroup compute-functions
/// @{

/// \brief Extension point for defining options outside libarrow (but
/// still within this project).
class ARROW_EXPORT FunctionOptionsType {
public:
virtual ~FunctionOptionsType() = default;

virtual const char* type_name() const = 0;
virtual std::string Stringify(const FunctionOptions&) const = 0;
virtual bool Compare(const FunctionOptions&, const FunctionOptions&) const = 0;
virtual Result<std::shared_ptr<Buffer>> Serialize(const FunctionOptions&) const;
virtual Result<std::unique_ptr<FunctionOptions>> Deserialize(
const Buffer& buffer) const;
virtual std::unique_ptr<FunctionOptions> Copy(const FunctionOptions&) const = 0;
};

/// \brief Base class for specifying options configuring a function's behavior,
/// such as error handling.
class ARROW_EXPORT FunctionOptions : public util::EqualityComparable<FunctionOptions> {
public:
virtual ~FunctionOptions() = default;

const FunctionOptionsType* options_type() const { return options_type_; }
const char* type_name() const { return options_type()->type_name(); }

bool Equals(const FunctionOptions& other) const;
std::string ToString() const;
std::unique_ptr<FunctionOptions> Copy() const;
/// \brief Serialize an options struct to a buffer.
Result<std::shared_ptr<Buffer>> Serialize() const;
/// \brief Deserialize an options struct from a buffer.
/// Note: this will only look for `type_name` in the default FunctionRegistry;
/// to use a custom FunctionRegistry, look up the FunctionOptionsType, then
/// call FunctionOptionsType::Deserialize().
static Result<std::unique_ptr<FunctionOptions>> Deserialize(
const std::string& type_name, const Buffer& buffer);

protected:
explicit FunctionOptions(const FunctionOptionsType* type) : options_type_(type) {}
const FunctionOptionsType* options_type_;
};

ARROW_EXPORT void PrintTo(const FunctionOptions&, std::ostream*);

/// \brief Contains the number of required arguments for the function.
///
/// Naming conventions taken from https://en.wikipedia.org/wiki/Arity.
Expand Down
81 changes: 81 additions & 0 deletions cpp/src/arrow/compute/function_options.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

// NOTE: API is EXPERIMENTAL and will change without going through a
// deprecation cycle.

#pragma once

#include "arrow/compute/type_fwd.h"
#include "arrow/result.h"
#include "arrow/status.h"
#include "arrow/type_fwd.h"
#include "arrow/util/visibility.h"

namespace arrow {
namespace compute {

/// \addtogroup compute-functions
/// @{

/// \brief Extension point for defining options outside libarrow (but
/// still within this project).
class ARROW_EXPORT FunctionOptionsType {
public:
virtual ~FunctionOptionsType() = default;

virtual const char* type_name() const = 0;
virtual std::string Stringify(const FunctionOptions&) const = 0;
virtual bool Compare(const FunctionOptions&, const FunctionOptions&) const = 0;
virtual Result<std::shared_ptr<Buffer>> Serialize(const FunctionOptions&) const;
virtual Result<std::unique_ptr<FunctionOptions>> Deserialize(
const Buffer& buffer) const;
virtual std::unique_ptr<FunctionOptions> Copy(const FunctionOptions&) const = 0;
};

/// \brief Base class for specifying options configuring a function's behavior,
/// such as error handling.
class ARROW_EXPORT FunctionOptions : public util::EqualityComparable<FunctionOptions> {
public:
virtual ~FunctionOptions() = default;

const FunctionOptionsType* options_type() const { return options_type_; }
const char* type_name() const { return options_type()->type_name(); }

bool Equals(const FunctionOptions& other) const;
std::string ToString() const;
std::unique_ptr<FunctionOptions> Copy() const;
/// \brief Serialize an options struct to a buffer.
Result<std::shared_ptr<Buffer>> Serialize() const;
/// \brief Deserialize an options struct from a buffer.
/// Note: this will only look for `type_name` in the default FunctionRegistry;
/// to use a custom FunctionRegistry, look up the FunctionOptionsType, then
/// call FunctionOptionsType::Deserialize().
static Result<std::unique_ptr<FunctionOptions>> Deserialize(
const std::string& type_name, const Buffer& buffer);

protected:
explicit FunctionOptions(const FunctionOptionsType* type) : options_type_(type) {}
const FunctionOptionsType* options_type_;
};

ARROW_EXPORT void PrintTo(const FunctionOptions&, std::ostream*);

/// @}

} // namespace compute
} // namespace arrow
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/kernels/scalar_if_else_benchmark.cc
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include "arrow/array/concatenate.h"
#include "arrow/array/util.h"
#include "arrow/compute/api_scalar.h"
#include "arrow/compute/function.h"
#include "arrow/testing/gtest_util.h"
#include "arrow/testing/random.h"
#include "arrow/util/key_value_metadata.h"
Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/kernels/vector_rank.cc
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
// specific language governing permissions and limitations
// under the License.

#include "arrow/compute/function.h"
#include "arrow/compute/kernels/vector_sort_internal.h"
#include "arrow/compute/registry.h"

Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/kernels/vector_replace_benchmark.cc
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
#include <benchmark/benchmark.h>

#include "arrow/array.h"
#include "arrow/datum.h"
#include "arrow/testing/gtest_util.h"
#include "arrow/testing/random.h"

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include "arrow/array/validate.h"
#include "arrow/builder.h"
#include "arrow/compute/api_vector.h"
#include "arrow/datum.h"
#include "arrow/testing/gtest_util.h"
#include "arrow/type_fwd.h"
#include "arrow/util/logging.h"
Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/kernels/vector_select_k.cc
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@

#include <queue>

#include "arrow/compute/function.h"
#include "arrow/compute/kernels/vector_sort_internal.h"
#include "arrow/compute/registry.h"

Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/kernels/vector_sort.cc
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@

#include <unordered_set>

#include "arrow/compute/function.h"
#include "arrow/compute/kernels/vector_sort_internal.h"
#include "arrow/compute/registry.h"

Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/registry_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
#include <gtest/gtest.h>

#include "arrow/compute/function.h"
#include "arrow/compute/function_options.h"
#include "arrow/compute/registry.h"
#include "arrow/result.h"
#include "arrow/status.h"
Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/compute/type_fwd.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ struct TypeHolder;
namespace compute {

class Function;
class ScalarAggregateFunction;
class FunctionExecutor;
class FunctionOptions;
class FunctionRegistry;
Expand Down

0 comments on commit cf44793

Please sign in to comment.