Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor] Introduce BaseProjectOperator and ValueBlock #10405

Merged
merged 1 commit into from
Mar 14, 2023

Conversation

Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang commented Mar 10, 2023

Both Projection and Transform in pinot are SQL project operation.
This PR introduce the BaseProjectOperator base class that represent the executor for SQL project, which generates the ValueBlock.
The BaseProjectOperator is designed in a way that it can chain itself (take ValueBlock as both input and output).
With the change, pass-through transform is no longer needed because high level operator (e.g. selection/aggregation/group-by) can directly take the output of ProjectionOperator.
In the future, we may introduce more project operators (e.g. filter on transform, local join), and make flexible query plan to solve more complex queries.

This PR includes the following refactors:

  • Introduce BaseProjectOperator and make ProjectionOperator and TransformOperator extend it
  • Remove PassThrowTransformOperator
  • Introduce ProjectPlanNode and StarTreeProjectPlanNode that handle the project operator creation
  • Introduce ValueBlock interface as the result for ProjectOperator, and make ProjectionBlock and TransformBlock extend it
  • Introduce ColumnContext as the wrapper for column metadata and optional DataSource

Incompatible (compile)

Several interfaces are changed to use the new interface/class

@Jackie-Jiang Jackie-Jiang added enhancement incompatible Indicate PR that introduces backward incompatibility refactor labels Mar 10, 2023
@Jackie-Jiang Jackie-Jiang force-pushed the base_project_operator branch 3 times, most recently from 5f71c9a to 8bfe328 Compare March 11, 2023 01:58
@codecov-commenter
Copy link

codecov-commenter commented Mar 11, 2023

Codecov Report

Merging #10405 (9ea9261) into master (81ceb1d) will decrease coverage by 33.19%.
The diff coverage is 35.48%.

@@              Coverage Diff              @@
##             master   #10405       +/-   ##
=============================================
- Coverage     68.37%   35.19%   -33.19%     
+ Complexity     6096      282     -5814     
=============================================
  Files          2055     2053        -2     
  Lines        111389   111397        +8     
  Branches      16939    16939               
=============================================
- Hits          76167    39206    -36961     
- Misses        29799    68778    +38979     
+ Partials       5423     3413     -2010     
Flag Coverage Δ
integration1 24.47% <32.68%> (+0.16%) ⬆️
integration2 24.33% <33.61%> (?)
unittests1 ?
unittests2 13.87% <0.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...sform/function/BaseBinaryGeoTransformFunction.java 0.00% <0.00%> (-72.59%) ⬇️
.../transform/function/ConstructFromTextFunction.java 0.00% <0.00%> (-73.92%) ⬇️
...l/transform/function/ConstructFromWKBFunction.java 0.00% <0.00%> (-72.73%) ⬇️
...geospatial/transform/function/GeoToH3Function.java 0.00% <0.00%> (-83.34%) ⬇️
.../geospatial/transform/function/StAreaFunction.java 0.00% <0.00%> (-84.00%) ⬇️
...spatial/transform/function/StAsBinaryFunction.java 0.00% <0.00%> (-83.34%) ⬇️
...eospatial/transform/function/StAsTextFunction.java 0.00% <0.00%> (-77.78%) ⬇️
...spatial/transform/function/StContainsFunction.java 0.00% <0.00%> (-71.43%) ⬇️
...spatial/transform/function/StDistanceFunction.java 0.00% <0.00%> (-85.72%) ⬇️
...eospatial/transform/function/StEqualsFunction.java 0.00% <0.00%> (-80.00%) ⬇️
... and 108 more

... and 1153 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Jackie-Jiang Jackie-Jiang force-pushed the base_project_operator branch from 8bfe328 to 9ea9261 Compare March 13, 2023 21:49
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we consider make ExpressionDocIdSet also take columnContextMap instead of _dataSourceMap?

Copy link
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall. let's add more detail on the intellij refactor steps done to the PR description. looks like we renamed several classes/interfaces.

public void init(List<TransformFunction> arguments, Map<String, DataSource> dataSourceMap) {
Preconditions
.checkArgument(arguments.size() == 2, "2 arguments are required for transform function: %s", getName());
public void init(List<TransformFunction> arguments, Map<String, ColumnContext> columnContextMap) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably include ColumnContext in the PR description for the refactor/rename

}

// Build intermediate result block based on aggregation result from the executor
return new AggregationResultsBlock(_aggregationFunctions, aggregationExecutor.getResult());
}

@Override
public List<Operator> getChildOperators() {
return Collections.singletonList(_transformOperator);
public List<BaseProjectOperator<?>> getChildOperators() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API needs not to be change right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDE is showing warning because the generic type is not included

import org.apache.pinot.core.operator.blocks.ValueBlock;


public abstract class BaseProjectOperator<T extends ValueBlock> extends BaseOperator<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the generic here? seems like the usages are always ValueBlock instead of the generic extension of T right? any chance we want to use it differently going forward?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't right now, but I do want to make it more specific since ValueBlock is not a concrete class

@Jackie-Jiang Jackie-Jiang merged commit 01ff18d into apache:master Mar 14, 2023
@Jackie-Jiang Jackie-Jiang deleted the base_project_operator branch March 14, 2023 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement incompatible Indicate PR that introduces backward incompatibility refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants