[multistage] partial operator chain execution #9711

Merged (1 commit) on Nov 5, 2022

@agavra commented Nov 2, 2022:

See the design doc for a big-picture view.

This is the first in a series of PRs to improve our execution model. It implements "partial execution" of operator chains by allowing an operator to return a "noop" MetadataBlock when there is either no data to process or no data to output.

This PR makes no functional change, because the WorkerQueryExecutor doesn't yet take advantage of partial execution: it simply calls operator#nextBlock again whenever it receives a noop block.
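The "no functional change" point can be made concrete with a minimal sketch of a driver loop that handles a noop exactly as described: it immediately asks for the next block again. The types `BlockType`, `Block`, `Operator`, and `Driver` are hypothetical simplifications. They are not Pinot's actual `TransferableBlock` or `WorkerQueryExecutor` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified block model (not Pinot's TransferableBlock).
enum BlockType { DATA, NOOP, EOS }

final class Block {
  final BlockType type;
  final List<Integer> rows;

  private Block(BlockType type, List<Integer> rows) {
    this.type = type;
    this.rows = rows;
  }

  static Block data(List<Integer> rows) { return new Block(BlockType.DATA, rows); }
  static Block noop() { return new Block(BlockType.NOOP, List.of()); }
  static Block eos() { return new Block(BlockType.EOS, List.of()); }
}

// Hypothetical operator interface: each call returns one block.
interface Operator {
  Block nextBlock();
}

final class Driver {
  // Drains the chain the way the description says the executor behaves today:
  // a noop block is not treated as a reason to yield; the driver immediately
  // asks the root operator for its next block again.
  static List<Integer> drain(Operator root) {
    List<Integer> out = new ArrayList<>();
    while (true) {
      Block block = root.nextBlock();
      if (block.type == BlockType.EOS) {
        return out;
      }
      if (block.type == BlockType.DATA) {
        out.addAll(block.rows);
      }
      // NOOP: nothing to collect, loop around and call nextBlock() again
    }
  }
}
```

Because the loop retries immediately, a chain whose mailboxes are empty keeps spinning on the same thread. To actually exploit partial execution, a scheduler would yield on `NOOP` and run a different chain instead.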

PR Review Guide

This PR is broken into 3 functional commits, which I recommend you review in order:

  1. The first commit supports different types of MetadataBlocks. Previously there were two, the EOS block and the ERROR block, and they were distinguished by the presence of the exception map. This commit uses the variable-bytes data in BaseDataBlock to encode a JSON object with additional metadata. It's a little hacky, but in the grand scheme of things it's very localized. It gives us a lot of flexibility in how we use metadata blocks going forward, and it maintains backwards compatibility with the existing code. Specifically, it introduces a NOOP metadata block type. The future scheduler will use this type as a signal that the task has done all the processing it can for the moment.
  2. The second commit pipes the noop metadata blocks up the operator chain. Two kinds of operators produce noop metadata blocks:
     (a) the MailboxReceiveOperator, when nothing is available in its mailboxes;
     (b) stateful operators such as Sort/HashJoin that consume a single block without producing anything, because they need to process all blocks before producing output.
  3. The third commit adds tests and fixes up some things in MetadataBlock, specifically making sure that it is backwards compatible with the existing code.
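The sketch below is a rough illustration of the first commit's idea. It serializes a block type, plus optional extra fields, as a small JSON object into the bytes that would occupy the variable-size data section. Several names are assumptions for illustration: the field name `"type"`, the `extra` map, and the `MetadataEncoder` class. The PR itself uses a JSON library (the review thread mentions Jackson). This dependency-free sketch instead hand-writes the JSON and assumes keys and values need no escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical metadata block types.
enum MetadataBlockType { EOS, ERROR, NOOP }

final class MetadataEncoder {
  // Serializes {"type": ..., <extra fields>} into the bytes that would go in the
  // block's variable-size data section. Assumes keys and values need no JSON escaping.
  static byte[] encode(MetadataBlockType type, Map<String, String> extra) {
    Map<String, String> fields = new TreeMap<>(extra); // sorted for deterministic output
    fields.put("type", type.name());
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> e : fields.entrySet()) {
      json.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
    }
    return json.toString().getBytes(StandardCharsets.UTF_8);
  }
}
```

The payload is a free-form object, which is where the flexibility mentioned above comes from. New fields can be added without changing the block's binary layout.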

Testing

We don't currently have any tests for the operators in the multistage engine, so it was hard to add them in this PR. I will follow up with a PR dedicated to testing operators.

@walterddr left a comment:

Looks mostly good, although the encoding part seems less than elegant. I don't have a better solution, though.

Comment on lines 122 to 127

if (!_readyToConstruct) {
  return TransferableBlockUtils.getNoOpTransferableBlock();
}

return produceAggregatedBlock();
Contributor:

Wondering if we could wrap these in a BaseTransferrableOperator that handles metadata blocks (NOOP, ERROR; not sure if that's possible for EOS).

Contributor (author):

See the comment below; it isn't straightforward to make this clean. There isn't even a clean way to get the child block. In the case of a join, you might have to know whether to read from the left or right input. When the operator is currently a 0->1 operator, you might have to know whether to read at all.
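The stateful-operator case from the second commit (item 2b) and the `_readyToConstruct` snippet above can be sketched with a minimal, hypothetical `SumOperator`. It is not Pinot's `AggregateOperator`. It consumes at most one upstream block per call and returns a noop while it is still accumulating. It emits its single result only after the upstream reaches end-of-stream.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified block model (not Pinot's TransferableBlock).
enum BlockType { DATA, NOOP, EOS }

final class Block {
  final BlockType type;
  final List<Long> rows;

  private Block(BlockType type, List<Long> rows) {
    this.type = type;
    this.rows = rows;
  }

  static Block data(List<Long> rows) { return new Block(BlockType.DATA, rows); }
  static Block noop() { return new Block(BlockType.NOOP, List.of()); }
  static Block eos() { return new Block(BlockType.EOS, List.of()); }
}

// Hypothetical stateful operator: sums every upstream row and emits one row
// holding the total once the upstream reports end-of-stream.
final class SumOperator {
  private final Supplier<Block> upstream;
  private long sum = 0;
  private boolean readyToConstruct = false;
  private boolean emitted = false;

  SumOperator(Supplier<Block> upstream) {
    this.upstream = upstream;
  }

  Block nextBlock() {
    if (emitted) {
      return Block.eos();
    }
    if (!readyToConstruct) {
      Block in = upstream.get(); // consume at most one upstream block per call
      switch (in.type) {
        case EOS:
          readyToConstruct = true;
          break;
        case DATA:
          for (long v : in.rows) {
            sum += v;
          }
          break;
        default:
          break; // upstream noop: nothing to consume this time
      }
    }
    if (!readyToConstruct) {
      return Block.noop(); // input handled, but no output can be produced yet
    }
    emitted = true;
    return Block.data(List.of(sum));
  }
}
```

The design choice here is to consume one block per call. Each `nextBlock()` then does a bounded amount of work, so a caller could interleave this chain with others.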

Comment on lines 140 to 142
if (transferableBlock.isNoOpBlock()) {
  continue;
} else if (transferableBlock.isEndOfStreamBlock()) {
  return resultDataBlocks;
}
Contributor:

Should these be after the null check?

Contributor (author):

I don't think so; we shouldn't be adding noop/EOS blocks to the results, right?

Contributor:

In transferableBlock.isNoOpBlock(), it checks metadataBlock.getType() == MetadataBlock.MetadataBlockType.NOOP. However, metadataBlock can be null, which would throw an NPE, no?

Should we have a null check, i.e. transferableBlock.getDataBlock() != null?

Previously the null check was not necessary, because it only looked at the BaseDataBlock.Type _type member variable, which cannot be null in TransferableBlock.

Contributor (author):

Hmm, looking at the code, I don't think TransferableBlock#getDataBlock can ever return null. If the field is null, it either builds the block or throws an exception; otherwise it returns the field.

I'll remove this check altogether, since I think it's just misleading. I still think we shouldn't be adding metadata blocks to the result table.
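This is a minimal sketch of the invariant described in this reply. It uses a hypothetical `LazyBlock` class rather than Pinot's `TransferableBlock`. The getter returns the cached block, builds it from buffered rows, or throws. As a result, a check such as `isNoOpBlock()` never dereferences `null`.

```java
import java.util.List;

// Hypothetical metadata block types.
enum MetadataBlockType { EOS, ERROR, NOOP }

// Hypothetical stand-in for a serialized data block.
final class DataBlock {
  final MetadataBlockType metadataType; // null for blocks that carry rows
  final List<Object[]> rows;

  DataBlock(MetadataBlockType metadataType, List<Object[]> rows) {
    this.metadataType = metadataType;
    this.rows = rows;
  }
}

final class LazyBlock {
  private DataBlock dataBlock;            // may be null until first requested
  private final List<Object[]> container; // unserialized rows, if any

  LazyBlock(DataBlock dataBlock) {
    this.dataBlock = dataBlock;
    this.container = null;
  }

  LazyBlock(List<Object[]> container) {
    this.dataBlock = null;
    this.container = container;
  }

  // Never returns null: return the cached block, build it from the rows, or throw.
  DataBlock getDataBlock() {
    if (dataBlock == null) {
      if (container == null) {
        throw new IllegalStateException("no data block and no rows to build one from");
      }
      dataBlock = new DataBlock(null, container);
    }
    return dataBlock;
  }

  // No null check needed, because getDataBlock() cannot return null.
  boolean isNoOpBlock() {
    return getDataBlock().metadataType == MetadataBlockType.NOOP;
  }
}
```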

Contributor:

👍

Comment on lines +127 to +132
// If type is null, then we're reading a legacy block where we didn't encode any
// data. Assume that it is an EOS block if there are no exceptions, and an ERROR block
// otherwise.
return type == null
    ? (getExceptions().isEmpty() ? MetadataBlockType.EOS : MetadataBlockType.ERROR)
    : MetadataBlockType.valueOf(type);
Contributor:

We don't need to consider backward compatibility here. If the type is null, that means it is a legacy block, and you will throw when deserializing anyway.

Contributor (author):

I was very careful to make sure that it won't throw when deserializing (see the tests). Is there a reason why we don't need to consider backwards compatibility? If we don't, then I can make the serialization format less hacky!

@walterddr commented Nov 4, 2022:

Please let me know if my understanding is correct. Suppose the metadata block transferred over the wire is from the previous version. Then the current version of the code cannot reconstruct a metadata block from the byteBuffer, as it will not have been encoded using Jackson.

In that case, we will never reach a situation where the byteBuffer is decodable and the type is null, correct?

Contributor (author):

> if the metadata block transferred over the wire is of previous version. then using the current version of the code it cannot reconstruct a metadata block back from the byteBuffer

That's not correct: it can decode the byteBuffer. The only difference is that it will read it with an empty _variableBytesData, which means the JSON contents will be empty (see the MetadataBlockTest).

Contributor:

Ah, OK. Yeah, in that case we should keep this. Thanks for the explanation.
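The backwards-compatible decoding agreed on in this thread can be sketched as follows. An old-format block decodes with an empty variable-bytes payload, so no `type` field is found. The decoder then falls back to the legacy rule from the snippet above: no exceptions means EOS, otherwise ERROR. The following are assumptions for this dependency-free sketch: the regex-based parsing, the `"type"` key, the `LegacyAwareDecoder` class, and modeling the exception map as `Map<Integer, String>`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical metadata block types.
enum MetadataBlockType { EOS, ERROR, NOOP }

final class LegacyAwareDecoder {
  // Matches a "type":"NAME" field in the JSON payload.
  private static final Pattern TYPE_FIELD = Pattern.compile("\"type\"\\s*:\\s*\"([A-Z]+)\"");

  // variableBytes: the (possibly empty) JSON payload; exceptions: error code -> message.
  static MetadataBlockType getType(byte[] variableBytes, Map<Integer, String> exceptions) {
    Matcher m = TYPE_FIELD.matcher(new String(variableBytes, StandardCharsets.UTF_8));
    String type = m.find() ? m.group(1) : null;
    // A legacy block has an empty payload, so there is no type field: fall back to
    // the old rule where the presence of exceptions distinguishes ERROR from EOS.
    return type == null
        ? (exceptions.isEmpty() ? MetadataBlockType.EOS : MetadataBlockType.ERROR)
        : MetadataBlockType.valueOf(type);
  }
}
```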

@walterddr left a comment:

Overall looks good. Got some minor comments.

// all the mailboxes we opened returned null but were not yet closed - early terminate
// with a noop block. Otherwise, we have exhausted all data from all mailboxes and can
// return EOS
return openMailboxCount > 0 && (openMailboxCount != eosCount)
Contributor:

This condition is a bit hard for me to validate. Can't we just do openMailboxCount > 0?
IIUC, the second part only matters when a mailbox is closed right afterwards. It saves another call to getNextBlock() that would only return an EOS, yes?

Contributor (author):

Yeah, this condition isn't necessary; it's technically an optimization to avoid another call. I'll remove it. At first I thought it was necessary, but that turned out to be a different bug that I was tracking down.
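The simplified condition can be sketched with a hypothetical mailbox model. In this model, `poll()` returns `null` when nothing has arrived. `isFinished()` becomes true once the sender has closed the mailbox and it has been drained. `Mailbox`, `ReceiveSketch`, and `Block` are illustrative names, not Pinot's `MailboxReceiveOperator` API.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

enum BlockType { DATA, NOOP, EOS }

final class Block {
  final BlockType type;
  final Integer value; // only meaningful for DATA blocks

  Block(BlockType type, Integer value) {
    this.type = type;
    this.value = value;
  }
}

// Hypothetical mailbox: poll() returns null when nothing has arrived yet.
final class Mailbox {
  private final Queue<Integer> queue = new ArrayDeque<>();
  private boolean closed = false;

  void offer(int value) { queue.add(value); }
  void close() { closed = true; }
  // Finished only once the sender closed it AND everything has been read.
  boolean isFinished() { return closed && queue.isEmpty(); }
  Integer poll() { return queue.poll(); }
}

final class ReceiveSketch {
  static Block nextBlock(List<Mailbox> mailboxes) {
    int openMailboxCount = 0;
    for (Mailbox mailbox : mailboxes) {
      if (mailbox.isFinished()) {
        continue; // this mailbox will never produce anything again
      }
      Integer value = mailbox.poll();
      if (value != null) {
        return new Block(BlockType.DATA, value);
      }
      openMailboxCount++; // still open, just empty at the moment
    }
    // Some mailbox may still deliver data: return a noop so the caller can come back
    // later. Otherwise all mailboxes are exhausted and the stream has ended.
    return openMailboxCount > 0
        ? new Block(BlockType.NOOP, null)
        : new Block(BlockType.EOS, null);
  }
}
```

In this model, a mailbox that closes right after being polled empty still counts as open for that call. One extra call is therefore needed to return EOS. That extra call is roughly what the removed `openMailboxCount != eosCount` check was avoiding.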

@agavra commented Nov 4, 2022:

@walterddr, just wanted to make sure you don't merge until tests pass. It looks like the last commit actually introduced some regressions; I'm double-checking that now.

@codecov-commenter commented Nov 4, 2022:

Codecov Report

Merging #9711 (2134a21) into master (4b36685) will increase coverage by 41.99%.
The diff coverage is 85.14%.

@@              Coverage Diff              @@
##             master    #9711       +/-   ##
=============================================
+ Coverage     28.06%   70.05%   +41.99%     
- Complexity       53     4980     +4927     
=============================================
  Files          1939     1951       +12     
  Lines        104163   104561      +398     
  Branches      15792    15836       +44     
=============================================
+ Hits          29231    73252    +44021     
+ Misses        72068    26180    -45888     
- Partials       2864     5129     +2265     
| Flag | Coverage Δ |
|---|---|
| integration1 | 25.34% <9.40%> (+0.01%) ⬆️ |
| integration2 | 24.42% <0.00%> (-0.19%) ⬇️ |
| unittests1 | 67.56% <85.14%> (?) |
| unittests2 | 15.66% <71.28%> (?) |

Flags with carried-forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...ot/query/runtime/executor/WorkerQueryExecutor.java | 92.30% <66.66%> (+92.30%) ⬆️ |
| ...ot/query/runtime/operator/MailboxSendOperator.java | 88.42% <66.66%> (+88.42%) ⬆️ |
| ...rg/apache/pinot/query/service/QueryDispatcher.java | 81.44% <66.66%> (+81.44%) ⬆️ |
| ...query/runtime/operator/MailboxReceiveOperator.java | 78.94% <79.16%> (+78.94%) ⬆️ |
| ...g/apache/pinot/common/datablock/MetadataBlock.java | 79.41% <80.64%> (+33.95%) ⬆️ |
| ...pinot/query/runtime/operator/HashJoinOperator.java | 81.69% <82.85%> (+81.69%) ⬆️ |
| ...inot/query/runtime/operator/TransformOperator.java | 79.31% <83.33%> (+79.31%) ⬆️ |
| ...e/pinot/query/runtime/operator/FilterOperator.java | 60.00% <87.50%> (+60.00%) ⬆️ |
| ...che/pinot/query/runtime/operator/SortOperator.java | 86.95% <93.33%> (+86.95%) ⬆️ |
| ...inot/query/runtime/operator/AggregateOperator.java | 85.18% <95.12%> (+85.18%) ⬆️ |

... and 1347 more


@agavra force-pushed the partial_execution branch from 3ef8734 to 2134a21 on November 5, 2022 17:01
@agavra commented Nov 5, 2022:

Had to rebase after #9729 and #9676. @walterddr, if you want to review just the changed files, the changes are in AggregateOperator, HashJoinOperator, and the corresponding tests.

@walterddr merged commit aa013a4 into apache:master on Nov 5, 2022
@agavra deleted the partial_execution branch on November 5, 2022 20:15
@walterddr added the multi-stage label (Related to the multi-stage query engine) on Nov 15, 2022
@walterddr pushed a commit that referenced this pull request on Nov 15, 2022
…9753)

This is a follow-up to #9711 and follows the design outlined in [this design doc](https://docs.google.com/document/d/1XAMHAlhFbINvX-kK1ANlzbRz4_RkS0map4qhqs1yDtE/edit#heading=h.de4smgkh3bzk).

This PR implements a round-robin operator chain scheduling algorithm and sets up the interface for future PRs that will implement more advanced scheduling. As of this PR, all queries are guaranteed to make progress. See the change in `pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SSBQueryIntegrationTest.java`, which can now run with only 2 cores available. However, the algorithm is still very CPU-hungry, because queries with nothing in their mailbox will still be scheduled.
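Below is a minimal, single-threaded sketch of round-robin scheduling over operator chains. It assumes a hypothetical `OperatorChain` interface whose `runOnce()` does a bounded amount of work and reports `DATA`, `NOOP`, or `EOS`. It is not the scheduler from this follow-up PR. It only illustrates why every chain makes progress and why chains with empty mailboxes still consume scheduling turns.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

enum BlockType { DATA, NOOP, EOS }

// Hypothetical operator chain: each call does a bounded amount of work.
interface OperatorChain {
  String id();
  BlockType runOnce();
}

// Example chain that needs `steps` calls before finishing, returning NOOP meanwhile.
final class CountdownChain implements OperatorChain {
  private final String id;
  private int remaining;

  CountdownChain(String id, int steps) {
    this.id = id;
    this.remaining = steps;
  }

  public String id() { return id; }
  public BlockType runOnce() { return --remaining <= 0 ? BlockType.EOS : BlockType.NOOP; }
}

final class RoundRobinScheduler {
  private final Deque<OperatorChain> ready = new ArrayDeque<>();

  void register(OperatorChain chain) { ready.addLast(chain); }

  // Runs until every chain has hit EOS; returns the order in which chains finished.
  List<String> runToCompletion() {
    List<String> finished = new ArrayList<>();
    while (!ready.isEmpty()) {
      OperatorChain chain = ready.pollFirst();
      BlockType result = chain.runOnce();
      if (result == BlockType.EOS) {
        finished.add(chain.id());
      } else {
        // DATA or NOOP: move to the back so other chains get a turn. A NOOP chain is
        // requeued even though it had nothing to do, which is where CPU is wasted.
        ready.addLast(chain);
      }
    }
    return finished;
  }
}
```

Suppose chains needing 3, 1, and 2 steps are registered as A, B, and C. Each chain gets one step per turn, so they finish in the order B, C, A.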
Labels: multi-stage (Related to the multi-stage query engine)
3 participants