Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: fix external sort #46052

Merged
merged 1 commit into from
Mar 13, 2020
Merged

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Mar 12, 2020

Release justification: bug fixes and low-risk updates to new
functionality.

This commit fixes the resetting behavior of inputPartitioningOperator
(used by the external sort to figure out when to start a new partition
when spilling the input) that was recently broken when we added the
fallback from the external hash joiner to external sort plus merge
joiner. The complication here is that we need two kinds of behaviors
when inputPartitioningOperator is being reset:

  1. ("shallow" reset) we need to clear the memory account because the
    external sorter is moving on spilling the data into a new partition.
    However, we cannot propagate the reset further up because it might
    delete the data that the external sorter has not yet spilled. This
    behavior is needed in externalSorter when resetting the in-memory sorter
    when spilling the next "chunk" of data into the new partition.
  2. ("deep" reset) we need to do the full reset of the whole chain of
    operators. This behavior is needed when the whole external sorter is
    being reset.

Fixes: #46033.
Fixes: #46061.

Release note: None.

@yuzefovich yuzefovich requested review from asubiotto and a team March 12, 2020 20:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich
Copy link
Member Author

In order for the problem of "losing" the data to actually happen there are a couple of preconditions:

  1. the pure in-memory sorter (i.e. not the one used internally by the external sorter) spools more data than memory limit allows (this can happen since we first buffer up tuples and only then account for the memory in question)
  2. the memory account in inputPartitioningOperator "reaches" the memory limit before everything has been exported from the pure in-memory sorter.

I was able to reproduce the problem pretty quickly in tests with memoryLimit=40KiB and input's BatchSize=5, however, in the real-world scenario I think it's quite unlikely to hit this problem. Anyway, I'm glad that our randomized tests caught this issue pretty quickly.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hitting this pretty easily in my #45892 PR so that's a good sign.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/colexec/external_sort.go, line 429 at r1 (raw file):

func (o *inputPartitioningOperator) reset() {
	if o.propagateReset {

Intuitively, I would expect deep resets to be the default, which it currently isn't. How would it look if we switch it around?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colexec/external_sort.go, line 429 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Intuitively, I would expect deep resets to be the default, which it currently isn't. How would it look if we switch it around?

Yeah, I agree this way is nicer, done.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/external_sort.go, line 429 at r1 (raw file):

Previously, yuzefovich wrote…

Yeah, I agree this way is nicer, done.

Maybe rename to interceptReset or something similar to avoid the double negative. i.e. if !interceptReset { reset operator tree}

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colexec/external_sort.go, line 429 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Maybe rename to interceptReset or something similar to avoid the double negative. i.e. if !interceptReset { reset operator tree}

Done.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/colexec/external_sort.go, line 366 at r3 (raw file):

	// interceptReset determines whether the reset method will be called on
	// the input to this operator when the latter is being reset. This field is
	// managed by externalSorter.

Sorry, I missed this. Add that the field is reset by the inputPartitioningOperator after a reset.

Release justification: bug fixes and low-risk updates to new
functionality.

This commit fixes the resetting behavior of inputPartitioningOperator
(used by the external sort to figure out when to start a new partition
when spilling the input) that was recently broken when we added the
fallback from the external hash joiner to external sort plus merge
joiner. The complication here is that we need two kinds of behaviors
when inputPartitioningOperator is being reset:
1. ("shallow" reset) we need to clear the memory account because the
external sorter is moving on spilling the data into a new partition.
However, we *cannot* propagate the reset further up because it might
delete the data that the external sorter has not yet spilled. This
behavior is needed in externalSorter when resetting the in-memory sorter
when spilling the next "chunk" of data into the new partition.
2. ("deep" reset) we need to do the full reset of the whole chain of
operators. This behavior is needed when the whole external sorter is
being reset.

Release note: None.
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colexec/external_sort.go, line 366 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Sorry, I missed this. Add that the field is reset by the inputPartitioningOperator after a reset.

Done.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 13, 2020

Build succeeded

@craig craig bot merged commit 0701729 into cockroachdb:master Mar 13, 2020
@yuzefovich yuzefovich deleted the fix-external-sort branch March 13, 2020 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/colexec: TestExternalSortRandomized failed colexec: TestExternalSortRandomized failed
3 participants