distsql: pre-reserve memory needed to mark rows in HashJoiner build phase #18838

asubiotto · 2017-09-27T20:00:14Z

A situation was uncovered by #18600, where the HashJoiner would run out
of memory in the probe phase. This was because we had made an assumption
that we wouldn't hit a memory limit if the buffer phase filled up at
most 2/3 of the limit with both streams , since the marking
infrastructure would take up only a fraction of 1/3 (the chosen stream).

This assumption failed to take into account other limits shared with
other processors. This change pre-reserves the memory needed for the
probe phase in the build phase so that we can keep a single point in the
code where we fall back to disk while not relying on any limit
assumptions.

cockroach-teamcity · 2017-09-27T20:00:22Z

This change is

rjnn · 2017-09-28T17:01:42Z

I'm not super happy with this PR, because it seems like a little more refactoring than absolutely necessary, but you should take my opinion with a grain of salt, because I'm working with a healthy amount of 🐶 and am not sure I can articulate clearly whether you can make this PR more minimal. In either case, great job on fixing this bug. I'd like @andreimatei to weigh in before we merge this.

Reviewed 3 of 3 files at r1.
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.

pkg/sql/distsqlrun/hashjoiner.go, line 198 at r1 (raw file):

	defer h.rows[rightSide].Close(ctx)

	var storedRows hashRowContainer

Why rope in so much code-move refactoring into this commit (moving some amount of logic into buildPhase?) It's not entirely clear to me why it's necessary.

pkg/sql/distsqlrun/hashjoiner.go, line 305 at r1 (raw file):

//    in-memory hashRowContainer representation.
func (h *hashJoiner) buildPhase(
	ctx context.Context, useTempStorage bool, buildMemory bool,

s/buildMemory/buildInMemory/ makes this clearer, I was confused momentarily.

Comments from Reviewable

asubiotto · 2017-09-28T18:59:18Z

Thanks for the review. I definitely thought about the amount of refactoring while writing this change but I think that it's small enough to be OK. The issue stems from the fact that I blurred the line between the buffer and build phases in a previous change: we build from disk if we encounter a memory error in the buffer phase and then in the level above, build from memory if we did not encounter a memory error.

The solution to this bug requires us to have a memHashRowContainer built so that we can pre-reserve the memory and fall back to disk if anything goes wrong. Thus the memHashRowContainer must be built before the diskHashRowContainer. This means either moving the in memory build phase to precede the disk build phase in the buffer phase or extract the disk build phase into a function that is called after building in-memory and where it currently is. I decided to clean up my previous mistake and instead move the disk build phase out of the buffer phase and put the build phase (of both memory and disk) behind a function. I think the code is better off this way and the change is small/localized enough that it will not introduce cherrypicking woes.

I have no issue with going with the other solution though so please let me know what you think.

andreimatei · 2017-09-28T20:09:22Z

I think the refactoring was quite necessary (and it's good we're cherry-picking it). Ideally I think it would have been better to do it in a different commit than the memory reservation change, but I shouldn't be the one to cast that stone :)

Reviewed 1 of 3 files at r1.
Review status: all files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.

pkg/sql/distsqlrun/hash_row_container.go, line 122 at r1 (raw file):

// NOTE: Once a row is marked, adding more rows to the hashMemRowContainer
// results in undefined behavior. It is not necessary to do otherwise for the
// current usage of hashMemRowContainer and allows us to assume that a memory

Is the undefined behavior described by this note still necessary given that you're seemingly taking away the reason for it?

pkg/sql/distsqlrun/hash_row_container.go, line 228 at r1 (raw file):

// memory account by the memory needed to mark all rows. It is a noop if
// h.markMemoryReserved is true.
func (h *hashMemRowContainer) reserveMarkMemory(ctx context.Context) error {

if the !h.markMemoryReserved condition is important, I'd rename this to reserveMarkMemoryMaybe()

pkg/sql/distsqlrun/hash_row_container.go, line 229 at r1 (raw file):

// h.markMemoryReserved is true.
func (h *hashMemRowContainer) reserveMarkMemory(ctx context.Context) error {
	if !h.markMemoryReserved {

nit: invert the condition and return early so you can unident the bulk of this function.

pkg/sql/distsqlrun/hash_row_container.go, line 286 at r1 (raw file):

	}
	if i.marked == nil {
		if err := i.reserveMarkMemory(ctx); err != nil {

do we expect for this call to ever not be a no-op? If no, I'd find a way to assert that memory has already been reserved.

pkg/sql/distsqlrun/hashjoiner.go, line 210 at r1 (raw file):

	// attempt to build an in-memory representation, falling back to disk
	// according to useTempStorage.
	storedRows, earlyExit, err := h.buildPhase(ctx, useTempStorage, err == nil /* buildMemory */)

I find everything around the use of this err very hard to read. The fact that err != nilat this point implies that it's a OOM error is quite obscure, and also the condition above for calling DrainAndClose() is a bit inscrutable.
I suggest introducing a oom bool above.

pkg/sql/distsqlrun/hashjoiner.go, line 223 at r1 (raw file):

	defer storedRows.Close(ctx)

	// Add the row that caused a memory error in the buffer phase if there is

Shouldn't this be moved above the call to buildPhase()? Seems very randomly placed here.

pkg/sql/distsqlrun/hashjoiner.go, line 298 at r1 (raw file):

// buildPhase constructs our internal hash map of rows seen. This is done
// entirely from one stream (chosen during initial buffering).

nit: hint to h.storedSide

pkg/sql/distsqlrun/hashjoiner.go, line 303 at r1 (raw file):

//    storage if necessary.
//  - buildMemory specifies whether the build phase should attempt to build an
//    in-memory hashRowContainer representation.

as opposed to what? Consider giving a hint as to why this would ever be set to false - I guess it's when we already happen to know that there's no memory available. Then again, why not let the code always try to use memory and fallback?
Also, if we keep the arg, we should document and assert that useTempStorage is set when this is set.

Final nit: if we keep the arg, I'd suggest we name them attemptMemoryBuild and fallbackToDiskBuild.

pkg/sql/distsqlrun/hashjoiner.go, line 324 at r1 (raw file):

		// in-memory rows in this phase to not have to worry about hitting a
		// memory limit in the probe phase.
		err := storedMemRows.reserveMarkMemory(ctx)

👍

pkg/sql/distsqlrun/hashjoiner.go, line 403 at r1 (raw file):

bufferPhase attempts to read a portion

It doesn't "attempt to read", it reads :)

pkg/sql/distsqlrun/hashjoiner.go, line 405 at r1 (raw file):

// bufferPhase attempts to read a portion of both streams into memory (up to
// h.initialBufferSize) in the hope that one of them is small and should be used
// as h.storedSide. The phase attempts to consume all the rows from the chosen

Is the last sentence finished?

Comments from Reviewable

…hase A situation was uncovered by #18600, where the HashJoiner would run out of memory in the probe phase. This was because we had made an assumption that we wouldn't hit a memory limit if the buffer phase filled up at most 2/3 of the limit with both streams , since the marking infrastructure would take up only a fraction of 1/3 (the chosen stream). This assumption failed to take into account other limits shared with other processors. This change pre-reserves the memory needed for the probe phase in the build phase so that we can keep a single point in the code where we fall back to disk while not relying on any limit assumptions.

asubiotto · 2017-10-02T17:15:26Z

Review status: 1 of 3 files reviewed at latest revision, 13 unresolved discussions.

pkg/sql/distsqlrun/hash_row_container.go, line 122 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Is the undefined behavior described by this note still necessary given that you're seemingly taking away the reason for it?

The only reason there is undefined behavior is because it is not necessary to support adding more rows once we start marking. It is not necessary behavior, simply not worth time.

pkg/sql/distsqlrun/hash_row_container.go, line 228 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

if the !h.markMemoryReserved condition is important, I'd rename this to reserveMarkMemoryMaybe()

Done.

pkg/sql/distsqlrun/hash_row_container.go, line 229 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: invert the condition and return early so you can unident the bulk of this function.

Done.

pkg/sql/distsqlrun/hash_row_container.go, line 286 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

do we expect for this call to ever not be a no-op? If no, I'd find a way to assert that memory has already been reserved.

Done.

pkg/sql/distsqlrun/hashjoiner.go, line 198 at r1 (raw file):

Previously, arjunravinarayan (Arjun Narayan) wrote…

Why rope in so much code-move refactoring into this commit (moving some amount of logic into buildPhase?) It's not entirely clear to me why it's necessary.

Explained in main thread.

pkg/sql/distsqlrun/hashjoiner.go, line 210 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I find everything around the use of this err very hard to read. The fact that err != nilat this point implies that it's a OOM error is quite obscure, and also the condition above for calling DrainAndClose() is a bit inscrutable.
I suggest introducing a oom bool above.

Done.

pkg/sql/distsqlrun/hashjoiner.go, line 223 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Shouldn't this be moved above the call to buildPhase()? Seems very randomly placed here.

This needs to be after buildPhase since we are adding the row to the built container. Rewrote the comment.

pkg/sql/distsqlrun/hashjoiner.go, line 298 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: hint to h.storedSide

Done.

pkg/sql/distsqlrun/hashjoiner.go, line 303 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

as opposed to what? Consider giving a hint as to why this would ever be set to false - I guess it's when we already happen to know that there's no memory available. Then again, why not let the code always try to use memory and fallback?
Also, if we keep the arg, we should document and assert that useTempStorage is set when this is set.

Final nit: if we keep the arg, I'd suggest we name them attemptMemoryBuild and fallbackToDiskBuild.

Updated comment.

Since we already have knowledge of whether we should fall back to disk or not (disregarding marking), why not use that knowledge? I prefer to keep this as it is.

re assertions I'm not sure what you mean. That we assert that useTempStorage is true when buildMemory is true? It's a valid case to try to build in memory, try fall back to disk, and see that external storage is disabled.

Renamed buildMemory but I think fallbackToDiskBuild is more confusing than useTempStorage. This is what the boolean is called at the caller level as well (where we figure out whether we can use temp storage)

pkg/sql/distsqlrun/hashjoiner.go, line 305 at r1 (raw file):

Previously, arjunravinarayan (Arjun Narayan) wrote…

s/buildMemory/buildInMemory/ makes this clearer, I was confused momentarily.

Renamed to attemptMemoryBuild

pkg/sql/distsqlrun/hashjoiner.go, line 403 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

bufferPhase attempts to read a portion

It doesn't "attempt to read", it reads :)

Done.

pkg/sql/distsqlrun/hashjoiner.go, line 405 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Is the last sentence finished?

Yes, updated.

Comments from Reviewable

asubiotto requested review from andreimatei, rjnn, RaduBerinde and a team September 27, 2017 20:00

asubiotto added this to the 1.1 milestone Sep 28, 2017

asubiotto merged commit 3ec103a into cockroachdb:master Oct 3, 2017

asubiotto deleted the hashjoiner branch May 22, 2018 15:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

distsql: pre-reserve memory needed to mark rows in HashJoiner build phase #18838

distsql: pre-reserve memory needed to mark rows in HashJoiner build phase #18838

asubiotto commented Sep 27, 2017

cockroach-teamcity commented Sep 27, 2017

rjnn commented Sep 28, 2017

asubiotto commented Sep 28, 2017

andreimatei commented Sep 28, 2017

asubiotto commented Oct 2, 2017

distsql: pre-reserve memory needed to mark rows in HashJoiner build phase #18838

distsql: pre-reserve memory needed to mark rows in HashJoiner build phase #18838

Conversation

asubiotto commented Sep 27, 2017

cockroach-teamcity commented Sep 27, 2017

rjnn commented Sep 28, 2017

asubiotto commented Sep 28, 2017

andreimatei commented Sep 28, 2017

asubiotto commented Oct 2, 2017