distsql: pre-reserve memory needed to mark rows in HashJoiner build phase #18838
I'm not super happy with this PR, because it seems like a little more refactoring than absolutely necessary, but take my opinion with a grain of salt: I'm working with a healthy amount of 🐶 and am not sure I can articulate clearly whether you can make this PR more minimal. In either case, great job on fixing this bug. I'd like @andreimatei to weigh in before we merge this.
Reviewed 3 of 3 files at r1.
pkg/sql/distsqlrun/hashjoiner.go, line 198 at r1:
Why rope in so much code-move refactoring into this commit (moving some amount of logic into
pkg/sql/distsqlrun/hashjoiner.go, line 305 at r1:
s/
Thanks for the review. I definitely thought about the amount of refactoring while writing this change, but I think it's small enough to be OK. The issue stems from the fact that I blurred the line between the buffer and build phases in a previous change: we build from disk if we encounter a memory error in the buffer phase and then, in the level above, build from memory if we did not encounter a memory error. The fix for this bug requires a built memHashRowContainer so that we can pre-reserve the memory and fall back to disk if anything goes wrong. Thus the memHashRowContainer must be built before the diskHashRowContainer. This means either moving the in-memory build phase ahead of the disk build phase inside the buffer phase, or extracting the disk build phase into a function that is called both after building in memory and where it is currently called. I decided to clean up my previous mistake instead: I moved the disk build phase out of the buffer phase and put the build phase (for both memory and disk) behind a single function. I think the code is better off this way, and the change is small and localized enough that it will not introduce cherry-picking woes. I have no issue with going with the other solution, though, so please let me know what you think.
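The build-phase structure described in this reply can be sketched roughly as follows. This is a simplified, hypothetical model, not the actual distsqlrun code: `budget`, `memContainer`, `diskContainer`, `buildMem`, and `build` are illustrative names, and marks are assumed to cost one byte per row. It shows the two properties the fix relies on: mark memory is reserved during the build phase, and any budget error there leads to the single disk fallback. The `useMemory` flag models the knowledge, carried over from the buffer phase, of whether memory was already exhausted.

```go
package main

import (
	"errors"
	"fmt"
)

// errBudgetExceeded is a hypothetical stand-in for a memory-budget error.
var errBudgetExceeded = errors.New("memory budget exceeded")

// budget is a hypothetical byte-accounting monitor with a hard limit.
type budget struct {
	used, limit int
}

// grow reserves n bytes, or returns errBudgetExceeded without reserving.
func (b *budget) grow(n int) error {
	if b.used+n > b.limit {
		return errBudgetExceeded
	}
	b.used += n
	return nil
}

// release returns n bytes to the budget.
func (b *budget) release(n int) { b.used -= n }

// container is a hypothetical stand-in for a hash row container.
type container interface{ kind() string }

type memContainer struct{ reserved int }
type diskContainer struct{ rows [][]byte }

func (*memContainer) kind() string  { return "memory" }
func (*diskContainer) kind() string { return "disk" }

// buildMem accounts for every row in memory and then pre-reserves one mark
// per row (assumed 1 byte each) so the probe phase never has to allocate.
// On failure it releases everything it reserved.
func buildMem(rows [][]byte, b *budget) (*memContainer, error) {
	c := &memContainer{}
	for _, r := range rows {
		if err := b.grow(len(r)); err != nil {
			b.release(c.reserved)
			return nil, err
		}
		c.reserved += len(r)
	}
	if err := b.grow(len(rows)); err != nil { // mark memory
		b.release(c.reserved)
		return nil, err
	}
	c.reserved += len(rows)
	return c, nil
}

// build is the single point where the memory-or-disk decision is made.
// useMemory is false when the buffer phase already hit the budget.
func build(rows [][]byte, useMemory bool, b *budget) (container, error) {
	if useMemory {
		c, err := buildMem(rows, b)
		if err == nil {
			return c, nil
		}
		if !errors.Is(err, errBudgetExceeded) {
			return nil, err
		}
	}
	return &diskContainer{rows: rows}, nil
}

func main() {
	rows := [][]byte{[]byte("aa"), []byte("bb"), []byte("cc")} // 6 bytes + 3 marks
	b := &budget{limit: 9}
	c, _ := build(rows, true, b)
	fmt.Println(c.kind(), b.used) // memory 9
	b2 := &budget{limit: 8} // rows fit, marks do not
	c2, _ := build(rows, true, b2)
	fmt.Println(c2.kind(), b2.used) // disk 0
}
```

In this model, running out of memory while reserving marks is handled exactly like running out while storing rows, so there is only one place where the code switches to disk.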
I think the refactoring was quite necessary (and it's good we're cherry-picking it). Ideally it would have been done in a different commit than the memory reservation change, but I shouldn't be the one to cast that stone :)
Reviewed 1 of 3 files at r1.
pkg/sql/distsqlrun/hash_row_container.go, line 122 at r1:
Is the undefined behavior described by this note still necessary, given that you're seemingly taking away the reason for it?
pkg/sql/distsqlrun/hash_row_container.go, line 228 at r1:
if the
pkg/sql/distsqlrun/hash_row_container.go, line 229 at r1:
nit: invert the condition and return early so you can unindent the bulk of this function.
pkg/sql/distsqlrun/hash_row_container.go, line 286 at r1:
Do we expect this call to ever not be a no-op? If not, I'd find a way to assert that memory has already been reserved.
pkg/sql/distsqlrun/hashjoiner.go, line 210 at r1:
I find everything around the use of this
pkg/sql/distsqlrun/hashjoiner.go, line 223 at r1:
Shouldn't this be moved above the call to
pkg/sql/distsqlrun/hashjoiner.go, line 298 at r1:
nit: hint to h.storedSide
pkg/sql/distsqlrun/hashjoiner.go, line 303 at r1:
As opposed to what? Consider giving a hint as to why this would ever be set to false; I guess it's when we already happen to know that there's no memory available. Then again, why not let the code always try to use memory and fall back? Final nit: if we keep the arg, I'd suggest we name them
pkg/sql/distsqlrun/hashjoiner.go, line 324 at r1:
👍
pkg/sql/distsqlrun/hashjoiner.go, line 403 at r1:
It doesn't "attempt to read", it reads :)
pkg/sql/distsqlrun/hashjoiner.go, line 405 at r1:
Is the last sentence finished?
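One hypothetical way to act on the comments at lines 122 and 286 of hash_row_container.go (make inserts after marking well-defined, and assert that mark memory was already reserved instead of allocating it in the probe phase) is sketched below. `markedContainer` and its methods are illustrative names, not the real container API, and memory accounting is omitted to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
)

// markedContainer is a hypothetical in-memory row container that keeps one
// "matched" mark per stored row.
type markedContainer struct {
	rows          []string
	marks         []bool
	marksReserved bool // set by reserveMarks during the build phase
}

var (
	errAddAfterReserve = errors.New("rows cannot be added once mark memory is reserved")
	errNotReserved     = errors.New("mark memory was not reserved during the build phase")
)

// addRow stores a row. Instead of leaving inserts after the start of marking
// undefined, this sketch rejects them explicitly.
func (c *markedContainer) addRow(r string) error {
	if c.marksReserved {
		return errAddAfterReserve
	}
	c.rows = append(c.rows, r)
	return nil
}

// reserveMarks allocates all marks up front. In the real fix this happens in
// the build phase, where a memory error can still trigger the disk fallback.
func (c *markedContainer) reserveMarks() {
	c.marks = make([]bool, len(c.rows))
	c.marksReserved = true
}

// initMarks runs at the start of the probe phase. It allocates nothing and
// only asserts that the marks were already reserved.
func (c *markedContainer) initMarks() error {
	if !c.marksReserved {
		return errNotReserved
	}
	return nil
}

// mark records that stored row i matched a probe row.
func (c *markedContainer) mark(i int) { c.marks[i] = true }

func main() {
	c := &markedContainer{}
	_ = c.addRow("a")
	_ = c.addRow("b")
	fmt.Println(c.initMarks()) // assertion fails: marks not reserved yet
	c.reserveMarks()
	fmt.Println(c.initMarks()) // <nil>
	c.mark(1)
	fmt.Println(c.marks, c.addRow("c"))
}
```

Returning an error from `initMarks` turns a missed reservation into a visible bug rather than a silent probe-phase allocation.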
A situation was uncovered by #18600 where the HashJoiner would run out of memory in the probe phase. This happened because we had assumed that we wouldn't hit a memory limit if the buffer phase filled at most 2/3 of the limit with both streams, since the marking infrastructure would take up only a fraction of 1/3 of the limit (the chosen stream). This assumption failed to take into account limits shared with other processors. This change pre-reserves the memory needed for the probe phase in the build phase, so that we keep a single point in the code where we fall back to disk without relying on any limit assumptions.
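The failure mode in this commit message comes down to simple arithmetic, modeled below with hypothetical numbers: a 300-unit limit, both buffered streams taking exactly 2/3 of it (200 units), and 25 units of mark memory. `firstFailure` is an illustrative helper, not part of the codebase. It reports the phase in which a shared limit is first exceeded, with and without pre-reserving the mark memory.

```go
package main

import "fmt"

// Phases in which a memory limit can first be exceeded.
const (
	phaseNone   = "none"
	phaseBuffer = "buffer"
	phaseBuild  = "build"
	phaseProbe  = "probe"
)

// firstFailure returns the phase in which a shared memory limit is first
// exceeded. other is memory held by other processors sharing the limit,
// buffered is the memory used by both buffered streams, and marks is the
// memory needed to mark rows of the chosen stream. With preReserve, mark
// memory is reserved in the build phase instead of the probe phase.
func firstFailure(limit, other, buffered, marks int, preReserve bool) string {
	used := other + buffered
	if used > limit {
		return phaseBuffer
	}
	if used+marks > limit {
		if preReserve {
			return phaseBuild
		}
		return phaseProbe
	}
	return phaseNone
}

func main() {
	const limit, buffered, marks = 300, 200, 25
	fmt.Println(firstFailure(limit, 0, buffered, marks, false))  // none
	fmt.Println(firstFailure(limit, 90, buffered, marks, false)) // probe
	fmt.Println(firstFailure(limit, 90, buffered, marks, true))  // build
}
```

When the joiner has the limit to itself, the 2/3 rule holds. Once another processor holds 90 units of the same limit, the buffer phase still fits (290 units) but marking pushes usage to 315, past the limit. Without pre-reservation that happens in the probe phase; with it, the overflow surfaces in the build phase, where the disk fallback already lives.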
Review status: 1 of 3 files reviewed at latest revision, 13 unresolved discussions.
pkg/sql/distsqlrun/hash_row_container.go, line 122 at r1: Previously, andreimatei (Andrei Matei) wrote…
The only reason for the undefined behavior is that we don't need to support adding more rows once we start marking. It isn't necessary behavior, and supporting it is simply not worth the time.
pkg/sql/distsqlrun/hash_row_container.go, line 228 at r1: Previously, andreimatei (Andrei Matei) wrote…
Done.
pkg/sql/distsqlrun/hash_row_container.go, line 229 at r1: Previously, andreimatei (Andrei Matei) wrote…
Done.
pkg/sql/distsqlrun/hash_row_container.go, line 286 at r1: Previously, andreimatei (Andrei Matei) wrote…
Done.
pkg/sql/distsqlrun/hashjoiner.go, line 198 at r1: Previously, arjunravinarayan (Arjun Narayan) wrote…
Explained in main thread.
pkg/sql/distsqlrun/hashjoiner.go, line 210 at r1: Previously, andreimatei (Andrei Matei) wrote…
Done.
pkg/sql/distsqlrun/hashjoiner.go, line 223 at r1: Previously, andreimatei (Andrei Matei) wrote…
This needs to be after
pkg/sql/distsqlrun/hashjoiner.go, line 298 at r1: Previously, andreimatei (Andrei Matei) wrote…
Done.
pkg/sql/distsqlrun/hashjoiner.go, line 303 at r1: Previously, andreimatei (Andrei Matei) wrote…
Updated comment. Since we already know whether we should fall back to disk or not (disregarding marking), why not use that knowledge? I prefer to keep this as it is. Re: assertions, I'm not sure what you mean. That we assert that Renamed
pkg/sql/distsqlrun/hashjoiner.go, line 305 at r1: Previously, arjunravinarayan (Arjun Narayan) wrote…
Renamed to
pkg/sql/distsqlrun/hashjoiner.go, line 403 at r1: Previously, andreimatei (Andrei Matei) wrote…
Done.
pkg/sql/distsqlrun/hashjoiner.go, line 405 at r1: Previously, andreimatei (Andrei Matei) wrote…
Yes, updated.