Bug: disk_row_container invalid row length 0, expected 2 when running TPC-H query 8 #18600
Comments
I have successfully reproduced this panic and will look into it. It seems that a nil row is being passed to the disk row container during creation, which should never happen, since this row is supposed to be the row that caused the OOM error in the in-memory row container.
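To illustrate the failure mode described above, here is a minimal sketch of an in-memory container that falls back to a disk-backed one on a budget error. All type and function names (`memContainer`, `diskContainer`, `addWithFallback`) are hypothetical and do not mirror CockroachDB's actual code; the point is only that the row that triggered the OOM must be handed to the disk container, and that a nil row is rejected up front.

```go
package main

import "errors"

// All names below are hypothetical and exist only for illustration.

type row []int

var errBudgetExceeded = errors.New("memory budget exceeded")

// memContainer holds rows in memory up to a fixed row budget.
type memContainer struct {
	rows   []row
	budget int
}

func (m *memContainer) add(r row) error {
	if len(m.rows) >= m.budget {
		return errBudgetExceeded
	}
	m.rows = append(m.rows, r)
	return nil
}

// diskContainer stands in for a disk-backed container; it only validates row width.
type diskContainer struct {
	width int
	rows  []row
}

func (d *diskContainer) add(r row) error {
	if len(r) != d.width {
		return errors.New("invalid row length")
	}
	d.rows = append(d.rows, r)
	return nil
}

// newDiskContainer copies the rows already held in memory, then the pending row
// that caused the budget error. A nil pending row is rejected explicitly rather
// than surfacing later as an "invalid row length" failure.
func newDiskContainer(width int, spilled []row, pending row) (*diskContainer, error) {
	if pending == nil {
		return nil, errors.New("pending row must not be nil")
	}
	d := &diskContainer{width: width}
	for _, r := range spilled {
		if err := d.add(r); err != nil {
			return nil, err
		}
	}
	if err := d.add(pending); err != nil {
		return nil, err
	}
	return d, nil
}

// addWithFallback tries memory first. On a budget error it spills to disk and
// passes along the row that could not be added. It returns a nil container
// when the row fit in memory.
func addWithFallback(m *memContainer, r row, width int) (*diskContainer, error) {
	err := m.add(r)
	if err == nil {
		return nil, nil
	}
	if !errors.Is(err, errBudgetExceeded) {
		return nil, err
	}
	return newDiskContainer(width, m.rows, r)
}
```

With a budget of one row, the first call to `addWithFallback` keeps the row in memory and the second returns a disk container holding both rows.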
The issue is that we get an OOM error from This is a bug and I will fix it, but the reason this hasn't been caught before is that the
Correction: it is the The reason we now run out of memory is due to e12ba3c. The memory limit that we hit is the In the Initially, running out of memory in the probe phase was handled, but I reverted to this suggested solution because the code was simpler. I can add the probe phase handling back in. Any objections? cc @cockroachdb/distsql
Alberto has explained to me twice this week that we should architect hash joins differently: we should ensure the code is able to work with an arbitrarily small amount of memory. The way to do that is called "segmented hash joins". If you think that will be necessary to durably address our concerns, we could add it to our roadmap.
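The thread does not define "segmented hash joins". As a rough sketch of one way to bound hash-join memory, the following partitions both inputs by key and joins one partition at a time (the idea behind a Grace-style hash join). Treating this as what was meant by "segmented" is an assumption, and the names `kv`, `pair`, `partition`, and `partitionedHashJoin` are hypothetical. A real implementation would write partitions to disk; slices stand in for that here.

```go
package main

// kv is an input row: a join key and a payload value.
type kv struct{ key, val int }

// pair is a joined output row: the shared key plus one value from each side.
type pair struct{ key, left, right int }

// partition splits rows into n buckets by key, so that equal keys from either
// input always land in the same bucket index.
func partition(rows []kv, n int) [][]kv {
	out := make([][]kv, n)
	for _, r := range rows {
		i := ((r.key % n) + n) % n // non-negative bucket index, also for negative keys
		out[i] = append(out[i], r)
	}
	return out
}

// partitionedHashJoin joins left and right one partition at a time, so the hash
// table never holds more than a single partition of the left input.
func partitionedHashJoin(left, right []kv, n int) []pair {
	lp, rp := partition(left, n), partition(right, n)
	var out []pair
	for i := 0; i < n; i++ {
		// Build phase: hash table over this partition of the left input only.
		table := make(map[int][]int)
		for _, l := range lp[i] {
			table[l.key] = append(table[l.key], l.val)
		}
		// Probe phase: look up each right row of the same partition.
		for _, r := range rp[i] {
			for _, lv := range table[r.key] {
				out = append(out, pair{r.key, lv, r.val})
			}
		}
	}
	return out
}
```

Increasing `n` shrinks each partition's hash table, which is what lets the join run under a smaller memory limit at the cost of an extra pass over both inputs.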
A simpler change for 1.1 would be to preallocate the memory used in the probe phase, so I will go ahead with this. Segmented hash joins are interesting, but some infrastructure work is needed. Let's talk about this at the next meeting.
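The preallocation idea can be sketched as follows: reserve the probe phase's memory before the build phase starts, so that any budget error occurs during the build phase, where a fallback already exists. This is a minimal sketch under assumptions; `budget`, `grow`, and `planJoin` are hypothetical names, not CockroachDB's memory-accounting API, and the probe-phase size is taken as a known input.

```go
package main

import "errors"

var errOverBudget = errors.New("over memory budget")

// budget is a hypothetical byte-counting memory account.
type budget struct{ used, limit int64 }

// grow charges n bytes to the account or fails without changing it.
func (b *budget) grow(n int64) error {
	if b.used+n > b.limit {
		return errOverBudget
	}
	b.used += n
	return nil
}

// planJoin reserves probeBytes up front, then charges each build row.
// It returns how many build rows fit in memory. If that count is smaller than
// len(buildRowSizes), the caller would spill starting at that row. Because the
// probe memory is already reserved, the probe phase itself never needs to grow
// the account.
func planJoin(b *budget, probeBytes int64, buildRowSizes []int64) (int, error) {
	if err := b.grow(probeBytes); err != nil {
		// Not even the probe reservation fits; nothing can run in memory.
		return 0, err
	}
	for i, sz := range buildRowSizes {
		if err := b.grow(sz); err != nil {
			return i, nil // build-phase OOM: handled by spilling from row i
		}
	}
	return len(buildRowSizes), nil
}
```

The trade-off is that the reserved probe memory is unavailable to the build phase, so the build may spill earlier than it otherwise would.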
When running TPC-H query 8 with a scale factor of 1 (with default options everywhere else)
on a 3-node local cluster with the environment variables
COCKROACH_PROPOSER_EVALUATED_KV=true COCKROACH_ENTERPRISE_ENABLED=true
(and with a CCL license), one of the nodes panics and fails.
The TPC-H tables were loaded via the RESTORE feature from Azure backups, as outlined in backup option (1) here: https://github.com/cockroachdb/loadgen/tree/master/tpch
cc: @vivekmenezes
Additional info