DELETE FROM ... [returning nothing] crashes a node #17921
Comments
I cannot even remove half a day of data.
I have tried to remove one hour of data, but I got […]
It is possible to remove data in ten-minute chunks :-(
Hey @AnyCPU, you're running into the fact that, as of the time of writing, support for large write transactions is not ideal in CockroachDB. You've pretty much experienced all the problems: operations never succeeding, nodes running into memory trouble (though that one's somewhat unexpected), or failing with the "too large to commit" error. While deleting in small chunks is currently the best option, rest assured that we're working on improving this (see the issue above). We have workarounds in place for the case in which you want to drop/truncate the whole table (which we can do more efficiently by essentially swapping out the table with a new one), but that's clearly not going to help you. Tracking issue: #15849
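For reference, the chunked workaround can look roughly like the following. This is a minimal sketch, not an official recommendation: the table and column names come from the report below, and the ten-minute window size is only an assumption based on what the reporter found workable; pick whatever window your cluster handles comfortably.

```sql
-- Delete one small time window at a time instead of the whole range in a
-- single statement. Repeat, advancing the window, until the target range
-- (here: everything after the first day of the month) has been covered.
DELETE FROM stats
WHERE ts >= '2017-01-02 00:00:00'::timestamp
  AND ts <  '2017-01-02 00:10:00'::timestamp
RETURNING NOTHING;

-- Next window:
DELETE FROM stats
WHERE ts >= '2017-01-02 00:10:00'::timestamp
  AND ts <  '2017-01-02 00:20:00'::timestamp
RETURNING NOTHING;
```

In practice you would drive this from a small script that keeps advancing the window until it passes the end of the month.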
Thank you @tschottdorf
Despite many improvements, this is still an issue in […]
@spencerkimball let's move the discussion from #22876 here. I agree that the easiest way to diagnose this is to get heap profiles. @petermattis if I were to introduce such a facility into […]
For heap profiles, you can already do […]
Also, when diagnosing, it is easiest to create a cluster using […]
No, that's what I mean. Forgot about the env var, thanks!
In the absence of a fast path deletion, `DELETE` would generate one potentially giant batch and OOM the gateway node. This became obvious quickly via heap profiling. Added chunking of the deletions to `tableDeleter`. SQL folks may have stronger opinions on how to achieve this, or a better idea of a preexisting chunking mechanism that works more reliably. If nothing else, this change serves as a prototype to fix cockroachdb#17921. With this change, `roachtest run drop` works (as in, it doesn't out-of-memory right away; the run takes a long time so I can't yet confirm that it actually passes). Release note (sql change): deleting many rows at once now consumes less memory.
With #22991, this seems to be making steady progress. Remains to be seen whether it manages to commit.
OK, as expected, something did go wrong. I think after approximately 10 minutes we run into the timestamp cache, catch a retry, and stagnate from then on. /debug/requests shows little of use, and the sql trace is huge, since this line is extremely chatty and, well, because we're deleting ten million rows. I think what you'd want here is for the refresh machinery to realize that nothing has changed so that the restart can be hidden. Something is clearly going wrong, but it's not exactly clear to me what. @spencerkimball, would you mind taking a look? This is really easy to run, just […]
Reopening as there are still problems with such deletions. They should either fail gracefully or succeed, not hang indefinitely.
Reopening to verify fix.
Ah, already done: #23258 (comment)
The cluster has 3 nodes running CockroachDB 1.0.5 (Linux, 64-bit).
The database has about 195 million records.
The schema is: […]
The database holds one month of data.
I want to delete the data starting from the second day of the month, so I run either
`delete from stats where ts > '2017-01-01 23:59:59'::timestamp;`
or
`delete from stats where ts > '2017-01-01 23:59:59'::timestamp returning nothing;`
I use the `cockroach sql --insecure --host=...` command to run my query. After some minutes, the node in use dies.
The db driver returns "bad connection",
and there are a lot of messages like
`context canceled while in command queue: ResolveIntent...`
in the log. All nodes have free space.
Do I have to delete data in chunks?
Thanks.