-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(db): Fix root cause of RocksDB misbehavior #301
fix(db): Fix root cause of RocksDB misbehavior #301
Conversation
In hindsight, the issue was obvious; recent RocksDB logs are littered with "Shutdown: canceling all background work" logs, which I couldn't attribute previously. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #301 +/- ##
==========================================
- Coverage 35.63% 35.58% -0.06%
==========================================
Files 520 520
Lines 28350 28359 +9
==========================================
- Hits 10103 10092 -11
- Misses 18247 18267 +20
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hoped that the PR would be titled RocksDB Wars: Episode IV -- The Stalled Writes Strike Back
Nice and certainly makes sense. If this works out, would it make sense to roll back the new code added in the previous 3 PRs? Less code is generally better than more code.
@popzxc We have write stalls reported for the state keeper cache, so I think we should leave this logic as-is for now. New RocksDB metrics also make sense, IMO; for one, they'll allow to estimate whether this fix works. |
0adf69b
to
ef461d6
Compare
Merge queue setting changed
🤖 I have created a release *beep* *boop* --- ## [16.2.0](core-v16.1.0...core-v16.2.0) (2023-10-26) ### Features * **basic_witness_producer_input:** Add Basic Witness Producer Input component ([#156](#156)) ([3cd24c9](3cd24c9)) * **core:** adding pubdata to statekeeper and merkle tree ([#259](#259)) ([1659c84](1659c84)) ### Bug Fixes * **db:** Fix root cause of RocksDB misbehavior ([#301](#301)) ([d6c30ab](d6c30ab)) * **en:** gracefully shutdown en waiting for reorg detector ([#270](#270)) ([f048485](f048485)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
What ❔
Fixes (hopefully) the root cause of recent RocksDB misbehavior. After recent refactoring, it improperly cancels background compactions / flushes each time a
RocksDB
instance is dropped, which leads to compactions / flushes being constantly interrupted. (After refactoring,RocksDB
instances areClone
eable and act essentially asArc
wrappers; a new instance is created and dropped, in particular, when processing each chunk of L1 batches.) A proper solution, implemented by this PR, is to cancel background work when dropping an internal, non-Arc
'd DB instance.Why ❔
That's a bug that makes RocksDB behave very unstably, as witnessed by recent issues with it.
Checklist
zk fmt
andzk lint
.