Flaky test - TestP2PWithMultipleDocumentUpdatesPerNodeWithP2PCollection #1154
I think this might be caused by … I'll close this ticket if/when my rework gets merged, as the issue appears to be solved there.
Issue still persists in #1160: I got a single failure out of 50 runs (locally). Same test (…)
Just noting for future reference that this seems to be getting worse, with failures becoming more frequent, although I do not recall any changes merged within the last few weeks that should notably affect P2P or its tests.
Possibly the same issue as #1252.
Possibly relevant: libp2p seems to have a significant test flakiness problem, so we might not be able to fix our test issues within our own codebase: https://github.com/libp2p/go-libp2p/issues?q=is%3Aissue+is%3Aopen+flaky
I'm not surprised, given the nature of P2P systems.
Our go-libp2p version is currently 0.26.0, which does not include libp2p/go-libp2p#2173, a PR that seems to have notably reduced the flakiness of their tests and that involves a sub-package logged quite heavily in at least a few of our flaky test failures (e.g. https://github.com/sourcenetwork/defradb/actions/runs/4556339760/jobs/8036565210). EDIT: PR to bump the version: #1257
Current theory: it is deadlocking with the …
The vast majority of processes are stuck on that one lock: most are ds.Get calls, plus one txn.Put and one ds.Close. ds.Close is the only non-read (write) lock acquisition I can spot, and txn and ds share the same lock. Interestingly, the …
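The lock state described above (many readers queued behind a single writer such as ds.Close) can be reproduced with a bare `sync.RWMutex`. This is a minimal sketch, not defradb code. The function name `secondReadLockBlocked` is hypothetical, and it uses `TryRLock`, which assumes Go 1.18 or later. Once a writer is waiting in `Lock`, new read locks are refused even though the lock is currently held only by a reader.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// secondReadLockBlocked (hypothetical helper) reports whether a goroutine
// that already holds a read lock can take another one once a writer is
// queued in Lock.
func secondReadLockBlocked() bool {
	var mu sync.RWMutex

	mu.RLock() // reader, standing in for an in-flight operation such as a Put

	writerDone := make(chan struct{})
	go func() {
		mu.Lock() // writer, standing in for Close; waits for the reader above
		mu.Unlock()
		close(writerDone)
	}()

	// Poll until the writer has registered itself as pending. Until then a
	// TryRLock succeeds, so release that extra read lock and retry.
	blocked := false
	deadline := time.Now().Add(2 * time.Second)
	for time.Now().Before(deadline) {
		if !mu.TryRLock() {
			// A plain RLock here would wait for the writer, and the writer is
			// waiting for our first read lock: a deadlock.
			blocked = true
			break
		}
		mu.RUnlock()
		time.Sleep(time.Millisecond)
	}

	mu.RUnlock() // release the first read lock so the writer can proceed
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("read lock refused while writer pending:", secondReadLockBlocked())
}
```

The polling loop exists only to wait for the writer goroutine to be scheduled; the key observation is that a reader-held lock with a pending writer admits no further readers.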
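This might be caused by multiple RLock calls in the same goroutine before unlocking. ds Put and Delete call RLock, then call txn.Commit, which tries to re-acquire the same mutex via RLock before either lock is released (both unlocks are deferred). The documentation for RWMutex explicitly warns against doing this: if another goroutine calls Lock while the lock is held by readers, further RLock calls block until that writer has acquired and released the lock, so recursive read locking is prohibited.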
Failure in CI run: https://github.com/sourcenetwork/defradb/actions/runs/4295741739/jobs/7486567708
Waiting for pushlog timed out: