do not merge: WIP commits for debugging a production issue #16564

Closed


bdarnell

I suspect that contention on this semaphore is causing failed
heartbeats in a production cluster.

This PR is based on release-1.0 so it can be tried immediately; if the experiment is successful I'll polish it up before merging and make a version for master.


bdarnell added 4 commits June 16, 2017 13:14
Note that because all of these keys are behind the GCThreshold, it
doesn't matter that we're not deleting them in FIFO order.
bdarnell changed the title from "storage: Use separate locks when updating our own liveness" to "do not merge: WIP commits for debugging a production issue" on Jun 19, 2017
bdarnell added a commit to bdarnell/cockroach that referenced this pull request Jul 6, 2017
There's no reason to block our own liveness updates when incrementing
another node's epoch; doing so could cause cascading failures when
the liveness span gets slow.

This was originally suspected as the cause of cockroachdb#16565 (and was proposed
in cockroachdb#16564). That issue turned out to have other causes, but this
change seems like a good idea anyway.
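
To illustrate the idea in the commit message above, here is a minimal sketch, not the actual CockroachDB code. It assumes the semaphores are modeled as buffered channels of size one. All names (`livenessLocks`, `heartbeatSelf`, `incrementEpoch`, `heartbeatCompletesDuringSlowIncrement`) are hypothetical. It shows only that, with separate semaphores, a slow epoch increment for another node does not hold up our own heartbeat.

```go
package main

import (
	"fmt"
	"time"
)

// livenessLocks is a hypothetical stand-in for node liveness state.
// It is not CockroachDB's real type; it only models the two semaphores.
type livenessLocks struct {
	selfSem  chan struct{} // held while updating our own liveness record
	otherSem chan struct{} // held while incrementing another node's epoch
}

func newLivenessLocks() *livenessLocks {
	return &livenessLocks{
		selfSem:  make(chan struct{}, 1),
		otherSem: make(chan struct{}, 1),
	}
}

// heartbeatSelf runs update while holding only the self semaphore.
func (l *livenessLocks) heartbeatSelf(update func()) {
	l.selfSem <- struct{}{}
	defer func() { <-l.selfSem }()
	update()
}

// incrementEpoch runs update while holding only the other-node semaphore.
func (l *livenessLocks) incrementEpoch(update func()) {
	l.otherSem <- struct{}{}
	defer func() { <-l.otherSem }()
	update()
}

// heartbeatCompletesDuringSlowIncrement reports whether a self heartbeat
// can finish while an epoch increment is still in flight.
func heartbeatCompletesDuringSlowIncrement(l *livenessLocks) bool {
	started := make(chan struct{})
	release := make(chan struct{})
	defer close(release) // let the simulated slow increment finish

	go l.incrementEpoch(func() {
		close(started)
		<-release // simulate a slow write to the liveness span
	})
	<-started // the increment now holds otherSem

	done := make(chan struct{})
	go l.heartbeatSelf(func() { close(done) })

	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println(heartbeatCompletesDuringSlowIncrement(newLivenessLocks()))
}
```

If both operations shared a single semaphore, the heartbeat in this sketch would wait until the slow increment released it, which is the cascading-failure scenario the commit message describes.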

bdarnell commented Jul 6, 2017

The important commits in this branch have been broken out into separate PRs: the first commit is #16918 (master-only), the fourth is #16637 (master) and #16735 (release-1.0), and the fifth is #16632 (master) and #16739 (release-1.0).

@bdarnell bdarnell closed this Jul 6, 2017
bdarnell added a commit to bdarnell/cockroach that referenced this pull request Jul 6, 2017
bdarnell added a commit to bdarnell/cockroach that referenced this pull request Jul 10, 2017