Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace database locks with transactions #1701

Merged
merged 1 commit into from
Feb 8, 2024

Conversation

kradalby
Copy link
Collaborator

This commits removes the locks used to guard data integrity for the
database and replaces them with Transactions, turns out that SQL had
a way to deal with this all along.

This reduces the complexity we had with multiple locks that might stack
or recurse (database, nofitifer, mapper). All notifications and state
updates are now triggered after a database change.

Signed-off-by: Kristoffer Dalby [email protected]

Copy link

@dustinblackman dustinblackman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reviewing the PR in preparation for #1572 (comment), I put down comments I had incase it was useful.

@@ -337,7 +359,7 @@ func (h *Headscale) handlePoll(
// to receive a message to make sure we dont block the entire
// notifier.
// 12 is arbitrarily chosen.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍰 120 now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wops, fixed

ChangeNodes: types.Nodes{node},
}
if stateSelfUpdate.Valid() {
hsdb.notifier.NotifyByMachineKey(stateSelfUpdate, node.MachineKey)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ This logic got moved out of the DB layer and up to the API level, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, because both the notifier and db has locks (or lock like things like transactions) which became easy to deadlock and hard to detect.

@dustinblackman
Copy link

dustinblackman commented Feb 8, 2024

@kradalby I had this PR rebased on top of main running for a few hours, and did not have it lock up on me once! I had two other issues though that I assume are related to changes here.

  1. New nodes joining the network for the first time are unable to make requests over the network. tailscale ping IP-ADDRESS works as expected, but attempting to hit a service on port 80 across the network doesn't work.
  2. Ephemeral nodes don't seem to expire properly. I have a node that is marked online (when it's not, it had been offline for +1 hour), expired is "yes", and the expired_at field continuously increments by 5 seconds every 5 seconds, so the node never goes away from the list in headscale nodes list, nor from tailscale status.

@kradalby kradalby force-pushed the db-remove-lock-use-tx branch from a757f44 to 05834aa Compare February 8, 2024 15:22
This commits removes the locks used to guard data integrity for the
database and replaces them with Transactions, turns out that SQL had
a way to deal with this all along.

This reduces the complexity we had with multiple locks that might stack
or recurse (database, nofitifer, mapper). All notifications and state
updates are now triggered _after_ a database change.

Signed-off-by: Kristoffer Dalby <[email protected]>

wops

Signed-off-by: Kristoffer Dalby <[email protected]>
@kradalby kradalby force-pushed the db-remove-lock-use-tx branch from 05834aa to 0d773b0 Compare February 8, 2024 15:26
@kradalby
Copy link
Collaborator Author

kradalby commented Feb 8, 2024

@dustinblackman

Thanks!

Could you file them as two new issues and tag this PR? It is so big already that I do not really want to dump more stuff onto it, but of course that needs to be addressed.

I think the ephemeral tests need to be improved and we need to find a way to replicate it as part of fixing it so we ensure there is no regression in the future.

For the port 80, I think the curling in the ACL tests would cover that, so it is strange that they are passing 🤔

@kradalby kradalby merged commit 83769ba into juanfont:main Feb 8, 2024
50 checks passed
@kradalby kradalby deleted the db-remove-lock-use-tx branch February 8, 2024 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants