Ephemeral nodes no longer correctly expire #1725

Closed
dustinblackman opened this issue Feb 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

dustinblackman commented Feb 9, 2024

Bug description

Related to #1701

#1701 introduced a bug where ephemeral nodes no longer expire correctly. Once a node reaches its intended expiry, it's correctly marked as expired in the TUI, but the node remains in the node list, where the expired-at timestamp continues to increase every 5 seconds. It continues to allocate an IP for itself even though no inbound connections are accepted.

The following screenshot shows 3 ephemeral nodes that have all expired. Pay close attention to the timestamp in the top left corner: I run the command every ~5 seconds, and expired_at continues to increase each time. My theory is that an event isn't being correctly fired or consumed since the DB transaction changes.

Screenshot 2024-02-08 at 9 56 18 PM
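
For anyone who wants to repeat the check from the screenshot without rerunning the command by hand, here is a minimal sketch of a polling loop. `poll_cmd` is a hypothetical helper name (it is not part of headscale), and the sketch assumes a POSIX shell with `date` and `sleep` available inside the container.

```shell
# poll_cmd INTERVAL COUNT CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times, INTERVAL seconds apart,
# printing a timestamp header before each run so changes between
# runs (such as a drifting expiry time) are easy to spot.
poll_cmd() {
  interval="$1"
  count="$2"
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '=== %s ===\n' "$(date '+%H:%M:%S')"
    "$@"
    i=$((i + 1))
    # Skip the sleep after the final run.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}

# Example (run inside the headscale container):
#   poll_cmd 5 12 headscale node list
```

Comparing the expiry column across consecutive timestamped runs should show whether the value is still moving.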

I'm hoping to continue debugging to see if I can find a solution in the upcoming week, but would love an assist as I'm still unfamiliar with the codebase.

Environment

  • Version of headscale used: 83769ba
  • Version of tailscale client: v1.58.2
  • OS (e.g. Linux, Mac, Cygwin, WSL, etc.): Debian bookworm

To Reproduce

The branch below contains a docker-compose file and several scripts that spawn a full headscale cluster locally, adding 2 normal nodes, 3 ephemeral nodes, and an nginx server to test connections. The headscale container has live reload that monitors changes on your local file system, making development and testing very quick.

Due to #1711, nodes authorized by auth keys have no expiry set, so the branch also includes a patch to expire ephemeral nodes after 30 seconds to assist with debugging.

Run the following to spawn the cluster locally.

git clone https://github.com/dustinblackman/headscale.git
cd headscale
git checkout ephemeral-debug
docker compose -f local-cluster.docker-compose.yml up

In a separate terminal once Headscale is running, you can enter the Headscale container with the following to access the CLI.

docker compose -f local-cluster.docker-compose.yml exec headscale /bin/bash
headscale node list

Additionally, you can enter any of the nodes and run tailscale commands to check connections:

docker ps | grep node
# Select a container ID from the above list
docker exec -it CONTAINER-ID /bin/bash
tailscale status

To reset the environment and start over while keeping the Go build cache:

docker compose -f local-cluster.docker-compose.yml kill
docker compose -f local-cluster.docker-compose.yml rm
docker volume ls | grep headscale | grep -v go | awk '{print $2}' | xargs -L1 docker volume rm
docker compose -f local-cluster.docker-compose.yml up

Side note: The docker-compose cluster setup is what I consider the magic sauce to debugging systems like Headscale locally in a production-like environment. If you find this useful, I'd be happy to open up a PR with docs for future developers. :)

dustinblackman added the bug label Feb 9, 2024
@dustinblackman
Author

cc @kradalby

dustinblackman commented Feb 9, 2024

Going to close this. I think I misunderstood the feature set around ephemeral nodes, which made me consider this a bug. Once ephemeral nodes disconnect, they disappear as expected. Forcing them off the network is a manual intervention, which makes sense.

Sorry for the noise!

kradalby (Collaborator) commented Feb 9, 2024

@dustinblackman no problem. I didn't get around to looking into this today, but I wanted to say that I really appreciate the detailed report!
