Workflow and testing improvements #328

mprasil · 2023-05-03T14:13:34Z

This is somewhat opinionated PR with the intention to improve the role workflow a bit, make it a bit more readable and less fragile. I'm very open to discuss individual changes. I have split it into two commits that are effectively unrelated, it's just I encountered the test bug while working on the rest.

`05407c2`

Adds extra steps to update hosts file in test container. Tailscale does attempt to manage resolv.conf which is somewhat in conflict with Docker attempting to provice container name resolution. This is problematic in multi-stage scenarios where tailscale is initialized multiple times and eventually ends up in a state where container name resolution no longer works and test fails.

This extra steps add static hosts override for headscale container so that Tailscale managing DNS resolution does not mess up the connectivity to Headscale container.

`f716fc2`

The main changes are:

Switch to parsing json output of tailscale status - this should hopefully lead to less fragile detection of current status of tailscale connection and should also end up with more readable conditional logic in the role.
Split tailscale up step into two separate steps depending on situation:
- If state didn't change and network is down, simple tailscale up without any arguments is used as the connection should already be properly set and this is all that is needed to bring the connection up.
- If there is state change, full tailscale up with all the arguments is executed.
This can enable use cases where we don't necessarily want to fully manage the tailscale connection via Ansible, but the user still wants to bring the connection up. (ie. to prepare the instance for subsequent roles that rely on that connections)
- Dropped some failed_when and changed_when conditions and instead rely on default behavior, that assumes non-zero return is failure and that any time command is executed (except tailscale status) it is performing a change. This probably reflects reality better.

tasks/install.yml

mprasil · 2023-05-03T14:39:06Z

handlers/main.yml

-      - handlers_tailscale_status.stdout | length != 0
-      - handlers_tailscale_status.stdout is not match('\[L\+V9o\]')
+      - tailscale_is_online


I'm actually not sure what exactly this condition was trying to assert, but I think the new condition reflects better the name of the task.

Tailscale's output had some mysterious behavior in the past. I think --json solves for it, though
#43 (comment)

😄 When I saw that regexp, I was sure there was some amount of debugging time spent right there.

artis3n · 2023-05-04T13:35:38Z

Love the idea of working on the JSON instead. As for

Dropped some failed_when and changed_when conditions and instead rely on default behavior, that assumes non-zero return is failure

I remember having to dig through tailscale source to find out how exit codes were produced by the binary, and the resulting reason led to this setup. It's been a couple years since then, though. The binary has changed a lot and my cursory dive back into the code couldn't find the same location I remember. Worth rethinking how that works as well, but I'll want to test this for a little while before moving forward.

tasks/install.yml

artis3n · 2023-05-04T13:43:24Z

handlers/main.yml

-      - handlers_tailscale_status.stdout | length != 0
-      - handlers_tailscale_status.stdout is not match('\[L\+V9o\]')
+      - tailscale_is_online


Tailscale's output had some mysterious behavior in the past. I think --json solves for it, though
#43 (comment)

mprasil · 2023-05-04T19:56:29Z

I remember having to dig through tailscale source to find out how exit codes were produced by the binary, and the resulting reason led to this setup.

If that makes things a bit easier, the dropped failed_when is in tailscale status task and it effectively changed from:

  failed_when:
    - tailscale_status.rc != 0
    - "'Logged out.' not in tailscale_status.stdout"
    - "'not logged in' not in tailscale_status.stdout"

to default behavior:

  failed_when:
    - tailscale_status.rc != 0

In this case the other two conditions were likely to account for unexpected output. In this new implementation we would notice that when parsing json.

The changed_when is in tailscale up task. The changed_when: tailscale_start.rc != 0 setting was irrelevant anyways as it would fail the playbook. And it probably didn't reflect reality, because failure to run tailscale up might have changed configuration - for example if we attempted to change connection that was up with invalid auth key. This would effectively drop the connection, which is a change. It's kind of edge case, but IMO if we can't be sure whether any change was done or not, we should err on the side of reporting change.

mprasil · 2023-05-06T11:12:09Z

I have now dropped the separate tailscale up task with no arguments. So we're back to running full tailscale up with arguments when state changes or connection is not up. Thanks for the feedback.

molecule/default/prepare.yml

tasks/install.yml

This patch redacts the authkey from `tailscale up` stderr output based on `tailscale_authkey` directly instead of the `tskey.*` regex. The same change was already applied for the stdout logging in artis3n#328. diff --git a/tasks/install.yml b/tasks/install.yml index 4719173..a9db923 100644 --- a/tasks/install.yml +++ b/tasks/install.yml @@ -167,6 +167,6 @@ - name: Install | Report redacted failure from "tailscale up" # noqa: no-handler ansible.builtin.fail: - msg: "{{ tailscale_start.stderr | default () | regex_replace('tskey.*\\s', 'REDACTED ') | regex_replace('\\t', '') | split('\n') }}" + msg: "{{ tailscale_start.stderr | default () | regex_replace(tailscale_authkey, 'REDACTED') | regex_replace('\\t', '') | split('\n') }}" when: - tailscale_start is failed

This patch redacts the authkey from `tailscale up` stderr output based on `tailscale_authkey` directly instead of the `tskey.*` regex. The same change was already applied for the stdout logging in artis3n#328.

This patch redacts the authkey from `tailscale up` stderr output based on `tailscale_authkey` directly instead of the `tskey.*` regex. The same change was already applied for the stdout logging in #328.

This patch redacts the authkey from `tailscale up` stderr output based on `tailscale_authkey` directly instead of the `tskey.*` regex. The same change was already applied for the stdout logging in artis3n#328.

mprasil commented May 3, 2023

View reviewed changes

tasks/install.yml Show resolved Hide resolved

mprasil commented May 3, 2023

View reviewed changes

tasks/install.yml Show resolved Hide resolved

mprasil commented May 3, 2023

View reviewed changes

tasks/install.yml Outdated Show resolved Hide resolved

mprasil commented May 3, 2023

View reviewed changes

artis3n reviewed May 4, 2023

View reviewed changes

mprasil force-pushed the workflow-improvements branch from f716fc2 to 81d1755 Compare May 6, 2023 11:08

artis3n reviewed May 6, 2023

View reviewed changes

molecule/default/prepare.yml Show resolved Hide resolved

tasks/install.yml Show resolved Hide resolved

tasks/install.yml Outdated Show resolved Hide resolved

tasks/install.yml Outdated Show resolved Hide resolved

tasks/install.yml Outdated Show resolved Hide resolved

artis3n force-pushed the workflow-improvements branch from 81d1755 to 29c9119 Compare May 6, 2023 13:29

artis3n temporarily deployed to E2E May 6, 2023 13:29 — with GitHub Actions Inactive

artis3n temporarily deployed to E2E May 19, 2023 18:46 — with GitHub Actions Inactive

artis3n approved these changes May 19, 2023

View reviewed changes

artis3n merged commit 6016882 into artis3n:main May 19, 2023

mprasil deleted the workflow-improvements branch May 19, 2023 20:21

jonvmey mentioned this pull request May 31, 2023

Remove extra underscore from variable name #344

Merged

relrod mentioned this pull request Jun 20, 2023

[BUG] Check mode fails to run (JSONDecodeError) #348

Open

fredrikekre mentioned this pull request Mar 26, 2024

Fix redacting the auth key when tailscale up fails #456

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workflow and testing improvements #328

Workflow and testing improvements #328

mprasil commented May 3, 2023

mprasil May 3, 2023

artis3n May 4, 2023

mprasil May 4, 2023

artis3n commented May 4, 2023

artis3n May 4, 2023

mprasil commented May 4, 2023

mprasil commented May 6, 2023

Workflow and testing improvements #328

Workflow and testing improvements #328

Conversation

mprasil commented May 3, 2023

05407c2

f716fc2

mprasil May 3, 2023

Choose a reason for hiding this comment

artis3n May 4, 2023

Choose a reason for hiding this comment

mprasil May 4, 2023

Choose a reason for hiding this comment

artis3n commented May 4, 2023

artis3n May 4, 2023

Choose a reason for hiding this comment

mprasil commented May 4, 2023

mprasil commented May 6, 2023

`05407c2`

`f716fc2`