Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests(iroh-net): Make derp connect loop test more reliable #2064

Merged
merged 2 commits into from
Mar 8, 2024

Conversation

flub
Copy link
Contributor

@flub flub commented Mar 8, 2024

Description

This refactors this test which has had reports of being flaky a bit.
It tiedies up some logging but most significantly:

  • The server task is now aborted properly regardless of how the test
    execution works out.

  • The clients are no longer moved to a task. The server is
    handling them sequentially anyway so there's not much to gain from
    having one of the clients sit around waiting to be accepted.

  • Most importantly, and possibly the relevant bugfix, both clients
    were using the same NodeId. This may just about work out if both
    are not connecting at the same time as it would maybe look like a
    migrating client to the server. But combined with the clients
    before being spawned in tasks this would have confused the server's
    PathState quite a lot. I don't believe these clients should have
    shared their NodeId.

  • A (generous) timeout is added to the test. The flake issue mentions
    that this is sometimes just very slow. There really is no reason
    for this to be slow so consider these failures so that we get
    debugging info from them in the form of logs.

Notes & open questions

Closes #2050
At least, it might. I guess we'll reopen it if it doesn't.

Change checklist

  • Self-review.
  • Tests if relevant.

This refactors this test which has had reports of being flaky a bit.
It tiedies up some logging but most significantly:

- The server task is now aborted properly regardless of how the test
  execution works out.

- The clients are no longer moved to a task.  The server is
  handling them sequentially anyway so there's not much to gain from
  having one of the clients sit around waiting to be accepted.

- Most importantly, and possibly the relevant bugfix, both clients
  were using the same NodeId.  This *may* just about work out if both
  are not connecting at the same time as it would maybe look like a
  migrating client to the server.  But combined with the clients
  before being spawned in tasks this would have confused the server's
  PathState quite a lot.  I don't believe these clients should have
  shared their NodeId.

- A (generous) timeout is added to the test.  The flake issue mentions
  that this is sometimes just very slow.  There really is no reason
  for this to be slow so consider these failures so that we get
  debugging info from them in the form of logs.
@flub flub changed the title tests(iroh-net): Make test more reliable tests(iroh-net): Make derp connect loop test more reliable Mar 8, 2024
@flub flub requested review from Frando and dignifiedquire March 8, 2024 11:53
Copy link
Contributor

@dignifiedquire dignifiedquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good, lets hope it helps

@flub flub enabled auto-merge March 8, 2024 13:12
@flub flub added this pull request to the merge queue Mar 8, 2024
Merged via the queue into main with commit 9e7605d Mar 8, 2024
20 checks passed
@flub flub deleted the flub/magicep-fewer-flakes branch March 8, 2024 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

flaky: iroh_net::magic_endpoint::tests::magic_endpoint_derp_connect_loop
2 participants