Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of flaky tests #2978

Closed
lukesteensen opened this issue Jul 8, 2020 · 20 comments
Closed

Get rid of flaky tests #2978

lukesteensen opened this issue Jul 8, 2020 · 20 comments
Assignees
Labels
domain: tests Anything related to Vector's internal tests type: bug A code related bug. type: tech debt A code change that does not add user value.

Comments

@lukesteensen
Copy link
Member

We have a number of flaky tests that are imposing a heavy cost on our dev and CI process. We should have a strategy to eliminate them. I propose the following:

  1. Identify the tests in question
  2. For each test, open an issue describing its purpose and then delete the test entirely, linking the deletion commit in the issue
  3. Triage the issues by cost/benefit to having the test based on the described purpose
  4. Rewrite tests that are worthwhile

The tests currently provide negative value, so deleting them gets us back to a clean state as quickly as possible. We can then add them back (written in a more reliable fashion) as deemed valuable. If they're not valuable enough to rewrite, that's fine too.

@lukesteensen lukesteensen added the type: tech debt A code change that does not add user value. label Jul 8, 2020
@binarylogic
Copy link
Contributor

@lukesteensen @Hoverbear @fanatid @bruceg @ktff, could you list any tests that you are aware of and we can get to work on this? I'd like to compile a concrete list we can work against.

@Hoverbear

This comment has been minimized.

@ktff

This comment has been minimized.

@ktff
Copy link
Contributor

ktff commented Jul 10, 2020

@Hoverbear where did you see encodes_histogram_without_timestamp failing? It should only be failing in #2913 like that, which isn't merged.

@ktff
Copy link
Contributor

ktff commented Jul 10, 2020

topology::config::watcher::tests::multi_file_update on mac
https://github.com/timberio/vector/pull/3010/checks?check_run_id=857600333

@Hoverbear
Copy link
Contributor

@ktff You're right! Sorry!

@Hoverbear
Copy link
Contributor

#2978 (comment) is #3000

@ktff
Copy link
Contributor

ktff commented Jul 11, 2020

topology::reload_tests::topology_reuse_old_port on windows
https://github.com/timberio/vector/pull/3010/checks?check_run_id=857742471

@ktff
Copy link
Contributor

ktff commented Jul 11, 2020

#3017 (comment)

test_max_size_resume
test_max_size_resume.log

I've seen this one before I think

@Hoverbear not suprising, they were addressed in #2862, but the race to the file mentioned in comment is quite persistent. (EDIT: It's something else)

@ktff
Copy link
Contributor

ktff commented Jul 11, 2020

@ktff
Copy link
Contributor

ktff commented Jul 11, 2020

test_udp_syslog
https://github.com/timberio/vector/pull/2955/checks?check_run_id=861560287

The PR changes syslog but not the udp part, and the test passes locally.

@ktff
Copy link
Contributor

ktff commented Jul 19, 2020

sinks::aws_s3::integration_tests::s3_waits_for_full_batch_or_timeout_before_sending

https://github.com/timberio/vector/pull/3099/checks?check_run_id=886641917
https://github.com/timberio/vector/runs/868843252

@ktff
Copy link
Contributor

ktff commented Jul 19, 2020

sources::socket::test::tcp_gracefull_shutdown

https://github.com/timberio/vector/runs/874679387

Seems to be failing because of OS errors, probably firewall.

@binarylogic
Copy link
Contributor

@ktff thanks for working through these. How are we feeling about closing this issue given the above? Are there tests left that we need to remove?

@ktff
Copy link
Contributor

ktff commented Jul 19, 2020

Are there tests left that we need to remove?

Based on recent CI runs, these are the only flaky tests, so no.

It should be fine to close it. If another one popes up, it can be dealt with in a regular way.

@ktff ktff closed this as completed Jul 19, 2020
@binarylogic
Copy link
Contributor

Sounds good 👍

@ktff

This comment has been minimized.

@ktff ktff reopened this Aug 1, 2020
@ktff ktff closed this as completed Aug 2, 2020
@ktff
Copy link
Contributor

ktff commented Aug 13, 2020

@JeanMertz
Copy link
Contributor

Re-opening, so that we can keep using this issue to track all flaky tests. We'll still create separate issues for each failing test, but link to this one so that you can easily cmd+F this issue to see if a test is already reported as being flaky (the GH search function doesn't always work as well as you want it to, unfortunately).

@JeanMertz JeanMertz reopened this Sep 9, 2020
@ktff ktff removed their assignment Sep 9, 2020
@JeanMertz JeanMertz added the domain: tests Anything related to Vector's internal tests label Sep 11, 2020
@jamtur01 jamtur01 added the type: bug A code related bug. label Sep 11, 2020
@jamtur01 jamtur01 self-assigned this Sep 11, 2020
@jamtur01 jamtur01 added this to the 2020-09-14 - The Grid milestone Sep 11, 2020
@jamtur01 jamtur01 removed this from the 2020-09-14 - The Grid milestone Sep 24, 2020
@jamtur01 jamtur01 added this to the 2020-09-28 - Derezzed milestone Sep 25, 2020
@binarylogic
Copy link
Contributor

Closing this since the issue is not particularly helpful anymore. Please continue to open individual issues for each test removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: tests Anything related to Vector's internal tests type: bug A code related bug. type: tech debt A code change that does not add user value.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants