Get rid of flaky tests #2978

lukesteensen · 2020-07-08T21:27:38Z

We have a number of flaky tests that are imposing a heavy cost on our dev and CI process. We should have a strategy to eliminate them. I propose the following:

1. Identify the tests in question.
2. For each test, open an issue describing its purpose, then delete the test entirely and link the deletion commit in the issue.
3. Triage the issues by weighing the cost and benefit of keeping each test, based on its described purpose.
4. Rewrite the tests that are worthwhile.

The tests currently provide negative value, so deleting them gets us back to a clean state as quickly as possible. We can then add them back (written in a more reliable fashion) as deemed valuable. If they're not valuable enough to rewrite, that's fine too.

binarylogic · 2020-07-08T21:31:16Z

@lukesteensen @Hoverbear @fanatid @bruceg @ktff, could you list any tests that you are aware of and we can get to work on this? I'd like to compile a concrete list we can work against.

ktff · 2020-07-10T11:12:41Z

@Hoverbear where did you see encodes_histogram_without_timestamp failing? It should only be failing in #2913 like that, which isn't merged.

ktff · 2020-07-10T11:24:10Z

topology::config::watcher::tests::multi_file_update on macOS
https://github.com/timberio/vector/pull/3010/checks?check_run_id=857600333

Hoverbear · 2020-07-10T16:57:42Z

@ktff You're right! Sorry!

Hoverbear · 2020-07-10T16:58:16Z

#2978 (comment) is #3000

ktff · 2020-07-11T21:10:54Z

topology::reload_tests::topology_reuse_old_port on Windows
https://github.com/timberio/vector/pull/3010/checks?check_run_id=857742471

ktff · 2020-07-11T21:32:03Z

#3017 (comment)

test_max_size_resume
test_max_size_resume.log

I've seen this one before I think

@Hoverbear Not surprising; they were addressed in #2862, ~~but the race to the file mentioned in the comment is quite persistent.~~ (EDIT: It's something else)

ktff · 2020-07-11T22:07:00Z

tests\tcp::merge on Windows
https://github.com/timberio/vector/pull/3010/checks?check_run_id=861595079

ktff · 2020-07-11T22:11:34Z

test_udp_syslog
https://github.com/timberio/vector/pull/2955/checks?check_run_id=861560287

The PR changes syslog but not the UDP part, and the test passes locally.

ktff · 2020-07-19T11:23:57Z

sinks::aws_s3::integration_tests::s3_waits_for_full_batch_or_timeout_before_sending

https://github.com/timberio/vector/pull/3099/checks?check_run_id=886641917
https://github.com/timberio/vector/runs/868843252

ktff · 2020-07-19T11:28:49Z

sources::socket::test::tcp_gracefull_shutdown

https://github.com/timberio/vector/runs/874679387

It seems to be failing because of OS errors, probably caused by a firewall.

binarylogic · 2020-07-19T21:50:24Z

@ktff thanks for working through these. How are we feeling about closing this issue given the above? Are there tests left that we need to remove?

ktff · 2020-07-19T22:15:03Z

Are there tests left that we need to remove?

Based on recent CI runs, these are the only flaky tests, so no.

It should be fine to close it. If another one pops up, it can be dealt with in the usual way.

binarylogic · 2020-07-19T22:24:20Z

Sounds good 👍

ktff · 2020-08-13T15:11:56Z

As per #3416 (comment)

test_reclaim_disk_space

https://github.com/timberio/vector/runs/973409685

JeanMertz · 2020-09-09T14:46:08Z

Re-opening, so that we can keep using this issue to track all flaky tests. We'll still create separate issues for each failing test, but link to this one so that you can easily cmd+F this issue to see if a test is already reported as being flaky (the GH search function doesn't always work as well as you want it to, unfortunately).

binarylogic · 2020-10-11T17:01:14Z

Closing this since the issue is not particularly helpful anymore. Please continue to open individual issues for each test removed.

lukesteensen added the type: tech debt A code change that does not add user value. label Jul 8, 2020

binarylogic assigned ktff Jul 8, 2020


Hoverbear mentioned this issue Jul 10, 2020

chore(deps): update prost #3017

Merged

This was referenced Jul 19, 2020

Flaky tcp_gracefull_shutdown #3102

Closed

Flaky s3_waits_for_full_batch_or_timeout_before_sending #3104

Closed

ktff closed this as completed Jul 19, 2020


ktff reopened this Aug 1, 2020

ktff mentioned this issue Aug 1, 2020

chore(tests): Fix race in CountReceiver #3308

Merged


ktff closed this as completed Aug 2, 2020

ktff mentioned this issue Aug 13, 2020

Flaky test_reclaim_disk_space #3442

Closed

Hoverbear mentioned this issue Aug 27, 2020

failing sinks::http::tests::http_happy_path_post test on Mac #3606

Closed

This was referenced Sep 9, 2020

Replace flaky sources::file::tests::remove_file test #3780

Closed

Flaky sinks::util::auto_concurrency::tests::defers_at_high_concurrency test #3781

Closed

JeanMertz reopened this Sep 9, 2020

ktff removed their assignment Sep 9, 2020

JeanMertz mentioned this issue Sep 11, 2020

Flaky sources::apache_metrics::test::test_apache_up test #3821

Closed

JeanMertz added the domain: tests Anything related to Vector's internal tests label Sep 11, 2020

jamtur01 added the type: bug A code related bug. label Sep 11, 2020

jamtur01 self-assigned this Sep 11, 2020

jamtur01 added this to the 2020-09-14 - The Grid milestone Sep 11, 2020

JeanMertz mentioned this issue Sep 15, 2020

Ensure CI is consistently green #3880

Closed


jszwedko mentioned this issue Sep 17, 2020

Flakey test on OSX: retry_until_after_timeout #4003

Closed

jamtur01 removed this from the 2020-09-14 - The Grid milestone Sep 24, 2020

jamtur01 assigned ktff Sep 25, 2020

jamtur01 added this to the 2020-09-28 - Derezzed milestone Sep 25, 2020

ktff mentioned this issue Oct 1, 2020

chore(tests): Disable flaky tests on MacOS #4251

Closed

binarylogic closed this as completed Oct 11, 2020
