[Nix] adjust pytest retries #4558
Conversation
Compared commits 5ee7ea9 to dc33e28
@@ -243,6 +244,7 @@ def test_tls13_session_resumption_s2n_client(managed_process, cipher, curve, cert
     b'SSL_accept:SSLv3/TLS write certificate') == num_full_connections


+@pytest.mark.flaky(reruns=7, reruns_delay=2, condition=platform.machine().startswith("aarch"))
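The marker added in the diff comes from the pytest-rerunfailures plugin. A minimal sketch of how it is applied (the test body here is a placeholder, not from the PR): the `condition` keyword is evaluated once, so the extra reruns only take effect on aarch64 hosts while x86 runs get no reruns from this marker.

```python
import platform

import pytest


# Rerun a failing test up to 7 times with a 2-second pause, but only
# when the machine architecture starts with "aarch" (i.e. arm hosts).
@pytest.mark.flaky(reruns=7, reruns_delay=2,
                   condition=platform.machine().startswith("aarch"))
def test_example():
    # Placeholder body; the real tests exercise s2n handshakes.
    assert True
```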
What are the errors when these tests fail due to flakiness? It seems like the tests could be broken if they require 5 or 7 retries in order to succeed?
The current working theory is that this is the test framework's interaction with subprocesses. I bumped the timeout values from 5 seconds to 60, and then noticed that the retried test count exactly matches the number of tests running for 60 seconds. So these are being timed out by pytest; it's unclear why.
Interesting ok. I guess the concern is that if we make a change that causes one of these tests to become flaky, we likely won't notice it since they're retrying so many times. But I guess the extra retries only apply to arm, so as long as we continue running them in an environment with retries disabled we should notice a flaky regression.
To be fair, we're running these now on x86 with 2 retries, so there is no CI running these specific integrationv2 tests with 0 retries. I don't believe we're collecting retry metrics, which would be an interesting datapoint...
Resolved issues:
part of #3841
Description of changes:
Several tests were failing under Nix on arm (e.g. happy_path), but upon investigation, the pytest arguments for Nix were less forgiving than those of our other CI jobs.
This PR aligns the default pytest retry count globally, raising it from 0 to 2.
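As a sketch of what a global default looks like with pytest-rerunfailures (the exact config file and placement used by this repo are assumptions), the retry count can be set once via `addopts` rather than per invocation:

```ini
; pytest.ini (hypothetical placement): retry every failing test
; up to 2 times by default, matching the other CI jobs.
[pytest]
addopts = --reruns 2
```

Per-test `@pytest.mark.flaky` markers then override this default where individual tests need more reruns or a delay.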
These changes address the following test failures:
An ad-hoc job running just the above on both architectures is here
Call-outs:
Unfortunately, a few tests need more than 2 retries, or a pause between retries. Iterating through these failures, I've bumped up the retry count/delay using the pytest-rerunfailures flaky decorator, but only on specific tests while running on arm.
For renegotiate, I also observed that many of the retries came from hitting the subprocess timeout of 5 seconds. Experimenting with larger values led to 8 seconds as a more reliable timeout.
Testing:
How is this change tested (unit tests, fuzz tests, etc.)? Local runs and CI.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.