webdriver/tests/ are flaky in Firefox and Safari #28925

foolip · 2021-05-10T09:23:04Z

In many recent PRs touching webdriver/tests/ I've seen that the wpt.fyi checks show many differences in test results for Firefox and Safari, with the largest differences typically for Safari. Examples:
https://github.com/web-platform-tests/wpt/pull/28875/checks?check_run_id=2522115464
https://github.com/web-platform-tests/wpt/pull/28875/checks?check_run_id=2521309324

This makes it difficult to change WebDriver tests with confidence, because it always looks like there are some regressions. On a few occasions I've had to resort to comparing two sets of results with manually constructed wpt.fyi URLs and filters to convince myself:
#28757 (comment)
#28789 (comment)

@burg @gsnedders @jgraham @whimboo is this something you observe in your own CI as well? If it could be made more consistent here in WPT's CI, the risk of regressing the tests accidentally would go down.

gsnedders · 2021-05-10T09:31:16Z

The flakiness with the user prompts in safaridriver is known (rdar://54401037 for anyone at Apple who comes across this)

foolip · 2021-05-10T10:15:12Z

@gsnedders if it is all caused by user prompts, do you think there's any hacky hack that could be used to make the tests more stable in the meantime?

gsnedders · 2021-05-10T13:11:34Z

> @gsnedders if it is all caused by user prompts, do you think there's any hacky hack that could be used to make the tests more stable in the meantime?

No idea.

foolip · 2021-05-10T15:15:05Z

OK 😄

whimboo · 2021-05-11T08:52:12Z

@foolip are the failures always around changing the window size of the Firefox window? We have some known intermittent failures on Linux for that, but I wonder if using Ubuntu 20.04 makes it even worse. In our CI we still have 18.04 LTS.

foolip · 2021-05-11T09:09:56Z

@whimboo it could be, but the user_prompts.py part of the test names is what's drawn my attention. I've also noticed that sometimes the failure of a test jumps to another test, as would happen if there's something async going on that will fail the current test, whatever it is.

From https://wpt.fyi/insights you can generate views that are helpful for this:
https://wpt.fyi/results/webdriver/tests?label=master&label=experimental&max-count=10&product=firefox&q=seq%28%28status%3APASS%7Cstatus%3AOK%29%20%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%29%20seq%28%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%20%28status%3APASS%7Cstatus%3AOK%29%29

It does look like it's always those 3 tests...
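For readers unfamiliar with the query in that URL, here is a minimal Python sketch of what it appears to select. It assumes the two `seq(...)` clauses are combined with AND, i.e. a test matches only if consecutive runs show both a passing-to-failing and a failing-to-passing transition. The function names are hypothetical and this is not wpt.fyi's actual implementation.

```python
# Statuses treated as passing by the query (status:PASS | status:OK).
PASSING = {"PASS", "OK"}


def is_passing(status):
    return status in PASSING


def is_failing(status):
    # Mirrors status:!PASS & status:!OK & status:!unknown.
    return status not in PASSING and status != "unknown"


def has_transition(statuses, first, second):
    """True if some pair of consecutive runs satisfies first, then second."""
    return any(first(a) and second(b) for a, b in zip(statuses, statuses[1:]))


def is_flaky(statuses):
    """Assumed semantics: both transitions must occur in the run history."""
    return (has_transition(statuses, is_passing, is_failing)
            and has_transition(statuses, is_failing, is_passing))


history = ["PASS", "FAIL", "PASS", "PASS"]
print(is_flaky(history))
```

A test that fails once and stays failing would not match under this reading, which is what separates flakiness from a plain regression.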

whimboo · 2021-05-11T09:33:12Z

Thanks for that link. Looking at these failures with details enabled, I can only see an AssertionError in most cases. Sadly that doesn't give all the details. I assume the tests aren't run with trace logging enabled? For geckodriver and Marionette this can be done via --webdriver-arg=-vv.

What changed recently in Firefox is the new type of content modal dialog, which is used by alert, confirm, and prompt, so that would match. I landed support for these dialogs on April 29th. Since then I fixed some intermittent failures, but those shouldn't have affected the WebDriver tests. So is there a way to check if the pass/fail rate got worse around that date?

foolip · 2021-05-11T09:56:12Z

If you start at https://wpt.fyi/runs and start scrolling down you can find older runs, but that's a bit tedious. This isn't possible via the UI (I think) but if you add to=2021-04-29 you get runs before that date:
https://wpt.fyi/results/webdriver/tests?label=experimental&label=master&max-count=10&to=2021-04-29T00%3A00%3A00.000Z&product=firefox&q=seq%28%28status%3APASS%7Cstatus%3AOK%29%20%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%29%20seq%28%28status%3A%21PASS%26status%3A%21OK%26status%3A%21unknown%29%20%28status%3APASS%7Cstatus%3AOK%29%29

It looks like the tests were flaky then as well. Going back another year, it's the same.
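Since the `to` parameter has to be added by hand, a small Python sketch like the following could build such URLs for different dates. The parameter names and query string are taken from the URL above; the function name `flaky_results_url` and its defaults are hypothetical.

```python
from urllib.parse import quote, urlencode

# The flakiness query from the URL above, in decoded form.
FLAKY_QUERY = (
    "seq((status:PASS|status:OK) "
    "(status:!PASS&status:!OK&status:!unknown)) "
    "seq((status:!PASS&status:!OK&status:!unknown) "
    "(status:PASS|status:OK))"
)


def flaky_results_url(product, to=None, max_count=10):
    """Build a wpt.fyi results URL for webdriver/tests with the flaky filter."""
    params = [
        ("label", "experimental"),
        ("label", "master"),
        ("max-count", str(max_count)),
    ]
    if to is not None:
        # Only include runs before this timestamp.
        params.append(("to", to))
    params += [("product", product), ("q", FLAKY_QUERY)]
    # quote_via=quote encodes spaces as %20, as in the URL above.
    query = urlencode(params, quote_via=quote)
    return "https://wpt.fyi/results/webdriver/tests?" + query


url = flaky_results_url("firefox", to="2021-04-29T00:00:00.000Z")
print(url)
```

Leaving out `to` gives the most recent runs, so the same helper can produce both sides of a before/after comparison.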

Have you checked if these are stable in Gecko CI? If yes, and if these tests are perfectly reliable locally, then you could try making logging more verbose by tweaking here:

wpt/tools/ci/taskcluster-run.py

Lines 75 to 84 in e545686

    wpt_args += [
        "--log-mach-level=info",
        "--log-mach=-",
        "-y",
        "--no-pause",
        "--no-restart-on-unexpected",
        "--install-fonts",
        "--no-headless",
        "--verify-log-full"
    ]
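As a rough sketch of the kind of tweak meant here, the argument list could be extended with the `--webdriver-arg=-vv` flag mentioned earlier in the thread. The helper name `with_trace_logging` is hypothetical, and whether this is the right place for the flag in the real script is an assumption that has not been tested.

```python
def with_trace_logging(wpt_args):
    """Return a copy of wpt_args with driver trace logging requested.

    Assumption: the flag is accepted here exactly as given in the thread
    for geckodriver and Marionette.
    """
    return list(wpt_args) + ["--webdriver-arg=-vv"]


# The arguments from the excerpt above.
base_args = [
    "--log-mach-level=info",
    "--log-mach=-",
    "-y",
    "--no-pause",
    "--no-restart-on-unexpected",
    "--install-fonts",
    "--no-headless",
    "--verify-log-full",
]

verbose_args = with_trace_logging(base_args)
print(verbose_args[-1])
```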

whimboo · 2021-05-11T10:05:12Z

Oh, I can actually see a lot of multiple statuses set for these tests:

https://searchfox.org/mozilla-central/source/testing/web-platform/meta/webdriver/tests/maximize_window/user_prompts.py.ini

That might explain why I haven't seen any failures in our CI. Sadly we won't have the time to dig further into this anytime soon. :/ But it's good to see it's not related to the new kind of modals.
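To illustrate why multiple expected statuses can hide flakiness, here is a minimal Python sketch. The statuses and run history are made up for illustration and are not copied from the linked metadata file; the function name is hypothetical.

```python
def is_unexpected(actual, expected):
    """A result is only reported when it is outside the expected set."""
    return actual not in expected


# Illustrative history of a test that alternates between outcomes.
runs = ["PASS", "FAIL", "PASS", "FAIL"]

# With a single expected status, every FAIL is flagged.
strict_unexpected = [s for s in runs if is_unexpected(s, {"PASS"})]

# With both statuses listed as expected, nothing is flagged.
lenient_unexpected = [s for s in runs if is_unexpected(s, {"PASS", "FAIL"})]

print(strict_unexpected, lenient_unexpected)
```

A CI that accepts either outcome would stay green for such a test, while a results comparison that has no such metadata, as assumed here for the wpt.fyi checks, would show the differences.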

whimboo · 2023-08-21T08:13:22Z

Quite a bit of time has passed, and we have improved the tests and our WebDriver classic implementation a lot since then. I would suggest that we close this issue and, if necessary, file specific issues for flakiness as it is seen.

foolip added webdriver flaky labels May 10, 2021

foolip mentioned this issue May 10, 2021

Remove use of six.reraise #28887

Merged

foolip mentioned this issue May 10, 2021

Pass server config to WebDriver via a file instead of an env variable. #28834

Merged

whimboo closed this as completed Aug 21, 2023
