Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run tests in parallel with pytest-xdist. #705

Merged
merged 9 commits into from
Mar 7, 2022
Merged

Run tests in parallel with pytest-xdist. #705

merged 9 commits into from
Mar 7, 2022

Conversation

bdice
Copy link
Member

@bdice bdice commented Mar 3, 2022

Description

This PR adds the plugin pytest-xdist to run tests in parallel. Each CircleCI machine has two cores. This cuts off around 45 seconds from each CI job, which adds up to a decent amount of time across all jobs. The output is a little more verbose.

Before:
image

After (ignore pypy, I pushed another commit before it finished):
image

Motivation and Context

Might as well use all of the CI resources we are given. Saves a bit of time and probably reduces the energy needed by the server since it can sleep sooner.

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.

@bdice bdice self-assigned this Mar 3, 2022
@bdice bdice added the enhancement New feature or request label Mar 3, 2022
@bdice bdice added this to the v1.8.0 milestone Mar 3, 2022
@bdice bdice marked this pull request as ready for review March 3, 2022 00:21
@bdice bdice requested review from a team as code owners March 3, 2022 00:21
@bdice bdice requested review from cbkerr and tommy-waltmann and removed request for a team March 3, 2022 00:21
@codecov
Copy link

codecov bot commented Mar 3, 2022

Codecov Report

Merging #705 (e7edb89) into master (dfc28cb) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #705   +/-   ##
=======================================
  Coverage   78.54%   78.54%           
=======================================
  Files          65       65           
  Lines        7141     7141           
  Branches     1565     1565           
=======================================
  Hits         5609     5609           
  Misses       1228     1228           
  Partials      304      304           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dfc28cb...e7edb89. Read the comment docs.

@mikemhenry
Copy link
Collaborator

Maybe do -n auto so that if we ever have more cores/different CI we don't need to change the number?

@bdice
Copy link
Member Author

bdice commented Mar 3, 2022

Maybe do -n auto so that if we ever have more cores/different CI we don't need to change the number?

Unfortunately the CI machine claims to have 36 cores but only schedules 2. Auto ends up oversubscribing really badly.

@mikemhenry
Copy link
Collaborator

mikemhenry commented Mar 3, 2022

Maybe do -n auto so that if we ever have more cores/different CI we don't need to change the number?

Unfortunately the CI machine claims to have 36 cores but only schedules 2. Auto ends up oversubscribing really badly.

Yikes! Okay that makes sense!
(sorry I see you tried that here: e77fcdc)

@joaander
Copy link
Member

joaander commented Mar 3, 2022

Maybe do -n auto so that if we ever have more cores/different CI we don't need to change the number?

Unfortunately the CI machine claims to have 36 cores but only schedules 2. Auto ends up oversubscribing really badly.

Yikes! Okay that makes sense! (sorry I see you tried that here: e77fcdc)

As far as I know, this is specific to circleci. GitHub Actions and Azure pipelines use VMs that advertise the correct number of cores.

@vyasr
Copy link
Contributor

vyasr commented Mar 3, 2022

I think we can probably run with this change for now and revisit if we ever switch to a different CI executor, then?

@joaander
Copy link
Member

joaander commented Mar 4, 2022

I think we can probably run with this change for now and revisit if we ever switch to a different CI executor, then?

Yes. I wasn't suggesting action - just providing information for context.

Copy link
Member

@cbkerr cbkerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice speedup! Make the changelog entry more specific then this is good to go!

changelog.txt Outdated Show resolved Hide resolved
Co-authored-by: Corwin Kerr <[email protected]>
@bdice bdice enabled auto-merge (squash) March 4, 2022 16:00
Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks a lot.

Do you know what is going on with the failing py38 test? Looks like some multi-threaded tests are failing?

@bdice bdice disabled auto-merge March 4, 2022 16:52
@vyasr
Copy link
Contributor

vyasr commented Mar 4, 2022

Great! Thanks a lot.

Do you know what is going on with the failing py38 test? Looks like some multi-threaded tests are failing?

It looks like one of the errors is that this line in the test should be modified to catch a KeyError. The existing except clause isn't broad enough in detecting the ways in which multithreaded execution can fail when multithreading support is disabled.

My best guess is that the other failure is due to #397, specifically the case that I decided was out of scope when closing that issue: process-parallelism safe buffering. I suspect that we only see failures on PyPy because it executes much faster than CPython. I observed the same type of issue many times when constructing this test, specifically because sequences of operations that could fail when multithreading support was disabled would succeed in PyPy because threads completed too quickly to be preempted and so data was never corrupted even without acquiring mutexes.

Buffering safe for multiple processes is still out of scope (especially in conjunction with PyPy), so I don't think diagnosing the exact failure here is worth much further investigation. My recommendation is to check if we're using pytest-xdist (either using the PYTEST_XDIST_WORKER environment variable or another mechanism described in the pytest-xdist docs) and whether we're running in PyPy (the platform check is already being done for other things in this file), and if both of those conditions are true just skip the multithreaded test in buffered mode. Note that we do not want to skip the multithreaded test in unbuffered collections, so you'll probably need to do something like override the test defined in SyncedCollectionTest with another test defined in BufferedJSONCollectionTest that skips the xdist+PyPy case and otherwise calls the super() test.

@vyasr
Copy link
Contributor

vyasr commented Mar 5, 2022

The same failure of the multithreaded test with PyPy is showing up on #706. It's possible that tests are suddenly detecting a failure in the multithreading logic that they haven't for the past year, although I don't know why that would be. A possible culprit is #658, which bumped our testing environment for pypy from 3.7 to 3.8. Perhaps something is different in PyPy 3.8 vs 3.7.

Now that we've seen this failure on two PRs, my recommendation would be to merge this PR as is and to investigate further in a separate PR. The easiest solution would just be to disable threading support on PyPy if we can't find an underlying cause. I invested a lot of time trying to get to the bottom of this issue in the past and I am unlikely to do so again in the near future.

@vyasr
Copy link
Contributor

vyasr commented Mar 5, 2022

It looks like the first issue that I diagnosed was the underlying problem. I pushed a fix in #710 that resolved the problem for me locally, let's see if tests pass on CI.

EDIT
CI passed. I think we can merge this (with failing tests) after #710 and be fine.

@vyasr vyasr requested a review from csadorf March 6, 2022 18:20
@bdice bdice merged commit a5937cb into master Mar 7, 2022
@bdice bdice deleted the pytest-xdist branch March 7, 2022 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants