
Doc tests action takes too long #118891

Closed
Privat33r-dev opened this issue May 10, 2024 · 26 comments
Labels
docs Documentation in the Doc dir

Comments

@Privat33r-dev
Contributor

Privat33r-dev commented May 10, 2024

Issue

Usually the doctest action's "Run documentation doctest" [1] job takes around 12 minutes [2] to execute. Most of this time is spent in the internal framework doctest.py [3]. The cause of the issue is a GC regression in Python 3.13.

Potential Solution

Fix #124567

Additional Improvements

Improve doctest performance

Since the tests are not interconnected (between different doc files), they can easily be parallelized. I suggest using ProcessPoolExecutor, which is suitable for "bypassing" the GIL (assuming the tasks are CPU-bound). GitHub runners have 4 cores available [4], so this could potentially speed up execution by up to 4 times. A rough sketch of the idea follows.
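A minimal sketch of the parallelization idea, assuming (for illustration only) that each doc file's examples could be run directly with the standard doctest module; the real CI job goes through Sphinx's doctest builder, so the actual change would have to live there:

# Hypothetical sketch: run per-file doctests in separate worker processes.
import doctest
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def run_file(path: str) -> tuple[str, int]:
    # doctest.testfile() returns TestResults(failed, attempted).
    results = doctest.testfile(path, module_relative=False)
    return path, results.failed


if __name__ == "__main__":
    rst_files = [str(p) for p in Path("Doc").rglob("*.rst")]
    # GitHub-hosted runners provide 4 cores, hence max_workers=4.
    with ProcessPoolExecutor(max_workers=4) as pool:
        for path, failed in pool.map(run_file, rst_files):
            if failed:
                print(f"{path}: {failed} failing example(s)")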

@AlexWaygood @erlend-aasland

Footnotes

  1. https://github.com/python/cpython/blob/7ac933e2609b2ef9b08ccf9c815b682b0e1ede2a/.github/workflows/reusable-docs.yml#L108

  2. https://github.com/python/cpython/actions/runs/8922903843/job/24506021604

  3. https://github.com/python/cpython/blob/main/Lib/doctest.py

  4. https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories

@Privat33r-dev Privat33r-dev added the docs Documentation in the Doc dir label May 10, 2024
@erlend-aasland
Contributor

Sounds good; do you have a PR ready? :)

@Privat33r-dev
Contributor Author

I was waiting for feedback because I might've missed something :) My schedule is a bit tight lately, but I will try to find time to implement it (hopefully in a week or two).

@Privat33r-dev
Contributor Author

Sorry, this is going to be delayed. I am currently loaded with work, but hopefully I can start on the PR soon.

@hugovk
Member

hugovk commented May 31, 2024

No rush, thanks for the update :)

@willingc
Contributor

@Privat33r-dev Do you still wish to work on a PR for this? If you are busy, would you mind someone else picking up the work on the PR?

@Privat33r-dev
Contributor Author

Privat33r-dev commented Jul 26, 2024 via email

@willingc
Contributor

No worries about the delay, @Privat33r-dev. Seems like it would be a nice addition since the doctests are slow 😉

@Privat33r-dev
Contributor Author

Privat33r-dev commented Jul 28, 2024

After further investigation, I understand the following.
The command that runs the doctests, and takes around 13-16 minutes, is:
xvfb-run make -C Doc/ PYTHON=../python SPHINXERRORHANDLING="-W --keep-going" doctest
xvfb-run is needed to simulate a graphical environment (my guess is that it's needed to test turtle and some other modules).
make -C Doc/ doctest launches Doc/Makefile with the doctest builder. The other flags are less relevant; in short, they point to the Python installation and tell Sphinx how to handle errors.

Inside the Makefile, sphinx-build is executed with the doctest builder.
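Roughly, that boils down to an invocation like this (a simplified sketch; the actual Makefile passes extra options such as -W --keep-going and the configured Python):

# Simplified sketch of what "make -C Doc/ doctest" ends up doing:
# invoke Sphinx with the "doctest" builder over the Doc/ sources.
import subprocess

subprocess.run(
    ["python", "-m", "sphinx", "-b", "doctest", "Doc", "Doc/build/doctest"],
    check=True,
)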

So, basically, the Sphinx doctest wrapper is used (source).

The parallelization should likely be applied there; I will continue, and I may update this comment or add a new one with further findings.


However, what's actually interesting is that the speed of the doctest run depends on the Python version: the Python 3.13 tests run about 3 times slower than 3.12.
Example #1 (Python 3.13): 14m 57s, ran yesterday
Example #2 (Python 3.12): 5m 18s, also ran yesterday

Search request:
https://github.com/python/cpython/pulls?q=is%3Apr+label%3Adocs+%5B3.12%5D+


Also, caching does not work efficiently, which might contribute to the problem, but that's a different topic.

@Privat33r-dev
Contributor Author

Privat33r-dev commented Aug 5, 2024

I would be glad if someone could join in resolving this issue, since I am, unfortunately, still occupied with other work. This weekend I will try to continue.

General conclusions from earlier findings:

  1. The multiprocessing change needs to be made in Sphinx, rather than in Python's core library. However, given the speed degradation compared to the earlier Python version, I think it would be highly beneficial to investigate that first, instead of just masking the issue by applying multiprocessing.
  2. There is a major speed regression in the Python 3.13 doctest run on the GH runner compared to Python 3.12. I have a few theories (all require further testing to be confirmed or rejected):
    1. Due to some changes, 3.13 is slower on certain operations. I just hope it's not PEP 703-related. How to check: profile doctest.py performance on both versions.
    2. There were changes in the doctest library that cause the slowdown. How to check: diff doctest.py between the two versions.

I can't think of another reason off the top of my head, but it might also be caused by Python API changes, build flags, or something else. However, if we can fix this and return to near-3.12 speed, that would already be a major win. Applying multiprocessing on top of that would just skyrocket the speed.

The Plan

  1. Investigate the performance issue
  2. https://github.com/sphinx-doc/sphinx/blob/master/sphinx/ext/doctest.py#L363C1-L368C1 -> multiprocessing

@hugovk
Member

hugovk commented Sep 24, 2024

@AA-Turner Do you have any ideas?

The doctest job on CPython CI is much slower on 3.13 and 3.14 (main) than on 3.12. And it's not specifically doctest: the main Sphinx build is slow even before the doctests start.

For example: with the example builds, when the full 3.12 Sphinx build + doctest had finished, the 3.13 build was only at reading sources... [ 73%] library/stdtypes .. library/trace.

@AA-Turner
Member

Didn't we previously reason that this is because pre-releases don't use an optimised build of Python? What happens if we run CI with the released rc2?

A

@hugovk
Member

hugovk commented Sep 24, 2024

For both of these, we use non-optimised Python built from the branch -- because we're testing the branch's Python.

https://github.com/python/cpython/blob/f4997bb3ac961d6aaf07ce650cd074e28ce6ccd0/.github/workflows/reusable-docs.yml#L95_L118

@willingc
Contributor

Hmm... I'm wondering if we could use pytest to run these doctests instead of the Sphinx build plus doctest. We could then parallelize the doctests if needed. @hugovk @AA-Turner

@AA-Turner
Member

For both of these, we use non-optimised Python built from the branch -- because we're testing the branch's Python.

Right, I was just wondering if the slowdown was solely due to this (which we have no real choice over) or if other factors in Python 3.13+ have caused Sphinx to slow down (which we might be able to resolve).

I'm wondering if we use pytest to run these doctests instead of sphinx build plus doctest.

I don't think pytest supports the test{setup,cleanup,code} directives, which we make heavy use of.

A

@Privat33r-dev
Contributor Author

@AA-Turner

Maybe we can do the test parallelization in Sphinx itself? If you think it will work, I can find time this week to implement it.

https://github.com/sphinx-doc/sphinx/blob/master/sphinx/ext/doctest.py#L363C1-L368C1 -> multiprocessing

@AlexWaygood
Member

I wrote the following script, with which I can reproduce locally a significant performance regression over the course of Python 3.13 development when using Sphinx to build a single documentation file:

Script
import contextlib
import shutil
import subprocess
import time
import venv
from pathlib import Path


def run(args):
    try:
        subprocess.run(args, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as e:
        print(e.stdout)
        print(e.stderr)
        raise


with contextlib.chdir("Doc"):
    try:
        # Blank out every .rst file except library/typing.rst so that the
        # Sphinx build effectively only has one real document to process.
        for path in Path(".").iterdir():
            if path.is_dir() and not str(path).startswith("."):
                for doc_path in path.rglob("*.rst"):
                    if doc_path != Path("library/typing.rst"):
                        doc_path.write_text("foo")

        # Install the Doc requirements into a fresh venv created with the
        # interpreter running this script (i.e. the Python under test).
        venv.create(".venv", with_pip=True)

        run([
            ".venv/bin/python",
            "-m",
            "pip",
            "install",
            "-r",
            "requirements.txt",
            "--no-binary=':all:'",
        ])

        # Time a single-file HTML build of library/typing.rst.
        start = time.perf_counter()

        run([
            ".venv/bin/python",
            "-m",
            "sphinx",
            "-b",
            "html",
            ".",
            "build/html",
            "library/typing.rst",
        ])

        print(time.perf_counter() - start)
        shutil.rmtree(".venv")
        shutil.rmtree("build")
    finally:
        # Restore the truncated .rst files whether or not the build succeeded.
        subprocess.run(["git", "restore", "."], check=True, capture_output=True)

If I save the script as bisect_sphinx.py in the root of my CPython clone, and then run ./configure --with-pydebug && make -j10 && ./python.exe bisect_sphinx.py, the script reports that building library/typing.rst takes roughly 8 seconds on main, but only around 3 seconds with the v3.13.0a1 tag checked out.

I'll now try to use this script to see if I can bisect the performance regression to a specific commit.

@AA-Turner
Member

Thanks Alex. You prompted a memory of sphinx-doc/sphinx#12174, which might be relevant.

A

@AlexWaygood
Member

Thanks Alex. You prompted a memory of sphinx-doc/sphinx#12174, which might be relevant.

Indeed! The results of the bisection are in:

15309329b65a285cb7b3071f0f08ac964b61411b is the first bad commit
commit 15309329b65a285cb7b3071f0f08ac964b61411b
Author: Mark Shannon <[email protected]>
Date:   Wed Mar 20 08:54:42 2024 +0000

    GH-108362: Incremental Cycle GC (GH-116206)

 Doc/whatsnew/3.13.rst                              |  30 +
 Include/internal/pycore_gc.h                       |  41 +-
 Include/internal/pycore_object.h                   |  18 +-
 Include/internal/pycore_runtime_init.h             |   8 +-
 Lib/test/test_gc.py                                |  72 +-
 .../2024-01-07-04-22-51.gh-issue-108362.oB9Gcf.rst |  12 +
 Modules/gcmodule.c                                 |  25 +-
 Objects/object.c                                   |  21 +
 Objects/structseq.c                                |   5 +-
 Python/gc.c                                        | 806 +++++++++++++--------
 Python/gc_free_threading.c                         |  23 +-
 Python/import.c                                    |   2 +-
 Python/optimizer.c                                 |   2 +-
 Tools/gdb/libpython.py                             |   7 +-
 14 files changed, 684 insertions(+), 388 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2024-01-07-04-22-51.gh-issue-108362.oB9Gcf.rst
  • On the commit prior to 1530932, the script reported that building library/typing.rst took 4.47s
  • On 1530932, the script reported that it took 17.61s
  • On 909c6f7, it reports that it takes 8.38s

I did another bisect to find out what caused the performance improvement between 1530932 and 909c6f7, and the result was e28477f (git bisect reports it as the "first bad commit", but in this case it's actually the first good commit):

e28477f214276db941e715eebc8cdfb96c1207d9 is the first bad commit
commit e28477f214276db941e715eebc8cdfb96c1207d9
Author: Mark Shannon <[email protected]>
Date:   Fri Mar 22 18:43:25 2024 +0000

    GH-117108: Change the size of the GC increment to about 1% of the total heap size. (GH-117120)

 Include/internal/pycore_gc.h                       |  3 +-
 Lib/test/test_gc.py                                | 35 +++++++++++++++-------
 .../2024-03-21-12-10-11.gh-issue-117108._6jIrB.rst |  3 ++
 Modules/gcmodule.c                                 |  2 +-
 Python/gc.c                                        | 30 +++++++++----------
 Python/gc_free_threading.c                         |  2 +-
 6 files changed, 47 insertions(+), 28 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2024-03-21-12-10-11.gh-issue-117108._6jIrB.rst
  • On e2e0b4b, the script reports 16.46s
  • On e28477f, the script reports 6.78s -- still much worse than prior to 1530932, but a substantial improvement over e2e0b4b

So, we now know which CPython commits are the root cause of our Doctest CI job taking so much longer on the 3.13 branch than on the 3.12 branch! I'm now going to see if the same performance regression can be reproduced with a non-debug build. (So far, I've only been testing with a debug build, as this is what we use for our Doctest CI job, so this is what was relevant for understanding why that job might be taking so much longer on the 3.13 branch.)

@nascheme
Member

Here is what linux-perf shows: 47% of the time is spent in the cyclic GC. I don't see any other obvious hot spots to optimize.

(perf profile screenshot)

I did a second run with the GC completely disabled and without --with-pydebug. Again no obvious hot spots.

(perf profile screenshot)

@AlexWaygood
Member

The good news is that the bisection script is hugely faster on a PGO-optimized, non-debug build. The bad news is that main is still a fair amount slower than v3.13.0a1:

  • With v3.13.0a1 checked out, the script consistently takes between 1.27s and 1.29s on a PGO-optimized, non-debug build
  • On ede1504, it consistently takes between 1.76 and 1.82s

That's around a 40% performance regression on my bisection script with a PGO-optimized build, which is still quite bad!

@nascheme
Member

I tried running the tests with Python 3.12 and with the gc.DEBUG_STATS flag turned on. The output suggests that this workload creates a lot of cyclic garbage, and the gen 0 (youngest) collection of the cyclic GC is dealing with it. The "unreachable" number counts the cyclic garbage objects found by the GC.

I don't know enough about the "incremental" collector to comment on how it might be better optimized for this workload. I do suspect that there are many more similar workloads out there, and the GC that ships with the 3.13 release will hopefully not be much slower for them.

gc: collecting generation 0...
gc: objects in each generation: 1175 3804 1663292
gc: objects in permanent generation: 375
gc: done, 520 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 976 4408 1663292
gc: objects in permanent generation: 375
gc: done, 0 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 982 5336 1663289
gc: objects in permanent generation: 375
gc: done, 404 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 959 5866 1663289
gc: objects in permanent generation: 375
gc: done, 404 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 948 6397 1663288
gc: objects in permanent generation: 375
gc: done, 116 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 0...
gc: objects in each generation: 813 7178 1663288
gc: objects in permanent generation: 375
gc: done, 404 unreachable, 0 uncollectable, 0.0001s elapsed
gc: collecting generation 1...
gc: objects in each generation: 951 7537 1663273
gc: objects in permanent generation: 375
gc: done, 5219 unreachable, 0 uncollectable, 0.0010s elapsed
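For reference, the statistics above come from turning on the GC debug flag before running the workload; a minimal sketch (run_workload() is a placeholder, not a real function from this thread):

import gc

# Print a line of collection statistics to stderr each time the cyclic GC
# runs, producing output like the log above.
gc.set_debug(gc.DEBUG_STATS)

run_workload()  # placeholder for the Sphinx build / doctest run being measured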

@AlexWaygood
Member

AlexWaygood commented Sep 25, 2024

For colour, I created a variant of my script that measures the execution time of parsing and building all of CPython's docs with a PGO-optimized build. (The script I used for bisecting the perf regression only parsed and built library/typing.rst.) Here are the results on v3.13.0a1 and ede1504:

  • v3.13.0a1: 45.5s
  • main: 62.7s

That's a 38% regression; pretty similar to what we saw with the smaller sample (and again, very bad!).

Variant of my script that parses and builds all of CPython's docs
import contextlib
import shutil
import subprocess
import time
import venv
from pathlib import Path

def run(args):
    subprocess.run(args, check=True, text=True)


with contextlib.chdir("Doc"):
    venv.create(".venv", with_pip=True)

    run([
        ".venv/bin/python",
        "-m",
        "pip",
        "install",
        "-r",
        "requirements.txt",
        "--no-binary=':all:'",
    ])

    start = time.perf_counter()

    run([
        ".venv/bin/python",
        "-m",
        "sphinx",
        "-b",
        "html",
        ".",
        "build/html",
    ])

    print(time.perf_counter() - start)
    shutil.rmtree(".venv")
    shutil.rmtree("build")

@AlexWaygood
Member

I created a new issue summarising our findings here. It seems clear that the new incremental GC causes significant performance regressions for workloads like Sphinx that create lots of reference cycles; that seems like something worth investigating further.
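As a rough illustration of the kind of workload involved (a hypothetical micro-benchmark, not Sphinx's actual code): docutils/Sphinx document trees hold parent/child references in both directions, which form reference cycles that only the cyclic GC can reclaim.

import time


class Node:
    """Toy stand-in for a doctree node: children refer back to their parent,
    so every parent/child pair forms a reference cycle."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


start = time.perf_counter()
for _ in range(100_000):
    root = Node()
    for _ in range(10):
        Node(parent=root)
    # Dropping `root` each iteration leaves cyclic garbage that only the
    # cyclic GC can collect.
print(f"{time.perf_counter() - start:.2f}s")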

@Privat33r-dev
Contributor Author

@AlexWaygood amazing findings, thanks for your investigation :) I felt that the issue was likely with Python itself (because the GH Actions runners and many other things are the same), but I couldn't do a proper investigation due to time constraints.

Thanks to everyone involved; I am glad that you found the root cause.

@Privat33r-dev
Contributor Author

Privat33r-dev commented Oct 1, 2024

The issue was mostly resolved by #124770.

However, there is still room for improvement, mostly on Sphinx's side. I don't see any obvious improvements that can be made in this repository, so the issue is resolved.

I am glad that the performance regression in 3.13 was prevented as a result of teamwork. Thanks to all the participants.

@AlexWaygood
Member

AlexWaygood commented Oct 1, 2024

@Privat33r-dev, thanks so much for opening the issue and for spotting how much slower the job was on 3.13 than 3.12. You helped prevent a new version of Python being released with a serious performance regression that could have affected lots of workloads (not just Sphinx!) — it's great that we identified it before the release went out! I'm only sorry it took us a while to investigate.
