Doc tests action takes too long #118891
Comments
Sounds good; do you have a PR ready? :) |
I was waiting for feedback because I might've missed something :) My schedule is a bit tight lately, but I will try to find time to implement it (hopefully in a week or two). |
Sorry, it's going to be delayed. Currently I am loaded with work, but hopefully soon I can start work on the PR. |
No rush, thanks for the update :) |
@Privat33r-dev Do you still wish to work on a PR for this? If you are busy, would you mind someone picking up the work to do the PR? |
I will try to do it within a week; if I can't manage to find time for it, then it would be a good idea for another person to take over. Sorry for the unexpected delay.
|
No worries on a delay @Privat33r-dev. Seems like it would be a nice addition since the doctests are slow 😉 |
After further investigation I understood the following. Inside the Makefile, sphinx-build is executed with the doctest builder, so basically the Sphinx wrapper for the doctests is used (source). Parallelization should likely be applied there; I will continue and may update this comment or add a new comment with further findings. What's actually interesting, though, is that the speed of the doctest run depends on the Python version (search request). Also, caching does not work efficiently and might contribute to the problem, but that's a different topic. |
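For readers unfamiliar with the setup, that Makefile invocation boils down to running Sphinx's doctest builder over the Doc/ tree. A minimal equivalent is sketched below; the paths and flags are assumptions, and the real Makefile passes more options:

```python
import subprocess

# Roughly what `make doctest` ends up doing: run Sphinx's doctest builder
# over Doc/. The output directory name here is only illustrative.
subprocess.run(
    ["python", "-m", "sphinx", "-b", "doctest", "Doc", "Doc/build/doctest"],
    check=True,
)
```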
I would be glad if someone could join in resolving this issue, since I am, unfortunately, still occupied with other business. This weekend I will try to continue. General conclusion from earlier findings:
I can't quickly think of another reason, but it might as well be caused by Python API changes, build flags or something else. However, if we can fix this and return to near-3.12 speed, that would already be a major win. Applying multiprocessing on top of that would skyrocket the speed. The Plan
|
@AA-Turner Do you have any ideas? The doctest job on CPython CI is much slower on 3.13 and 3.14 than on 3.12. For example:
With the example builds, when the full 3.12 Sphinx build + doctest had finished, 3.13 was only at |
Didn't we previously reason that this is because pre-releases don't use an optimised build of Python? What happens if we run CI with the released rc2? A |
For both of these, we use non-optimised Python built from the branch -- because we're testing the branch's Python. |
Hmm... I'm wondering if we could use pytest to run these doctests instead of sphinx-build plus doctest. We could then parallelize the doctests if needed. @hugovk @AA-Turner |
Right, I was just wondering if the slowdown was solely due to this (which we have no real choice over) or if other factors in Python 3.13+ have caused Sphinx to slow down (which we might be able to resolve).
I don't think pytest supports A |
Maybe we can do test parallelization in Sphinx itself? If you think it would work, I can find time this week to implement it. https://github.com/sphinx-doc/sphinx/blob/master/sphinx/ext/doctest.py#L363C1-L368C1 -> multiprocessing |
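Not a patch to Sphinx itself, just a hedged sketch of the idea: chunk the .rst files and run one `sphinx-build -b doctest` per chunk in separate worker processes. Separate output directories per worker are assumed so the doctrees don't clash; whether the doctest builder behaves well under this kind of split would need verifying.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def run_chunk(index: int, files: list[str]) -> int:
    # One sphinx-build doctest run per chunk, each with its own output dir.
    out_dir = f"build/doctest-{index}"
    proc = subprocess.run(
        ["python", "-m", "sphinx", "-b", "doctest", ".", out_dir, *files],
        cwd="Doc",
    )
    return proc.returncode


def main(workers: int = 4) -> None:
    rst_files = sorted(
        str(path.relative_to("Doc")) for path in Path("Doc").rglob("*.rst")
    )
    chunks = [rst_files[i::workers] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(run_chunk, range(workers), chunks))
    if any(codes):
        raise SystemExit(1)


if __name__ == "__main__":
    main()
```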
I wrote the following script, with which I can reproduce locally a significant performance regression over the course of Python 3.13 development when using Sphinx to build a single documentation file:

Script

```python
import contextlib
import shutil
import subprocess
import time
import venv
from pathlib import Path


def run(args):
    try:
        subprocess.run(args, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as e:
        print(e.stdout)
        print(e.stderr)
        raise


with contextlib.chdir("Doc"):
    try:
        for path in Path(".").iterdir():
            if path.is_dir() and not str(path).startswith("."):
                for doc_path in path.rglob("*.rst"):
                    if doc_path != Path("library/typing.rst"):
                        doc_path.write_text("foo")
        venv.create(".venv", with_pip=True)
        run([
            ".venv/bin/python",
            "-m",
            "pip",
            "install",
            "-r",
            "requirements.txt",
            "--no-binary=':all:'",
        ])
        start = time.perf_counter()
        run([
            ".venv/bin/python",
            "-m",
            "sphinx",
            "-b",
            "html",
            ".",
            "build/html",
            "library/typing.rst",
        ])
        print(time.perf_counter() - start)
        shutil.rmtree(".venv")
        shutil.rmtree("build")
    finally:
        subprocess.run(["git", "restore", "."], check=True, capture_output=True)
```

If I save the script as , I'll now try to use this script to see if I can bisect the performance regression to a specific commit. |
Thanks Alex. You prompted a memory of sphinx-doc/sphinx#12174, which might be relevant. A |
Indeed! The results of the bisection are in:
I did another bisect to find out what caused the performance improvement between 1530932 and 909c6f7, and the result was e28477f (
So, we now know which CPython commits are the root cause of our Doctest CI job taking so much longer on the 3.13 and 3.14 branches than on 3.12.
The good news is that the bisection script is hugely faster on a PGO-optimized, non-debug build. The bad news is that it's still a fair amount slower than on 3.12:
That's around a 48% performance regression on my bisection script with a PGO-optimized build, which is still quite bad! |
I tried running the tests with Python 3.12 and with the gc.DEBUG_STATS flag turned on. The output suggests that this workload creates a lot of cyclic garbage and that the gen 0 (youngest) collection of the cyclic GC is dealing with it. The "unreachable" number counts the cyclic garbage objects found by the GC. I don't know enough about the "incremental" collector to comment on how it might be better optimized for this workload. I do suspect that there are many more similar workloads in the world, and the GC that ships with the 3.13 release will hopefully not be much slower for them.
|
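Enabling those GC statistics is a one-liner, in case anyone wants to repeat the measurement on other versions; the workload below is only a placeholder for the Sphinx build:

```python
import gc

gc.set_debug(gc.DEBUG_STATS)  # print per-collection stats, including "unreachable"


class Node:
    """Placeholder for docutils-style tree nodes with parent back-references."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []


# Create lots of short-lived parent/child cycles, then drop them.
for _ in range(50_000):
    root = Node()
    root.children.append(Node(parent=root))

gc.collect()
gc.set_debug(0)
```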
For colour, I created a variant of my script that measures the execution time of parsing and building all of CPython's docs with a PGO-optimized build. (The script I used for bisecting the perf regression only parsed and built library/typing.rst.)
That's a 38% regression; pretty similar to what we saw with the smaller sample (and again, very bad!).

Variant of my script that parses and builds all of CPython's docs

```python
import contextlib
import shutil
import subprocess
import time
import venv
from pathlib import Path


def run(args):
    subprocess.run(args, check=True, text=True)


with contextlib.chdir("Doc"):
    venv.create(".venv", with_pip=True)
    run([
        ".venv/bin/python",
        "-m",
        "pip",
        "install",
        "-r",
        "requirements.txt",
        "--no-binary=':all:'",
    ])
    start = time.perf_counter()
    run([
        ".venv/bin/python",
        "-m",
        "sphinx",
        "-b",
        "html",
        ".",
        "build/html",
    ])
    print(time.perf_counter() - start)
    shutil.rmtree(".venv")
    shutil.rmtree("build")
```
|
I created a new issue summarising our findings here. It seems clear that the new incremental GC causes significant performance regressions for workloads like Sphinx that create lots of reference cycles; that seems like something that's worth investigating more: |
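A rough way to quantify how much of a cycle-heavy workload's time goes to the collector, for anyone comparing interpreter versions; the loop below is just a stand-in for what Sphinx/docutils do when building node trees:

```python
import gc
import time

gc_time = 0.0
_gc_start = 0.0


def on_gc(phase, info):
    # gc.callbacks hands us "start"/"stop" events; accumulate time spent in GC.
    global gc_time, _gc_start
    if phase == "start":
        _gc_start = time.perf_counter()
    else:
        gc_time += time.perf_counter() - _gc_start


gc.callbacks.append(on_gc)


def churn(n=200_000):
    # Stand-in workload: create and discard lots of small reference cycles.
    keep = []
    for _ in range(n):
        a, b = {}, {}
        a["other"], b["other"] = b, a
        keep.append(a)
        if len(keep) > 1_000:
            keep.clear()


t0 = time.perf_counter()
churn()
print(f"total: {time.perf_counter() - t0:.2f}s, of which in GC: {gc_time:.2f}s")
```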
@AlexWaygood amazing findings, thanks for your investigation :) I felt that the issue likely lay with Python itself (because the GH Actions setup and many other things are the same), but I couldn't do a proper investigation due to time constraints. Thanks to everyone involved; I am glad that you found the root cause. |
The issue was mostly resolved by #124770. There is still room for improvement, but mostly on Sphinx's side; I don't see any obvious improvements that can be made in this repository, so the issue is resolved. I am glad that the performance regression in 3.13 was prevented as a result of teamwork. Thanks to all the participants. |
@Privat33r-dev, thanks so much for opening the issue and for spotting how much slower the job was on 3.13 than 3.12. You helped prevent a new version of Python from being released with a serious performance regression that could have affected lots of workloads (not just Sphinx!). It's great that we identified it before the release went out; I'm only sorry it took us a while to investigate. |
Issue
Usually the doctest action's "Run documentation doctest"[1] job takes around 12 minutes[2] to execute. Most of this time is spent in the internal framework doctest.py[3]. The cause of the issue is a GC regression in Python 3.13.
Potential Solution
Fix #124567
Additional Improvements
Improve doctest performance
Since the tests are not interconnected (between different doc files), they can easily be parallelized. I suggest using ProcessPoolExecutor, which is suitable for "bypassing" the GIL (assuming the tasks are CPU-bound). GitHub runners have 4 cores available[4], so this could potentially speed up execution by up to 4 times.

@AlexWaygood @erlend-aasland
Footnotes
1. https://github.com/python/cpython/blob/7ac933e2609b2ef9b08ccf9c815b682b0e1ede2a/.github/workflows/reusable-docs.yml#L108
2. https://github.com/python/cpython/actions/runs/8922903843/job/24506021604
3. https://github.com/python/cpython/blob/main/Lib/doctest.py
4. https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories