Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poetry install crashes due to a race condition and unreported KeyError #9334

Closed
vfazio opened this issue Apr 24, 2024 · 11 comments · Fixed by #9335
Closed

poetry install crashes due to a race condition and unreported KeyError #9334

vfazio opened this issue Apr 24, 2024 · 11 comments · Fixed by #9335
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged

Comments

@vfazio
Copy link
Contributor

vfazio commented Apr 24, 2024

Description

There is a subtle and difficult to reproduce race condition in poetry when using decorated output (default) and parallel installs (default) which leads to poetry randomly failing to install dependencies without displaying the cause of the error.

The gist is this:

Multiple install tasks are queued up:

src/poetry/installation/executor.py
                tasks.append(self._executor.submit(self._execute_operation, operation))

            try:
                wait(tasks)

When self._execute_operation runs we have branching logic:

            if self.supports_fancy_output():
                if id(operation) not in self._sections and self._should_write_operation(
                    operation
                ):
                    with self._lock:
                        self._sections[id(operation)] = self._io.section()
                        self._sections[id(operation)].write_line(
                            f"  <fg=blue;options=bold>-</> {op_message}:"
                            " <fg=blue>Pending...</>"
                        )
            else:
                if self._should_write_operation(operation):
                    if not operation.skipped:
                        self._io.write_line(
                            f"  <fg=blue;options=bold>-</> {op_message}"
                        )
</snip>
            try:
                result = self._do_execute_operation(operation)                        

If fancy output is not supported, a line is written to the output, otherwise a new section is added to our dictionary of sections and then we try to perform the operation.

The operation runs via _do_execute_operation -> execute{op}

  File "/opt/poetry/venv/lib/python3.12/site-packages/poetry/installation/executor.py", line 303, in _execute_operation
    result = self._do_execute_operation(operation)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/poetry/venv/lib/python3.12/site-packages/poetry/installation/executor.py", line 416, in _do_execute_operation
    result: int = getattr(self, f"_execute_{method}")(operation)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/poetry/venv/lib/python3.12/site-packages/poetry/installation/executor.py", line 541, in _execute_install
    status_code = self._install(operation)
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/poetry/venv/lib/python3.12/site-packages/poetry/installation/executor.py", line 586, in _install
    self._write(operation, message)

The operation prints out a message:

        message = (
            f"  <fg=blue;options=bold>-</> {operation_message}:"
            " <info>Installing...</info>"
        )
        self._write(operation, message)

and _write will try to update the section if decorated output is enabled:

Executor
    def _write(self, operation: Operation, line: str) -> None:
        if not self.supports_fancy_output() or not self._should_write_operation(
            operation
        ):
            return

        if self._io.is_debug():
            with self._lock:
                section = self._sections[id(operation)]
                section.write_line(line)

            return

        with self._lock:
            section = self._sections[id(operation)]
            section.clear()
            section.write(line)

The problem, however, is this:

  • Thread A and B are installing packages
  • Thread A is in the middle of updating a section
  File "/opt/poetry/venv/lib/python3.12/site-packages/poetry/installation/executor.py", line 293, in _execute_operation
    self._sections[id(operation)].write_line(
  File "/opt/poetry/venv/lib/python3.12/site-packages/cleo/io/outputs/output.py", line 89, in write_line
    self.write(messages, new_line=True, verbosity=verbosity, type=type)
  File "/opt/poetry/venv/lib/python3.12/site-packages/cleo/io/outputs/output.py", line 110, in write
    self._write(message, new_line=new_line)
  File "/opt/poetry/venv/lib/python3.12/site-packages/cleo/io/outputs/section_output.py", line 83, in _write
    self.add_content(message)
  File "/opt/poetry/venv/lib/python3.12/site-packages/cleo/io/outputs/section_output.py", line 69, in add_content
    len(self.remove_format(line_content).replace("\t", " " * 8))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/poetry/venv/lib/python3.12/site-packages/cleo/io/outputs/output.py", line 116, in remove_format
    return self.formatter.remove_format(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/poetry/venv/lib/python3.12/site-packages/cleo/formatters/formatter.py", line 137, in remove_format
    self._decorated = False
    ^^^^^^^^^^^^^^^

However, SectionOutput._write -> add_content -> remove_format causes the formatter, which is shared by the IO object and the sections, to flip _decorated to False

  • Thread B runs _execute_operation and sees supports_fancy_output is False, so prints out to console and does not add a section to the Executor._sections dictionary
  • Thread A restores _decorated to True
  • Thread B runs _do_execute_operation which calls into Executor._write. This sees supports_fancy_output is True so assumes there should be a section for this operation. The section does not exist so a KeyError is thrown.
  • Executor._execute_operation tries to handle the KeyError and sees that supports_fancy_output is True, so calls back into Executor._write which throws yet another KeyError that's raised from within the Exception handler so no output is displayed.

This is what the splat may have looked like if we checked for isinstance(e, KeyError) and didn't try to update the section we thought existed (note i had debug code in place at the time of this)

image

There are probably a number of ways to fix this issue:

  1. cache self._io.output.is_decorated() at the time the Executor is init'd.
  2. update cleo to not flip the _decorated private variable when formatting messages since there's a public interface for that value
  3. update cleo to wrap that private variable in some sort of locking mechanism that remove_format and is_decorated respect
  4. update Executor._write to check if the operation is in the section dictionary in a more graceful way
  5. update Executor.supports_fancy_output to leverage self._lock before querying the backing IO object's formatter. Since sections get updated under this lock, the formatter's _decorated value should be restored by the time the lock is released, however this forces even more things to serialize over this lock which is not necessary

I'll make a PR for 1, since it's a minimal amount of code changes, doesn't complicate any of the section logic any more than is necessary, doesn't require buttressing any of the exception handling logic, and doesn't add yet another synchronization point over Executor._lock. It should allow the value to be stable over time even if cleo flips the formatter decoration off temporarily. I ran this fix in a loop over the course 24 hours and did not see any issues whereas without the fix reproduction would occur within 5-10 minutes.

Note that this can be very difficult to reproduce. I have had luck reproducing on slow hardware under an emulator for a foreign architecture making it even slower so the time window where the formatter decorator is false is large enough for another thread to hit the issue (this is how I initially discovered it in one of our CI pipelines) but short enough for the other thread to see the value flip back to true when trying to print a progress message.

Adding a sleep in cleo/formatters/formatter.py::Formatter.remove_format or a series of events/signals may be easier

Workarounds

Run poetry with NO_COLOR=1 defined in the environment
Run poetry with the --no-ansi argument on the command line
Run poetry with the -q argument on the command line

Poetry Installation Method

install.python-poetry.org

Operating System

Debian 12

Poetry Version

Poetry (version 1.8.2)

Poetry Configuration

cache-dir = "/root/.cache/pypoetry"
experimental.system-git-client = false
installer.max-workers = null
installer.modern-installation = true
installer.no-binary = null
installer.parallel = true
keyring.enabled = true
repositories.upstream-pypi.url = "https://redacted.com/repository/upstream-pypi/simple"
solver.lazy-wheel = true
virtualenvs.create = true
virtualenvs.in-project = true
virtualenvs.options.always-copy = false
virtualenvs.options.no-pip = false
virtualenvs.options.no-setuptools = false
virtualenvs.options.system-site-packages = false
virtualenvs.path = "{cache-dir}/virtualenvs"  # /root/.cache/pypoetry/virtualenvs
virtualenvs.prefer-active-python = false
virtualenvs.prompt = "{project_name}-py{python_version}"
warnings.export = true

Python Sysconfig

No response

Example pyproject.toml

[tool.poetry.dependencies]
python = "^3.11"
fastapi = "0.101.1"
uvicorn = "0.23.2"
requests = "2.31.0"
pexpect = "4.8.0"
gitpython = "3.1.32"
anyio = "3.7.1"
httpx = "0.24.1"
aiofiles = "23.2.1"
asyncstdlib = "3.10.8"
pydantic = "1.10.12"

[tool.poetry.group.dev.dependencies]
pytest = "7.4.0"
mypy = "1.5.1"
pytest-mypy = "0.10.3"
pytest-mock = "3.11.1"
async-asgi-testclient = "1.4.11"
pytest-cov = "4.1.0"
pre-commit = "3.3.3"

Poetry Runtime Logs

Loading configuration file /root/.config/pypoetry/config.toml
Loading configuration file /opt/tss/poetry.toml
Adding repository upstream-pypi (https://redacted.com/repository/upstream-pypi/simple) and setting it as primary
Deactivating the PyPI repository
Creating virtualenv tss in /opt/tss/.venv
[virtualenv] find interpreter for spec PythonSpec(path=/opt/poetry/venv/bin/python)
[filelock:filelock] Attempting to acquire lock 365175376064 on /root/.local/share/virtualenv/py_info/1/aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49.lock
[filelock:filelock] Lock 365175376064 acquired on /root/.local/share/virtualenv/py_info/1/aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49.lock
[virtualenv] got python info of %s from (PosixPath('/usr/local/bin/python3'), PosixPath('/root/.local/share/virtualenv/py_info/1/aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49.json'))
[filelock:filelock] Attempting to release lock 365175376064 on /root/.local/share/virtualenv/py_info/1/aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49.lock
[filelock:filelock] Lock 365175376064 released on /root/.local/share/virtualenv/py_info/1/aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49.lock
[virtualenv] proposed PythonInfo(spec=CPython3.12.2.final.0-64, system=/usr/local/bin/python3, exe=/opt/poetry/venv/bin/python, platform=linux, version='3.12.2 (main, Apr 19 2024, 21:41:48) [GCC 12.2.0]', encoding_fs_io=utf-8-utf-8)
[virtualenv] accepted PythonInfo(spec=CPython3.12.2.final.0-64, system=/usr/local/bin/python3, exe=/opt/poetry/venv/bin/python, platform=linux, version='3.12.2 (main, Apr 19 2024, 21:41:48) [GCC 12.2.0]', encoding_fs_io=utf-8-utf-8)
[virtualenv] filesystem is case-sensitive
[virtualenv] create virtual environment via CPython3Posix(dest=/opt/tss/.venv, clear=False, no_vcs_ignore=False, global=False)
[virtualenv] create folder /opt/tss/.venv/bin
[virtualenv] create folder /opt/tss/.venv/lib/python3.12/site-packages
[virtualenv] write /opt/tss/.venv/pyvenv.cfg
[virtualenv] 	home = /usr/local/bin
[virtualenv] 	implementation = CPython
[virtualenv] 	version_info = 3.12.2.final.0
[virtualenv] 	virtualenv = 20.25.3
[virtualenv] 	include-system-site-packages = false
[virtualenv] 	base-prefix = /usr/local
[virtualenv] 	base-exec-prefix = /usr/local
[virtualenv] 	base-executable = /usr/local/bin/python3.12
[virtualenv] symlink /usr/local/bin/python3 to /opt/tss/.venv/bin/python
[virtualenv] create virtualenv import hook file /opt/tss/.venv/lib/python3.12/site-packages/_virtualenv.pth
[virtualenv] create /opt/tss/.venv/lib/python3.12/site-packages/_virtualenv.py
[virtualenv] ============================== target debug ==============================
[virtualenv] debug via /opt/tss/.venv/bin/python /opt/poetry/venv/lib/python3.12/site-packages/virtualenv/create/debug.py
[virtualenv] {
[virtualenv]   "sys": {
[virtualenv]     "executable": "/opt/tss/.venv/bin/python",
[virtualenv]     "_base_executable": "/usr/local/bin/python3.12",
[virtualenv]     "prefix": "/opt/tss/.venv",
[virtualenv]     "base_prefix": "/usr/local",
[virtualenv]     "real_prefix": null,
[virtualenv]     "exec_prefix": "/opt/tss/.venv",
[virtualenv]     "base_exec_prefix": "/usr/local",
[virtualenv]     "path": [
[virtualenv]       "/usr/local/lib/python312.zip",
[virtualenv]       "/usr/local/lib/python3.12",
[virtualenv]       "/usr/local/lib/python3.12/lib-dynload",
[virtualenv]       "/opt/tss/.venv/lib/python3.12/site-packages"
[virtualenv]     ],
[virtualenv]     "meta_path": [
[virtualenv]       "<class '_virtualenv._Finder'>",
[virtualenv]       "<class '_frozen_importlib.BuiltinImporter'>",
[virtualenv]       "<class '_frozen_importlib.FrozenImporter'>",
[virtualenv]       "<class '_frozen_importlib_external.PathFinder'>"
[virtualenv]     ],
[virtualenv]     "fs_encoding": "utf-8",
[virtualenv]     "io_encoding": "utf-8"
[virtualenv]   },
[virtualenv]   "version": "3.12.2 (main, Apr 19 2024, 21:41:48) [GCC 12.2.0]",
[virtualenv]   "makefile_filename": "/usr/local/lib/python3.12/config-3.12-aarch64-linux-gnu/Makefile",
[virtualenv]   "os": "<module 'os' (frozen)>",
[virtualenv]   "site": "<module 'site' (frozen)>",
[virtualenv]   "datetime": "<module 'datetime' from '/usr/local/lib/python3.12/datetime.py'>",
[virtualenv]   "math": "<module 'math' from '/usr/local/lib/python3.12/lib-dynload/math.cpython-312-aarch64-linux-gnu.so'>",
[virtualenv]   "json": "<module 'json' from '/usr/local/lib/python3.12/json/__init__.py'>"
[virtualenv] }
[virtualenv] add seed packages via FromAppData(download=False, pip=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
[virtualenv] install pip from wheel /opt/poetry/venv/lib/python3.12/site-packages/virtualenv/seed/wheels/embed/pip-24.0-py3-none-any.whl via CopyPipInstall
[filelock:filelock] Attempting to acquire lock 365172584912 on /root/.local/share/virtualenv/wheel/3.12/image/1/CopyPipInstall/pip-24.0-py3-none-any.lock
[filelock:filelock] Lock 365172584912 acquired on /root/.local/share/virtualenv/wheel/3.12/image/1/CopyPipInstall/pip-24.0-py3-none-any.lock
[filelock:filelock] Attempting to release lock 365172584912 on /root/.local/share/virtualenv/wheel/3.12/image/1/CopyPipInstall/pip-24.0-py3-none-any.lock
[filelock:filelock] Lock 365172584912 released on /root/.local/share/virtualenv/wheel/3.12/image/1/CopyPipInstall/pip-24.0-py3-none-any.lock
[virtualenv] copy directory /root/.local/share/virtualenv/wheel/3.12/image/1/CopyPipInstall/pip-24.0-py3-none-any/pip to /opt/tss/.venv/lib/python3.12/site-packages/pip
[virtualenv] copy directory /root/.local/share/virtualenv/wheel/3.12/image/1/CopyPipInstall/pip-24.0-py3-none-any/pip-24.0.dist-info to /opt/tss/.venv/lib/python3.12/site-packages/pip-24.0.dist-info
[virtualenv] copy /root/.local/share/virtualenv/wheel/3.12/image/1/CopyPipInstall/pip-24.0-py3-none-any/pip-24.0.virtualenv to /opt/tss/.venv/lib/python3.12/site-packages/pip-24.0.virtualenv
[distlib:distlib.util] changing mode of /opt/tss/.venv/bin/pip to 755
[distlib:distlib.util] changing mode of /opt/tss/.venv/bin/pip3.12 to 755
[distlib:distlib.util] changing mode of /opt/tss/.venv/bin/pip-3.12 to 755
[distlib:distlib.util] changing mode of /opt/tss/.venv/bin/pip3 to 755
[virtualenv] generated console scripts pip pip-3.12 pip3.12 pip3
[virtualenv] add activators for Bash, CShell, Fish, Nushell, PowerShell, Python
[virtualenv] write /opt/tss/.venv/pyvenv.cfg
[virtualenv] 	home = /usr/local/bin
[virtualenv] 	implementation = CPython
[virtualenv] 	version_info = 3.12.2.final.0
[virtualenv] 	virtualenv = 20.25.3
[virtualenv] 	include-system-site-packages = false
[virtualenv] 	base-prefix = /usr/local
[virtualenv] 	base-exec-prefix = /usr/local
[virtualenv] 	base-executable = /usr/local/bin/python3.12
[virtualenv] 	prompt = tss-py3.12
Using virtualenv: /opt/tss/.venv
Installing dependencies from lock file

Finding the necessary packages for the current system

Package operations: 23 installs, 0 updates, 0 removals

  - Installing idna (3.4): Pending...
Checking if keyring is available
  - Installing sniffio (1.3.0): Pending...
Checking if keyring is available
[keyring:keyring.backend] Loading KWallet
[keyring:keyring.backend] Loading KWallet
[keyring:keyring.backend] Loading SecretService
[keyring:keyring.backend] Loading SecretService
[keyring:keyring.backend] Loading Windows
[keyring:keyring.backend] Loading Windows
[keyring:keyring.backend] Loading chainer
[keyring:keyring.backend] Loading chainer
[keyring:keyring.backend] Loading libsecret
[keyring:keyring.backend] Loading libsecret
[keyring:keyring.backend] Loading macOS
[keyring:keyring.backend] Loading macOS
Backend 'fail Keyring' is not suitable
No valid keyring backend was found
Creating new session for redacted.com
Backend 'fail Keyring' is not suitable
No valid keyring backend was found
[urllib3:urllib3.connectionpool] Starting new HTTPS connection (1): redacted.com:443
[urllib3:urllib3.connectionpool] Starting new HTTPS connection (2): redacted.com:443
[urllib3:urllib3.connectionpool] https://redacted.com:443 "GET /repository/upstream-pypi/simple/sniffio/ HTTP/1.1" 200 None
  - Installing idna (3.4): Installing...
  - Installing idna (3.4)
  - Installing sniffio (1.3.0): Pending...
  - Installing sniffio (1.3.0): Installing...
  - Installing sniffio (1.3.0)
  - Installing anyio (3.7.1): Pending...
  - Installing h11 (0.14.0)
  - Installing certifi (2023.7.22): Pending...
  - Installing smmap (5.0.0): Pending...
  - Installing typing-extensions (4.7.1): Pending...
[urllib3:urllib3.connectionpool] Starting new HTTPS connection (3): redacted.com:443
[urllib3:urllib3.connectionpool] Starting new HTTPS connection (4): redacted.com:443
  - Installing anyio (3.7.1): Installing...
  - Installing certifi (2023.7.22): Pending...
  - Installing certifi (2023.7.22): Installing...
  - Installing anyio (3.7.1)
[urllib3:urllib3.connectionpool] https://redacted.com:443 "GET /repository/upstream-pypi/simple/smmap/ HTTP/1.1" 200 None
  - Installing certifi (2023.7.22): Pending...
  - Installing certifi (2023.7.22): Installing...
  - Installing certifi (2023.7.22)
  - Installing smmap (5.0.0): Pending...
  - Installing smmap (5.0.0): Installing...
  - Installing smmap (5.0.0)
  - Installing typing-extensions (4.7.1): Pending...
  - Installing typing-extensions (4.7.1): Installing...
  - Installing typing-extensions (4.7.1)
root@c762dd4fd385:/opt/tss# echo $?
1
@vfazio vfazio added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Apr 24, 2024
@dimbleby
Copy link
Contributor

update cleo to not flip the _decorated private variable when formatting messages since there's a public interface for that value

seems like the correct fix

@vfazio
Copy link
Contributor Author

vfazio commented Apr 28, 2024

update cleo to not flip the _decorated private variable when formatting messages since there's a public interface for that value

seems like the correct fix

It does seem like a better long term fix, but cleo looks to be in the middle of a large update, which would require either a patch release and a dependency bump for poetry or waiting until the next release with the fix.

The code change to accommodate this is also not "trivial" as there are about 4 function calls that would need to be made "non-decorated" aware. Cleo doesn't advertise being thread-safe afaik but i didn't read too deeply. The concurrent use by poetry does seem "special"

Poetry makes the assumption that this value should never change in the middle of performing a list of operations. Modifying that behavior to handle dynamic changing seems overkill in this scenario.

Option 1 (caching the value on init) is the least painful, obviates waiting for cleo and a dependency bump, and also is a bit more performant in terms of querying the property.

@dimbleby
Copy link
Contributor

that mostly is a discussion you ought to have with the cleo maintainers, who hopefully will have opinions on whether it is intended to be thread-safe, whether a fix is desirable, whether now is a good time, etc - suggest raising an issue over there and starting that conversation

@vfazio
Copy link
Contributor Author

vfazio commented Apr 28, 2024

I'll write the issue, but i'm curious why patching poetry in the interim is not an option

@dimbleby
Copy link
Contributor

I have no particular view (or say) on whether patching poetry in the interim is an option. I am saying that it seems to me that the correct fix is in cleo and so you should explore that first

@vfazio
Copy link
Contributor Author

vfazio commented Apr 28, 2024

I opened an issue so at least a conversation can be had and documented. I should have done that first, thanks @dimbleby

@radoering
Copy link
Member

My two cents: I do not expect a quick cleo release with a fix and think we can include #9335 in Poetry 1.8.3. (I'm just waiting for #9341.)

Thanks for the detailed analysis @vfazio 👍

@Secrus
Copy link
Member

Secrus commented Apr 29, 2024

I will take a look at it on the Cleo side, but in the mean time it could be fixed on Poetry side.

@vfazio
Copy link
Contributor Author

vfazio commented Apr 29, 2024

My two cents: I do not expect a quick cleo release with a fix and think we can include #9335 in Poetry 1.8.3. (I'm just waiting for #9341.)

Thanks for the detailed analysis @vfazio 👍

@radoering @Secrus out of curiosity, is there a "timeline" for poetry 1.8.3? I'm currently planning to migrate to 1.8.2 but if 1.8.3 is on the horizon, I'll hold off if it will include this fix.

@radoering
Copy link
Member

There is no timeline, but since the backports workflow is working again, I assume within the next few days.

Copy link

github-actions bot commented Jun 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants