Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in logging #88105

Open
DaRoee mannequin opened this issue Apr 25, 2021 · 10 comments
Open

Deadlock in logging #88105

DaRoee mannequin opened this issue Apr 25, 2021 · 10 comments
Assignees
Labels
3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@DaRoee
Copy link
Mannequin

DaRoee mannequin commented Apr 25, 2021

BPO 43939
Nosy @rhettinger, @vsajip, @ncoghlan, @pitrou, @DaRoee
Files
  • reproduction_test.py: reproduction test, should run with pytest for proper reproduction
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/vsajip'
    closed_at = None
    created_at = <Date 2021-04-25.16:58:46.783>
    labels = ['type-bug', 'library', '3.10']
    title = 'Deadlock in logging'
    updated_at = <Date 2021-04-28.00:32:18.301>
    user = 'https://github.com/DaRoee'

    bugs.python.org fields:

    activity = <Date 2021-04-28.00:32:18.301>
    actor = 'rhettinger'
    assignee = 'vinay.sajip'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2021-04-25.16:58:46.783>
    creator = 'DaRoee'
    dependencies = []
    files = ['49983']
    hgrepos = []
    issue_num = 43939
    keywords = []
    message_count = 6.0
    messages = ['391862', '391874', '391886', '391888', '392129', '392158']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'vinay.sajip', 'ncoghlan', 'pitrou', 'DaRoee']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue43939'
    versions = ['Python 3.10']

    @DaRoee
    Copy link
    Mannequin Author

    DaRoee mannequin commented Apr 25, 2021

    I've recently encountered a weird bug. To make a long story short, I’ve been using an external package to make sure my threads stop working in time, and appearently it’s been using ctypes.pythonapi.PyThreadState_SetAsyncExc.
    There is a slight chance, that this code that is literally injecting an exception in the thread, will throw an exception in the handle function of the logger, right after the acquire. This will make the whole process deadlocked (only the thread with the exception will be able to continue logging) once the other threads try to log.

    While I totally understand that this is not a casual use case, I think that the logger should be able to handle situations such as this...
    The good news is that I’ve created a test (attached) that once you run it with pytest it’ll reproduce constantly (even though it’s a statistical bug), and the solution for this is relatively easy. Once we change the lock to use context manager the whole action is much more atomic, making the test pass constantly.

    I’d be happy to help solve it, and replace locks to context manager usage throughout the file (or whatever the maintainers see fit for this).

    @DaRoee DaRoee mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 25, 2021
    @rhettinger
    Copy link
    Contributor

    A few thoughts:

    • We normally say "all bets are off" when it comes to manipulating Python internals with ctypes. This issue is just one of many ways to wreak havoc.

    • The programming pattern of killing running threads at arbitrary points in their execution is inherently fragile. I don't think other modules are obliged to try to support it.

    • The pattern of "acquire(); try: something(); finally release();" is common. Logging is only of many modules in the ecosystem that use this pattern.

    • ISTM we can't assume that the lock is a context manager because createLock() is a public method and users can do anything with it.

    • That said, we could add our own __enter__ and __exit__ methods to the Handler class.

    • Lastly, I have a vague memory that at one point there was a rejected proposal to switch logging to use the with-statement throughout. We should research that to see if the reasoning is still applicable or not.

    @rhettinger rhettinger removed 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes labels Apr 26, 2021
    @pitrou
    Copy link
    Member

    pitrou commented Apr 26, 2021

    Feel free to submit a PR to use context managers, though I'm not sure it solves the problem in a guaranteed way.

    Nick, what is the official word wrt. interruptibility of context managers?

    @pitrou
    Copy link
    Member

    pitrou commented Apr 26, 2021

    But also, please ask the "external package" that uses PyThreadState_SetAsyncExc to clear up its documentation and mentions the pitfalls.

    @DaRoee
    Copy link
    Mannequin Author

    DaRoee mannequin commented Apr 27, 2021

    Ha! Already gave them my analysis, and stopped using that package since.
    While I certainly think this is not the trivial, basic usage, and that the pattern used for the lock is common (and part of some PEP if I remember correctly), I'm kind of puzzled on that fact that context manager solves it.
    The test will pass constantly once the change is made (even though it looks like RLock __enter__ is just calling the acquire function), and I overall think this is a good change to do in the Handler class, even it's just for self.lock (can we find and understand why the last proposal was rejected?).

    I'll submit a PR in a few days, but was wondering on the usage of the lock, and that it doesn't always exists on self. Anyway, I assume these are questions that we can continue discussing in the PR if you also think this is a good direction..

    @rhettinger
    Copy link
    Contributor

    I'm kind of puzzled on that fact that context manager solves it.

    It "solves" the problem because of the happenstance rather than because of language guarantees.

    The current implementation of the with-statement in CPython is to generate a SETUP_WITH opcode that calls both __enter__() and PyFrame_BlockSetup() within a single opcode while the GIL is held. And because RLlock.__enter__() is implemented in C there are no pure python steps before the block frame setup.

    If we added a pure Python __enter__ and __exit__ to the Handler class (necessary because a user can override createLock), then pure python steps would occur before the frame block setup and the problem would reappear.

    If RLock were implemented in pure python, the problem would also reappear.

    If the opcodes were changed so that the call to __enter__() were in a different opcode than setting up the block frame, the problem would reappear.

    I'll submit a PR in a few days

    Let's wait to hear from Vinay and Nick before deciding to work on a PR. Since you are no longer using the package, you no longer have the problem to solve. The programming practice that gave rise to the problem is inherently fragile. The proposed mitigation only helps one module and doesn't solve the problem in general. As described above, the proposed mitigation is fragile and implementation specific. Lastly, it overrides an old decision to not use context managers in the logging module for reasons I no longer remember.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @youtian001
    Copy link

    youtian001 commented Aug 31, 2022

    Hey folks, any update on this issue? Or is there any recommended workaround to interrupt a running thread safely? Thanks.

    @vsajip
    Copy link
    Member

    vsajip commented Aug 31, 2022

    Not from a logging perspective, for the reasons noted above.

    @Peng-wei-Yu
    Copy link

    Hi All,

    is there any workaround or fix version, I met this also and do not know how to deal with this, for now I just remove log part but I'd like keep that so I can know what happend with my program.

    @vsajip
    Copy link
    Member

    vsajip commented Nov 18, 2023

    Note that we now use context managers in the locking code, see #109462 so it might be worth trying the test code to see what the current status is with main branch code.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    Status: No status
    Development

    No branches or pull requests

    5 participants