-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-119511: Fix OOM vulnerability in imaplib #119514
Conversation
The IMAP4 client could consume an arbitrary amount of memory when trying to connent to a malicious server, because it read a "literal" data with a single read(size) call, and BufferedReader.read() allocates the bytes object of the specified size before reading. Now the IMAP4 client reads data by chunks, therefore the amount of used memory is limited by the amount of the data actually been sent by the server.
I've marked this Draft for now as discussion on this on the security response team list is not complete. (we'll summarize that in a public issue once it has settled) |
Lib/imaplib.py
Outdated
delta = min(cursize, size - cursize) | ||
data += self.file.read(delta) | ||
cursize += delta | ||
return data | ||
return self.file.read(size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've missed erasing this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Lib/imaplib.py
Outdated
delta = min(cursize, size - cursize) | ||
data += self.file.read(delta) | ||
cursize += delta | ||
return data | ||
return self.file.read(size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Since this did not block a release, I'm switching it to deferred blocker. |
@gpshead, is the discussion still going on? |
@SethMichaelLarson Could you confirm that the PSRT is still discussing this? |
Thanks @serhiy-storchaka for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10, 3.11, 3.12, 3.13. |
…GH-119514) The IMAP4 client could consume an arbitrary amount of memory when trying to connect to a malicious server, because it read a "literal" data with a single read(size) call, and BufferedReader.read() allocates the bytes object of the specified size before reading. Now the IMAP4 client reads data by chunks, therefore the amount of used memory is limited by the amount of the data actually been sent by the server. (cherry picked from commit 735f25c) Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Gregory P. Smith <[email protected]>
GH-129355 is a backport of this pull request to the 3.13 branch. |
…GH-119514) The IMAP4 client could consume an arbitrary amount of memory when trying to connect to a malicious server, because it read a "literal" data with a single read(size) call, and BufferedReader.read() allocates the bytes object of the specified size before reading. Now the IMAP4 client reads data by chunks, therefore the amount of used memory is limited by the amount of the data actually been sent by the server. (cherry picked from commit 735f25c) Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Gregory P. Smith <[email protected]>
GH-129356 is a backport of this pull request to the 3.12 branch. |
…GH-119514) The IMAP4 client could consume an arbitrary amount of memory when trying to connect to a malicious server, because it read a "literal" data with a single read(size) call, and BufferedReader.read() allocates the bytes object of the specified size before reading. Now the IMAP4 client reads data by chunks, therefore the amount of used memory is limited by the amount of the data actually been sent by the server. (cherry picked from commit 735f25c) Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Gregory P. Smith <[email protected]>
GH-129357 is a backport of this pull request to the 3.11 branch. |
…GH-119514) The IMAP4 client could consume an arbitrary amount of memory when trying to connect to a malicious server, because it read a "literal" data with a single read(size) call, and BufferedReader.read() allocates the bytes object of the specified size before reading. Now the IMAP4 client reads data by chunks, therefore the amount of used memory is limited by the amount of the data actually been sent by the server. (cherry picked from commit 735f25c) Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Gregory P. Smith <[email protected]>
Sorry, @serhiy-storchaka and @encukou, I could not cleanly backport this to
|
GH-129358 is a backport of this pull request to the 3.10 branch. |
…9514) (GH-129355) gh-119511: Fix a potential denial of service in imaplib (GH-119514) The IMAP4 client could consume an arbitrary amount of memory when trying to connect to a malicious server, because it read a "literal" data with a single read(size) call, and BufferedReader.read() allocates the bytes object of the specified size before reading. Now the IMAP4 client reads data by chunks, therefore the amount of used memory is limited by the amount of the data actually been sent by the server. (cherry picked from commit 735f25c) Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Gregory P. Smith <[email protected]>
…9514) (GH-129356) gh-119511: Fix a potential denial of service in imaplib (GH-119514) The IMAP4 client could consume an arbitrary amount of memory when trying to connect to a malicious server, because it read a "literal" data with a single read(size) call, and BufferedReader.read() allocates the bytes object of the specified size before reading. Now the IMAP4 client reads data by chunks, therefore the amount of used memory is limited by the amount of the data actually been sent by the server. (cherry picked from commit 735f25c) Co-authored-by: Serhiy Storchaka <[email protected]> Co-authored-by: Gregory P. Smith <[email protected]>
Our read() implementation designed to support IDLE replaces the one from PR python#119514, fixing the same problem it was addressing. The tests that it added are preserved.
* gh-55454: Add IMAP4 IDLE support to imaplib This extends imaplib with support for the rfc2177 IMAP IDLE command, as requested in #55454. It allows events to be pushed to a client as they occur, rather than having to continually poll for mailbox changes. The interface is a new idle() method, which returns an iterable context manager. Entering the context starts IDLE mode, during which events (untagged responses) can be retrieved using the iteration protocol. Exiting the context sends DONE to the server, ending IDLE mode. An optional time limit for the IDLE session is supported, for use with servers that impose an inactivity timeout. The context manager also offers a burst() method, designed for programs wishing to process events in batch rather than one at a time. Notable differences from other implementations: - It's an extension to imaplib, rather than a replacement. - It doesn't introduce additional threads. - It doesn't impose new requirements on the use of imaplib's existing methods. - It passes the unit tests in CPython's test/test_imaplib.py module (and adds new ones). - It works on Windows, Linux, and other unix-like systems. - It makes IDLE available on all of imaplib's client variants (including IMAP4_stream). - The interface is pythonic and easy to use. Caveats: - Due to a Windows limitation, the special case of IMAP4_stream running on Windows lacks a duration/timeout feature. (This is the stdin/stdout pipe connection variant; timeouts work fine for socket-based connections, even on Windows.) I have documented it where appropriate. - The file-like imaplib instance attributes are changed from buffered to unbuffered mode. This could potentially break any client code that uses those objects directly without expecting partial reads/writes. However, these attributes are undocumented. As such, I think (and PEP 8 confirms) that they are fair game for changes. https://peps.python.org/pep-0008/#public-and-internal-interfaces Usage examples: #55454 (comment) Original discussion: https://discuss.python.org/t/gauging-interest-in-my-imap4-idle-implementation-for-imaplib/59272 Earlier requests and suggestions: #55454 https://mail.python.org/archives/list/[email protected]/thread/C4TVEYL5IBESQQPPS5GBR7WFBXCLQMZ2/ * gh-55454: Clarify imaplib idle() docs - Add example idle response tuples, to make the minor difference from other imaplib response tuples more obvious. - Merge the idle context manager's burst() method docs with the IMAP object's idle() method docs, for easier understanding. - Upgrade the Windows note regarding lack of pipe timeouts to a warning. - Rephrase various things for clarity. * docs: words instead of <= Co-authored-by: Peter Bierma <[email protected]> * docs: improve style in an example Co-authored-by: Peter Bierma <[email protected]> * docs: grammatical edit Co-authored-by: Peter Bierma <[email protected]> * docs consistency Co-authored-by: Peter Bierma <[email protected]> * comment -> docstring Co-authored-by: Peter Bierma <[email protected]> * docs: refer to imaplib as "this module" Co-authored-by: Peter Bierma <[email protected]> * imaplib: simplify & clarify idle debug message Co-authored-by: Peter Bierma <[email protected]> * imaplib: elaborate in idle context manager comment * imaplib: re-raise BaseException instead of bare except Co-authored-by: Peter Bierma <[email protected]> * imaplib: convert private doc string to comment * docs: correct mistake in imaplib example This is a correction to 8077f2e, which changed a variable name in only one place and broke the subsequent reference to it, departed from the naming convention used in the rest of the module, and shadowed the type() builtin along the way. * imaplib: simplify example code in doc string This is for consistency with the documentation change in 8077f2e and subsequent correction in 013bbf1. * imaplib: rename _Idler to Idler, update its docs * imaplib: add comment in Idler._pop() Co-authored-by: Peter Bierma <[email protected]> * imaplib: remove unnecessary blank line Co-authored-by: Peter Bierma <[email protected]> * imaplib: comment on use of unbuffered pipes * docs: imaplib: use the reStructuredText :class: role Co-authored-by: Peter Bierma <[email protected]> * Revert "docs: imaplib: use the reStructuredText :class: role" This reverts commit f385e44, because it triggers CI failures in the docs by referencing a class that is (deliberately) undocumented. * docs: imaplib: use the reST :class: role, escaped This is a different approach to f385e44, which was reverted for creating dangling link references. By prefixing the reStructuredText role target with a ! we disable conversion to a link, thereby passing continuous integration checks even though the referenced class is deliberately absent from the documentation. * docs: refer to IMAP4 IDLE instead of just IDLE This clarifies that we are referring to the email protocol, not the editor with the same name. Co-authored-by: Guido van Rossum <[email protected]> * imaplib: IDLE -> IMAP4 IDLE in exception message Co-authored-by: Peter Bierma <[email protected]> * docs: imaplib idle() phrasing and linking tweaks * docs: imaplib: avoid linking to an invalid target This reverts and rephrases part of a3f21cd which created links to a method on a deliberately undocumented class. The links didn't work consistently, and caused sphinx warnings that broke cpython's continuous integration tests. * imaplib: update test after recent exception change This fixes a test that was broken by changing an exception in b01de95 * imaplib: rename idle() dur argument to duration * imaplib: bytes.index() -> bytes.find() This makes it more obvious which statement triggers the branch. * imaplib: remove no-longer-necessary statement Co-authored-by: Martin Panter <[email protected]> * docs: imaplib: concise & valid method links The burst() method is a little tricky to link in restructuredText, due to quirks of its parent class. This syntax allows sphinx to generate working links without generating warnings (which break continuous integration) and without burdening the reader with unimportant namespace qualifications. It makes the reST source ugly, but few people read the reST source, so it's a tolerable tradeoff. * imaplib: note data types present in IDLE responses * docs: imaplib: add comma to reST changes header Co-authored-by: Bénédikt Tran <[email protected]> * imaplib: sync doc strings with reST docs * docs: imaplib: minor Idler clarifications * imaplib: idle: emit (type, [data, ...]) tuples This allows our iterator to emit untagged responses that contain literal strings in the same way that imaplib's existing methods do, while still emitting exactly one whole response per iteration. * imaplib: while/yield instead of yield from iter() * imaplib: idle: use deadline idiom when iterating This simplifies the code, and avoids idle duration drift from time spent processing each iteration. * docs: imaplib: state duration/interval arg types * docs: imaplib: minor rephrasing of a sentence * docs: imaplib: reposition a paragraph This might improve readability, especially when encountering Idler.burst() for the first time. * docs: imaplib: wrap long lines in idle() section * docs: imaplib: note: Idler objects require 'with' * docs: imaplib: say that 29 minutes is 1740 seconds * docs: imaplib: mark a paragraph as a 'tip' * docs: imaplib: rephrase reference to MS Windows * imaplib: end doc string titles with a period * imaplib: idle: socket timeouts instead of select() IDLE timeouts were originally implemented using select() after checking for the presence of already-buffered data. That allowed timeouts on pipe connetions like IMAP4_stream. However, it seemed possible that SSL data arriving without any IMAP data afterward could cause select() to indicate available application data when there was none, leading to a read() call that would block with no timeout. It was unclear under what conditions this would happen in practice. This change switches to socket timeouts instead of select(), just to be safe. This also reverts IMAP4_stream changes that were made to support IDLE timeouts, since our new implementation only supports socket connections. * imaplib: Idler: rename private state attributes * imaplib: rephrase a comment in example code * docs: imaplib: idle: use Sphinx code-block:: pycon * docs: whatsnew: imaplib: reformat IMAP4.idle entry * imaplib: idle: make doc strings brief Since we generally rely on the reST/html documentation for details, we can keep these doc strings short. This matches the module's existing doc string style and avoids having to sync small changes between two files. * imaplib: Idler: split assert into two statements * imaplib: Idler: move assignment out of try: block * imaplib: Idler: move __exit__() for readability * imaplib: Idler: move __next__() for readability * imaplib: test: make IdleCmdHandler a global class * docs: imaplib: idle: collapse double-spaces * imaplib: warn on use of undocumented 'file' attr * imaplib: revert import reformatting Since we no longer import platform or selectors, the original import statement style can be restored, reducing the footprint of PR #122542. * imaplib: restore original exception msg formatting This reduces the footprint of PR #122542. * docs: imaplib: idle: versionadded:: next * imaplib: move import statement to where it's used This import is only needed if external code tries to use an attribute that it shouldn't be using. Making it a local import reduces module loading time in supported cases. * imaplib test: RuntimeWarning on IMAP4.file access * imaplib: use stacklevel=2 in warnings.warn() * imaplib test: simplify IMAP4.file warning test * imaplib test: pre-idle-continuation response * imaplib test: post-done untagged response * imaplib: downgrade idle-denied exception to error This makes it easier for client code to distinguish a temporary rejection of the IDLE command from a server responding incorrectly to IDLE. * imaplib: simplify check for socket object * imaplib: narrow the scope of IDLE socket timeouts If an IDLE duration or burst() was in use, and an unsolicited response contained a literal string, and crossed a packet boundary, and the subsequent packet was delayed beyond the IDLE feature's time limit, the timeout would leave the incoming protocol stream in a bad state (with the tail of that response appearing where the start of a response is expected). This change moves the IDLE socket timeout to cover only the start of a response, so it can no longer cause that problem. * imaplib: preserve partial reads on exception This ensures that short IDLE durations / burst() intervals won't risk corrupting response lines that span multiple packets. * imaplib: read/readline: save multipart buffer tail For resilience if read() or readline() ever complete with more than one bytes object remaining in the buffer. This is not expected to happen, but it seems wise to be prepared for a future change making it possible. * imaplib: use TimeoutError subclass only if needed * doc: imaplib: elaborate on IDLE response delivery * doc: imaplib: elaborate in note re: IMAP4.response * imaplib: comment on benefit of reading in chunks Our read() implementation designed to support IDLE replaces the one from PR #119514, fixing the same problem it was addressing. The tests that it added are preserved. * imaplib: readline(): treat ConnectionError as EOF --------- Co-authored-by: Gregory P. Smith <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Guido van Rossum <[email protected]> Co-authored-by: Guido van Rossum <[email protected]> Co-authored-by: Martin Panter <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>
* pythongh-55454: Add IMAP4 IDLE support to imaplib This extends imaplib with support for the rfc2177 IMAP IDLE command, as requested in python#55454. It allows events to be pushed to a client as they occur, rather than having to continually poll for mailbox changes. The interface is a new idle() method, which returns an iterable context manager. Entering the context starts IDLE mode, during which events (untagged responses) can be retrieved using the iteration protocol. Exiting the context sends DONE to the server, ending IDLE mode. An optional time limit for the IDLE session is supported, for use with servers that impose an inactivity timeout. The context manager also offers a burst() method, designed for programs wishing to process events in batch rather than one at a time. Notable differences from other implementations: - It's an extension to imaplib, rather than a replacement. - It doesn't introduce additional threads. - It doesn't impose new requirements on the use of imaplib's existing methods. - It passes the unit tests in CPython's test/test_imaplib.py module (and adds new ones). - It works on Windows, Linux, and other unix-like systems. - It makes IDLE available on all of imaplib's client variants (including IMAP4_stream). - The interface is pythonic and easy to use. Caveats: - Due to a Windows limitation, the special case of IMAP4_stream running on Windows lacks a duration/timeout feature. (This is the stdin/stdout pipe connection variant; timeouts work fine for socket-based connections, even on Windows.) I have documented it where appropriate. - The file-like imaplib instance attributes are changed from buffered to unbuffered mode. This could potentially break any client code that uses those objects directly without expecting partial reads/writes. However, these attributes are undocumented. As such, I think (and PEP 8 confirms) that they are fair game for changes. https://peps.python.org/pep-0008/#public-and-internal-interfaces Usage examples: python#55454 (comment) Original discussion: https://discuss.python.org/t/gauging-interest-in-my-imap4-idle-implementation-for-imaplib/59272 Earlier requests and suggestions: python#55454 https://mail.python.org/archives/list/[email protected]/thread/C4TVEYL5IBESQQPPS5GBR7WFBXCLQMZ2/ * pythongh-55454: Clarify imaplib idle() docs - Add example idle response tuples, to make the minor difference from other imaplib response tuples more obvious. - Merge the idle context manager's burst() method docs with the IMAP object's idle() method docs, for easier understanding. - Upgrade the Windows note regarding lack of pipe timeouts to a warning. - Rephrase various things for clarity. * docs: words instead of <= Co-authored-by: Peter Bierma <[email protected]> * docs: improve style in an example Co-authored-by: Peter Bierma <[email protected]> * docs: grammatical edit Co-authored-by: Peter Bierma <[email protected]> * docs consistency Co-authored-by: Peter Bierma <[email protected]> * comment -> docstring Co-authored-by: Peter Bierma <[email protected]> * docs: refer to imaplib as "this module" Co-authored-by: Peter Bierma <[email protected]> * imaplib: simplify & clarify idle debug message Co-authored-by: Peter Bierma <[email protected]> * imaplib: elaborate in idle context manager comment * imaplib: re-raise BaseException instead of bare except Co-authored-by: Peter Bierma <[email protected]> * imaplib: convert private doc string to comment * docs: correct mistake in imaplib example This is a correction to 8077f2e, which changed a variable name in only one place and broke the subsequent reference to it, departed from the naming convention used in the rest of the module, and shadowed the type() builtin along the way. * imaplib: simplify example code in doc string This is for consistency with the documentation change in 8077f2e and subsequent correction in 013bbf1. * imaplib: rename _Idler to Idler, update its docs * imaplib: add comment in Idler._pop() Co-authored-by: Peter Bierma <[email protected]> * imaplib: remove unnecessary blank line Co-authored-by: Peter Bierma <[email protected]> * imaplib: comment on use of unbuffered pipes * docs: imaplib: use the reStructuredText :class: role Co-authored-by: Peter Bierma <[email protected]> * Revert "docs: imaplib: use the reStructuredText :class: role" This reverts commit f385e44, because it triggers CI failures in the docs by referencing a class that is (deliberately) undocumented. * docs: imaplib: use the reST :class: role, escaped This is a different approach to f385e44, which was reverted for creating dangling link references. By prefixing the reStructuredText role target with a ! we disable conversion to a link, thereby passing continuous integration checks even though the referenced class is deliberately absent from the documentation. * docs: refer to IMAP4 IDLE instead of just IDLE This clarifies that we are referring to the email protocol, not the editor with the same name. Co-authored-by: Guido van Rossum <[email protected]> * imaplib: IDLE -> IMAP4 IDLE in exception message Co-authored-by: Peter Bierma <[email protected]> * docs: imaplib idle() phrasing and linking tweaks * docs: imaplib: avoid linking to an invalid target This reverts and rephrases part of a3f21cd which created links to a method on a deliberately undocumented class. The links didn't work consistently, and caused sphinx warnings that broke cpython's continuous integration tests. * imaplib: update test after recent exception change This fixes a test that was broken by changing an exception in b01de95 * imaplib: rename idle() dur argument to duration * imaplib: bytes.index() -> bytes.find() This makes it more obvious which statement triggers the branch. * imaplib: remove no-longer-necessary statement Co-authored-by: Martin Panter <[email protected]> * docs: imaplib: concise & valid method links The burst() method is a little tricky to link in restructuredText, due to quirks of its parent class. This syntax allows sphinx to generate working links without generating warnings (which break continuous integration) and without burdening the reader with unimportant namespace qualifications. It makes the reST source ugly, but few people read the reST source, so it's a tolerable tradeoff. * imaplib: note data types present in IDLE responses * docs: imaplib: add comma to reST changes header Co-authored-by: Bénédikt Tran <[email protected]> * imaplib: sync doc strings with reST docs * docs: imaplib: minor Idler clarifications * imaplib: idle: emit (type, [data, ...]) tuples This allows our iterator to emit untagged responses that contain literal strings in the same way that imaplib's existing methods do, while still emitting exactly one whole response per iteration. * imaplib: while/yield instead of yield from iter() * imaplib: idle: use deadline idiom when iterating This simplifies the code, and avoids idle duration drift from time spent processing each iteration. * docs: imaplib: state duration/interval arg types * docs: imaplib: minor rephrasing of a sentence * docs: imaplib: reposition a paragraph This might improve readability, especially when encountering Idler.burst() for the first time. * docs: imaplib: wrap long lines in idle() section * docs: imaplib: note: Idler objects require 'with' * docs: imaplib: say that 29 minutes is 1740 seconds * docs: imaplib: mark a paragraph as a 'tip' * docs: imaplib: rephrase reference to MS Windows * imaplib: end doc string titles with a period * imaplib: idle: socket timeouts instead of select() IDLE timeouts were originally implemented using select() after checking for the presence of already-buffered data. That allowed timeouts on pipe connetions like IMAP4_stream. However, it seemed possible that SSL data arriving without any IMAP data afterward could cause select() to indicate available application data when there was none, leading to a read() call that would block with no timeout. It was unclear under what conditions this would happen in practice. This change switches to socket timeouts instead of select(), just to be safe. This also reverts IMAP4_stream changes that were made to support IDLE timeouts, since our new implementation only supports socket connections. * imaplib: Idler: rename private state attributes * imaplib: rephrase a comment in example code * docs: imaplib: idle: use Sphinx code-block:: pycon * docs: whatsnew: imaplib: reformat IMAP4.idle entry * imaplib: idle: make doc strings brief Since we generally rely on the reST/html documentation for details, we can keep these doc strings short. This matches the module's existing doc string style and avoids having to sync small changes between two files. * imaplib: Idler: split assert into two statements * imaplib: Idler: move assignment out of try: block * imaplib: Idler: move __exit__() for readability * imaplib: Idler: move __next__() for readability * imaplib: test: make IdleCmdHandler a global class * docs: imaplib: idle: collapse double-spaces * imaplib: warn on use of undocumented 'file' attr * imaplib: revert import reformatting Since we no longer import platform or selectors, the original import statement style can be restored, reducing the footprint of PR python#122542. * imaplib: restore original exception msg formatting This reduces the footprint of PR python#122542. * docs: imaplib: idle: versionadded:: next * imaplib: move import statement to where it's used This import is only needed if external code tries to use an attribute that it shouldn't be using. Making it a local import reduces module loading time in supported cases. * imaplib test: RuntimeWarning on IMAP4.file access * imaplib: use stacklevel=2 in warnings.warn() * imaplib test: simplify IMAP4.file warning test * imaplib test: pre-idle-continuation response * imaplib test: post-done untagged response * imaplib: downgrade idle-denied exception to error This makes it easier for client code to distinguish a temporary rejection of the IDLE command from a server responding incorrectly to IDLE. * imaplib: simplify check for socket object * imaplib: narrow the scope of IDLE socket timeouts If an IDLE duration or burst() was in use, and an unsolicited response contained a literal string, and crossed a packet boundary, and the subsequent packet was delayed beyond the IDLE feature's time limit, the timeout would leave the incoming protocol stream in a bad state (with the tail of that response appearing where the start of a response is expected). This change moves the IDLE socket timeout to cover only the start of a response, so it can no longer cause that problem. * imaplib: preserve partial reads on exception This ensures that short IDLE durations / burst() intervals won't risk corrupting response lines that span multiple packets. * imaplib: read/readline: save multipart buffer tail For resilience if read() or readline() ever complete with more than one bytes object remaining in the buffer. This is not expected to happen, but it seems wise to be prepared for a future change making it possible. * imaplib: use TimeoutError subclass only if needed * doc: imaplib: elaborate on IDLE response delivery * doc: imaplib: elaborate in note re: IMAP4.response * imaplib: comment on benefit of reading in chunks Our read() implementation designed to support IDLE replaces the one from PR python#119514, fixing the same problem it was addressing. The tests that it added are preserved. * imaplib: readline(): treat ConnectionError as EOF --------- Co-authored-by: Gregory P. Smith <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Guido van Rossum <[email protected]> Co-authored-by: Guido van Rossum <[email protected]> Co-authored-by: Martin Panter <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>
* pythongh-55454: Add IMAP4 IDLE support to imaplib This extends imaplib with support for the rfc2177 IMAP IDLE command, as requested in python#55454. It allows events to be pushed to a client as they occur, rather than having to continually poll for mailbox changes. The interface is a new idle() method, which returns an iterable context manager. Entering the context starts IDLE mode, during which events (untagged responses) can be retrieved using the iteration protocol. Exiting the context sends DONE to the server, ending IDLE mode. An optional time limit for the IDLE session is supported, for use with servers that impose an inactivity timeout. The context manager also offers a burst() method, designed for programs wishing to process events in batch rather than one at a time. Notable differences from other implementations: - It's an extension to imaplib, rather than a replacement. - It doesn't introduce additional threads. - It doesn't impose new requirements on the use of imaplib's existing methods. - It passes the unit tests in CPython's test/test_imaplib.py module (and adds new ones). - It works on Windows, Linux, and other unix-like systems. - It makes IDLE available on all of imaplib's client variants (including IMAP4_stream). - The interface is pythonic and easy to use. Caveats: - Due to a Windows limitation, the special case of IMAP4_stream running on Windows lacks a duration/timeout feature. (This is the stdin/stdout pipe connection variant; timeouts work fine for socket-based connections, even on Windows.) I have documented it where appropriate. - The file-like imaplib instance attributes are changed from buffered to unbuffered mode. This could potentially break any client code that uses those objects directly without expecting partial reads/writes. However, these attributes are undocumented. As such, I think (and PEP 8 confirms) that they are fair game for changes. https://peps.python.org/pep-0008/#public-and-internal-interfaces Usage examples: python#55454 (comment) Original discussion: https://discuss.python.org/t/gauging-interest-in-my-imap4-idle-implementation-for-imaplib/59272 Earlier requests and suggestions: python#55454 https://mail.python.org/archives/list/[email protected]/thread/C4TVEYL5IBESQQPPS5GBR7WFBXCLQMZ2/ * pythongh-55454: Clarify imaplib idle() docs - Add example idle response tuples, to make the minor difference from other imaplib response tuples more obvious. - Merge the idle context manager's burst() method docs with the IMAP object's idle() method docs, for easier understanding. - Upgrade the Windows note regarding lack of pipe timeouts to a warning. - Rephrase various things for clarity. * docs: words instead of <= Co-authored-by: Peter Bierma <[email protected]> * docs: improve style in an example Co-authored-by: Peter Bierma <[email protected]> * docs: grammatical edit Co-authored-by: Peter Bierma <[email protected]> * docs consistency Co-authored-by: Peter Bierma <[email protected]> * comment -> docstring Co-authored-by: Peter Bierma <[email protected]> * docs: refer to imaplib as "this module" Co-authored-by: Peter Bierma <[email protected]> * imaplib: simplify & clarify idle debug message Co-authored-by: Peter Bierma <[email protected]> * imaplib: elaborate in idle context manager comment * imaplib: re-raise BaseException instead of bare except Co-authored-by: Peter Bierma <[email protected]> * imaplib: convert private doc string to comment * docs: correct mistake in imaplib example This is a correction to 8077f2e, which changed a variable name in only one place and broke the subsequent reference to it, departed from the naming convention used in the rest of the module, and shadowed the type() builtin along the way. * imaplib: simplify example code in doc string This is for consistency with the documentation change in 8077f2e and subsequent correction in 013bbf1. * imaplib: rename _Idler to Idler, update its docs * imaplib: add comment in Idler._pop() Co-authored-by: Peter Bierma <[email protected]> * imaplib: remove unnecessary blank line Co-authored-by: Peter Bierma <[email protected]> * imaplib: comment on use of unbuffered pipes * docs: imaplib: use the reStructuredText :class: role Co-authored-by: Peter Bierma <[email protected]> * Revert "docs: imaplib: use the reStructuredText :class: role" This reverts commit f385e44, because it triggers CI failures in the docs by referencing a class that is (deliberately) undocumented. * docs: imaplib: use the reST :class: role, escaped This is a different approach to f385e44, which was reverted for creating dangling link references. By prefixing the reStructuredText role target with a ! we disable conversion to a link, thereby passing continuous integration checks even though the referenced class is deliberately absent from the documentation. * docs: refer to IMAP4 IDLE instead of just IDLE This clarifies that we are referring to the email protocol, not the editor with the same name. Co-authored-by: Guido van Rossum <[email protected]> * imaplib: IDLE -> IMAP4 IDLE in exception message Co-authored-by: Peter Bierma <[email protected]> * docs: imaplib idle() phrasing and linking tweaks * docs: imaplib: avoid linking to an invalid target This reverts and rephrases part of a3f21cd which created links to a method on a deliberately undocumented class. The links didn't work consistently, and caused sphinx warnings that broke cpython's continuous integration tests. * imaplib: update test after recent exception change This fixes a test that was broken by changing an exception in b01de95 * imaplib: rename idle() dur argument to duration * imaplib: bytes.index() -> bytes.find() This makes it more obvious which statement triggers the branch. * imaplib: remove no-longer-necessary statement Co-authored-by: Martin Panter <[email protected]> * docs: imaplib: concise & valid method links The burst() method is a little tricky to link in restructuredText, due to quirks of its parent class. This syntax allows sphinx to generate working links without generating warnings (which break continuous integration) and without burdening the reader with unimportant namespace qualifications. It makes the reST source ugly, but few people read the reST source, so it's a tolerable tradeoff. * imaplib: note data types present in IDLE responses * docs: imaplib: add comma to reST changes header Co-authored-by: Bénédikt Tran <[email protected]> * imaplib: sync doc strings with reST docs * docs: imaplib: minor Idler clarifications * imaplib: idle: emit (type, [data, ...]) tuples This allows our iterator to emit untagged responses that contain literal strings in the same way that imaplib's existing methods do, while still emitting exactly one whole response per iteration. * imaplib: while/yield instead of yield from iter() * imaplib: idle: use deadline idiom when iterating This simplifies the code, and avoids idle duration drift from time spent processing each iteration. * docs: imaplib: state duration/interval arg types * docs: imaplib: minor rephrasing of a sentence * docs: imaplib: reposition a paragraph This might improve readability, especially when encountering Idler.burst() for the first time. * docs: imaplib: wrap long lines in idle() section * docs: imaplib: note: Idler objects require 'with' * docs: imaplib: say that 29 minutes is 1740 seconds * docs: imaplib: mark a paragraph as a 'tip' * docs: imaplib: rephrase reference to MS Windows * imaplib: end doc string titles with a period * imaplib: idle: socket timeouts instead of select() IDLE timeouts were originally implemented using select() after checking for the presence of already-buffered data. That allowed timeouts on pipe connetions like IMAP4_stream. However, it seemed possible that SSL data arriving without any IMAP data afterward could cause select() to indicate available application data when there was none, leading to a read() call that would block with no timeout. It was unclear under what conditions this would happen in practice. This change switches to socket timeouts instead of select(), just to be safe. This also reverts IMAP4_stream changes that were made to support IDLE timeouts, since our new implementation only supports socket connections. * imaplib: Idler: rename private state attributes * imaplib: rephrase a comment in example code * docs: imaplib: idle: use Sphinx code-block:: pycon * docs: whatsnew: imaplib: reformat IMAP4.idle entry * imaplib: idle: make doc strings brief Since we generally rely on the reST/html documentation for details, we can keep these doc strings short. This matches the module's existing doc string style and avoids having to sync small changes between two files. * imaplib: Idler: split assert into two statements * imaplib: Idler: move assignment out of try: block * imaplib: Idler: move __exit__() for readability * imaplib: Idler: move __next__() for readability * imaplib: test: make IdleCmdHandler a global class * docs: imaplib: idle: collapse double-spaces * imaplib: warn on use of undocumented 'file' attr * imaplib: revert import reformatting Since we no longer import platform or selectors, the original import statement style can be restored, reducing the footprint of PR python#122542. * imaplib: restore original exception msg formatting This reduces the footprint of PR python#122542. * docs: imaplib: idle: versionadded:: next * imaplib: move import statement to where it's used This import is only needed if external code tries to use an attribute that it shouldn't be using. Making it a local import reduces module loading time in supported cases. * imaplib test: RuntimeWarning on IMAP4.file access * imaplib: use stacklevel=2 in warnings.warn() * imaplib test: simplify IMAP4.file warning test * imaplib test: pre-idle-continuation response * imaplib test: post-done untagged response * imaplib: downgrade idle-denied exception to error This makes it easier for client code to distinguish a temporary rejection of the IDLE command from a server responding incorrectly to IDLE. * imaplib: simplify check for socket object * imaplib: narrow the scope of IDLE socket timeouts If an IDLE duration or burst() was in use, and an unsolicited response contained a literal string, and crossed a packet boundary, and the subsequent packet was delayed beyond the IDLE feature's time limit, the timeout would leave the incoming protocol stream in a bad state (with the tail of that response appearing where the start of a response is expected). This change moves the IDLE socket timeout to cover only the start of a response, so it can no longer cause that problem. * imaplib: preserve partial reads on exception This ensures that short IDLE durations / burst() intervals won't risk corrupting response lines that span multiple packets. * imaplib: read/readline: save multipart buffer tail For resilience if read() or readline() ever complete with more than one bytes object remaining in the buffer. This is not expected to happen, but it seems wise to be prepared for a future change making it possible. * imaplib: use TimeoutError subclass only if needed * doc: imaplib: elaborate on IDLE response delivery * doc: imaplib: elaborate in note re: IMAP4.response * imaplib: comment on benefit of reading in chunks Our read() implementation designed to support IDLE replaces the one from PR python#119514, fixing the same problem it was addressing. The tests that it added are preserved. * imaplib: readline(): treat ConnectionError as EOF --------- Co-authored-by: Gregory P. Smith <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Guido van Rossum <[email protected]> Co-authored-by: Guido van Rossum <[email protected]> Co-authored-by: Martin Panter <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>
GH-130248 is a backport of this pull request to the 3.9 branch. |
The IMAP4 client could consume an arbitrary amount of memory when trying to connent to a malicious server, because it read a "literal" data with a single read(size) call, and BufferedReader.read() allocates the bytes object of the specified size before reading. Now the IMAP4 client reads data by chunks, therefore the amount of used memory is limited by the amount of the data actually been sent by the server.