PEP 688: Making the buffer protocol accessible in Python #2549

Merged on Apr 25, 2022 (18 commits)
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -568,6 +568,7 @@ pep-0684.rst @ericsnowcurrently
pep-0685.rst @brettcannon
pep-0686.rst @methane
pep-0687.rst @encukou
pep-0688.rst @jellezijlstra
# ...
# pep-0754.txt
# ...
252 changes: 252 additions & 0 deletions pep-0688.rst
@@ -0,0 +1,252 @@
PEP: 688
Title: Making the buffer protocol accessible in Python
Author: Jelle Zijlstra <[email protected]>
Sponsor: Jelle Zijlstra <[email protected]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Apr-2022
Python-Version: 3.12


Abstract
========

This PEP proposes a mechanism to inspect in Python whether a type implements
the C-level buffer protocol. This allows type checkers to check for
objects that implement the protocol.


Motivation
==========

The CPython C API provides a versatile mechanism for accessing the
underlying memory of an object: the buffer protocol from :pep:`3118`.
Functions that accept binary data are usually written to accept any
object implementing the buffer protocol. For example, at the time of
writing, there are about 130 functions in CPython using the Argument Clinic
``Py_buffer`` type, which accepts any object implementing the buffer protocol.
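
As an illustration of this convention, the following sketch passes four
different buffer types to ``binascii.hexlify``, a C-implemented standard
library function. The choice of ``binascii.hexlify`` and the sample payload
are ours and serve only as an example.

.. code-block:: python

    import array
    import binascii

    payload = b"xy"
    # Four different types exposing the same two bytes through the buffer protocol.
    buffers = [payload, bytearray(payload), memoryview(payload), array.array("B", payload)]

    # The C implementation accepts each of them without any conversion.
    results = [binascii.hexlify(buf) for buf in buffers]
    print(results)

    try:
        binascii.hexlify("xy")  # str does not implement the buffer protocol
    except TypeError as exc:
        print("rejected:", exc)

All four calls produce the same hexadecimal digest, while the ``str``
argument is rejected with a ``TypeError``.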

Currently, there is no way to inspect in Python whether an object
implements the buffer protocol. Relatedly, the static type system
does not provide a type annotation to represent the protocol.
This is a `common problem <https://github.com/python/typing/issues/593>`__
when type annotating code that accepts generic buffers.


Rationale
=========

Current options
---------------

There are two current workarounds for annotating buffer types in
the type system, but neither is adequate.

First, the `current workaround <https://github.com/python/typeshed/blob/2a0fc1b582ef84f7a82c0beb39fa617de2539d3d/stdlib/_typeshed/__init__.pyi#L194>`__
for buffer types in typeshed is a type alias
that lists well-known buffer types in the standard library, such as
``bytes``, ``bytearray``, ``memoryview``, and ``array.array``. This
approach works for the standard library, but it does not extend to
third-party buffer types.
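
The limitation can be sketched as follows. ``KnownBuffer`` is a simplified,
hypothetical stand-in for an alias of this kind and is not typeshed's exact
definition; a ``ctypes`` array stands in for a buffer type that the alias
does not list.

.. code-block:: python

    import array
    import ctypes
    from typing import Union, get_args

    # Hypothetical, simplified alias listing well-known buffer types.
    KnownBuffer = Union[bytes, bytearray, memoryview, array.array]


    def in_alias(obj: object) -> bool:
        """Return True if obj is an instance of a type listed in the alias."""
        return isinstance(obj, get_args(KnownBuffer))


    # A ctypes array implements the buffer protocol but is not in the alias.
    raw = (ctypes.c_char * 4)()

    print(in_alias(b"xy"))         # listed type
    print(in_alias(raw))           # unlisted type
    print(memoryview(raw).nbytes)  # yet the buffer protocol works

Any buffer type missing from the list is rejected by the alias even though
it works at runtime, which is the situation third-party types are in.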

Second, the `documentation <https://docs.python.org/3.10/library/typing.html#typing.ByteString>`__
for ``typing.ByteString`` currently states:

    This type represents the types ``bytes``, ``bytearray``, and
    ``memoryview`` of byte sequences.

    As a shorthand for this type, ``bytes`` can be used to annotate
    arguments of any of the types mentioned above.

Although this sentence has been in the documentation
`since 2015 <https://github.com/python/cpython/commit/2a19d956ab92fc9084a105cc11292cb0438b322f>`__,
the use of ``bytes`` to include these other types is not specified
in any of the typing PEPs. Furthermore, this mechanism has a number of
problems. It does not include all possible buffer types, and it
makes the ``bytes`` type ambiguous in type annotations. After all,
there are many operations that are valid on ``bytes`` objects, but
not on ``memoryview`` objects, and it is perfectly possible for
a function to accept ``bytes`` but not ``memoryview`` objects.
A mypy user
`reports <https://github.com/python/mypy/issues/12643#issuecomment-1105914159>`__
that this shortcut has caused significant problems for the ``psycopg`` project.
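
A minimal sketch of the ambiguity follows. The function ``shout`` is a
hypothetical example: it is annotated with ``bytes`` and relies on a method
that ``memoryview`` does not provide.

.. code-block:: python

    def shout(data: bytes) -> bytes:
        # ``upper`` exists on ``bytes`` but not on ``memoryview``.
        return data.upper()


    print(shout(b"abc"))

    try:
        # Accepted by a type checker that applies the ``bytes`` shorthand.
        shout(memoryview(b"abc"))
    except AttributeError as exc:
        print("fails at runtime:", exc)

Under the shorthand, a type checker accepts the second call even though it
fails at runtime.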

Kinds of buffers
----------------

The C buffer protocol supports
`many options <https://docs.python.org/3.10/c-api/buffer.html#buffer-request-types>`__,
affecting strides, contiguity, and support for writing to the buffer. Some of these
options would be useful in the type system. For example, typeshed
currently provides separate type aliases for writable and read-only
buffers.

However, in the C buffer protocol, these options cannot be
queried directly on the type object. The only way to figure out
whether an object supports a writable buffer is to actually
ask for the buffer. For some types, such as ``memoryview``,
whether the buffer is writable depends on the instance:
some instances are read-only and others are not. As such, we propose to
expose only whether a type implements the buffer protocol at
all, not whether it supports more specific options such as
writable buffers.
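
The instance-dependent behavior of ``memoryview`` can be observed from
Python. In this short sketch, with variable names chosen for illustration,
two objects of the same type differ in writability.

.. code-block:: python

    read_only = memoryview(b"abc")            # backed by immutable bytes
    writable = memoryview(bytearray(b"abc"))  # backed by a mutable bytearray

    print(type(read_only) is type(writable))
    print(read_only.readonly)
    print(writable.readonly)

    writable[0] = ord("x")  # allowed: the underlying buffer is writable
    print(bytes(writable))

    try:
        read_only[0] = ord("x")
    except TypeError as exc:
        print("read-only view:", exc)

Because both objects are instances of ``memoryview``, no check on the type
alone can tell them apart.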

Specification
=============

types.Buffer
------------

A new class ``types.Buffer`` will be added. It cannot be instantiated or
subclassed, but supports the ``__instancecheck__`` and
``__subclasscheck__`` hooks. In CPython, these will check for the presence of the
``bf_getbuffer`` slot in the type object:

.. code-block:: pycon

    >>> from types import Buffer
    >>> isinstance(b"xy", Buffer)
    True
    >>> issubclass(bytes, Buffer)
    True
    >>> issubclass(memoryview, Buffer)
    True
    >>> isinstance("xy", Buffer)
    False
    >>> issubclass(str, Buffer)
    False

The new class can also be used in type annotations:

.. code-block:: python

    from types import Buffer

    def need_buffer(b: Buffer) -> memoryview:
        return memoryview(b)

    need_buffer(b"xy")  # ok
    need_buffer("xy")  # rejected by static type checkers
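
For readers unfamiliar with the ``__instancecheck__`` hook, the following
pure-Python sketch approximates the instance check on current interpreters.
It is not the proposed implementation: the names ``_BufferMeta`` and
``BufferLike`` are hypothetical, and the check works by requesting a
``memoryview`` rather than by inspecting the ``bf_getbuffer`` slot.

.. code-block:: python

    class _BufferMeta(type):
        """Hypothetical metaclass approximating the proposed instance check."""

        def __instancecheck__(cls, obj: object) -> bool:
            try:
                # Requesting a view is the only probe available in pure Python.
                with memoryview(obj):
                    return True
            except (TypeError, BufferError):
                return False


    class BufferLike(metaclass=_BufferMeta):
        """Stand-in for the proposed ``types.Buffer``; not the real class."""


    print(isinstance(b"xy", BufferLike))
    print(isinstance(bytearray(b"xy"), BufferLike))
    print(isinstance("xy", BufferLike))

This approximation cannot support ``issubclass``, because pure Python code
can probe for a buffer only on an instance and not on a type. Checking the
slot in C avoids that limitation and does not need to acquire a buffer.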

Usage in stubs
--------------

For static typing purposes, types defined in C extensions usually
require stub files, as described in :pep:`484`. In stub files,
``types.Buffer`` may be used as a base class to indicate that a
class implements the buffer protocol.

For example, ``bytes`` may be declared as follows in a stub:

.. code-block:: python

    class bytes(types.Buffer, Sequence[int]):
        def decode(self, ...): ...
        ...

The ``types.Buffer`` class does not require any special treatment
in type checkers.

Equivalent for older Python versions
------------------------------------

New typing features are usually backported to older Python versions
in the ``typing_extensions`` package. Because the buffer protocol
is accessible only in C, ``types.Buffer`` cannot be implemented
in a pure Python package. As a temporary workaround, a
``typing_extensions.Buffer`` ABC will be provided on Python versions
that do not have ``types.Buffer`` available. For the benefit of
static type checkers, ``typing_extensions.Buffer`` can be used as
a base class in stubs to mark types as supporting the buffer protocol.
For runtime uses, the ``ABC.register`` API can be used to register
buffer classes with ``typing_extensions.Buffer``. When
``types.Buffer`` is available, ``typing_extensions`` should simply
re-export it.
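
A minimal sketch of how such a fallback might look is shown below. The
layout is an assumption made for illustration and is not the actual
``typing_extensions`` source; the set of registered classes is likewise
only an example.

.. code-block:: python

    from abc import ABC

    try:
        # Proposed by this PEP; expected to be missing on current interpreters.
        from types import Buffer  # type: ignore[attr-defined]
    except ImportError:

        class Buffer(ABC):
            """Hypothetical fallback ABC; buffer classes are registered by hand."""

        # Runtime support requires explicit registration of each buffer class.
        for _cls in (bytes, bytearray, memoryview):
            Buffer.register(_cls)

    print(isinstance(b"xy", Buffer))
    print(issubclass(bytearray, Buffer))
    print(isinstance("xy", Buffer))

With the fallback, any buffer class that is not registered is reported as
not being a ``Buffer``, which is why this is only a temporary workaround.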


No special meaning for ``bytes``
--------------------------------

The special case stating that ``bytes`` may be used as a shorthand
for other ``ByteString`` types will be removed from the ``typing``
documentation.
With ``types.Buffer`` available as an alternative, there is no good
reason to allow ``bytes`` as a shorthand.
We suggest that type checkers that implement this behavior should deprecate and
eventually remove it.


Backwards Compatibility
=======================

As the runtime changes in this PEP only add a new class, there are
no backwards compatibility concerns.

However, the recommendation to remove the special behavior for
``bytes`` in type checkers does have backwards compatibility
impact on users. An `experiment <https://github.com/python/mypy/pull/12661>`__
with mypy shows that several major open source projects type
checked with mypy will see new errors if the ``bytes`` promotion
is removed. Nevertheless, the change improves overall type safety,
so we believe the migration cost is worth it.
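
The kind of code affected, and two possible migrations, can be sketched as
follows. The function names and the ``BytesLike`` alias are hypothetical;
the alias merely stands in for a buffer annotation on interpreters where
``types.Buffer`` is not available.

.. code-block:: python

    from typing import Union

    # Hypothetical stand-in for a buffer annotation on current interpreters.
    BytesLike = Union[bytes, bytearray, memoryview]


    def checksum_strict(data: bytes) -> int:
        return sum(data) % 256


    def checksum_broad(data: BytesLike) -> int:
        return sum(data) % 256


    payload = bytearray(b"abc")

    # Accepted only through the ``bytes`` promotion; a new error once it is removed.
    checksum_strict(payload)

    # Migration 1: convert explicitly at the call site.
    checksum_strict(bytes(payload))

    # Migration 2: broaden the annotation of the callee.
    checksum_broad(payload)

All three calls behave identically at runtime; only the verdict of the type
checker changes.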


Security Implications
=====================

None.


How to Teach This
=================

We will add notes pointing to ``types.Buffer`` to appropriate places in the
documentation, such as `typing.readthedocs.io <https://typing.readthedocs.io/en/latest/>`__
and the `mypy cheat sheet <https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html>`__.
Type checkers may provide additional pointers in their error messages. For example,
when a buffer object is passed to a function that is annotated to accept only
``bytes``, the error message could include a note suggesting the use of
``types.Buffer`` instead.


Reference Implementation
========================

[Link to any existing implementation and details about its state, e.g. proof-of-concept.]


Rejected Ideas
==============

Buffer ABC
----------

An `earlier proposal <https://github.com/python/cpython/issues/71688>`__ suggested
adding a ``collections.abc.Buffer`` ABC to represent buffer objects. This idea
stalled because an ABC with no methods does not fit well into the ``collections.abc``
module. Furthermore, it required manual registration of buffer classes, including
those in the standard library. This PEP's approach of using the ``__instancecheck__``
hook is more natural and does not require explicit registration.
Nevertheless, the ABC proposal has the advantage that it does not require C changes,
and we are proposing to adopt a version of it in the third-party ``typing_extensions``
package for the benefit of users of older versions of Python.


Open Issues
===========

Read-only and writable buffers
------------------------------

To avoid making changes to the buffer protocol itself, this PEP currently
does not provide a way to distinguish between read-only and writable buffers.
That's unfortunate, because some APIs require a writable buffer, and one of
the most common buffer types (``bytes``) is always read-only.
Should we add a new mechanism in C to declare that a type implementing the
buffer protocol is always read-only?
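
As a concrete case of an API that requires a writable buffer, consider
``io.BytesIO.readinto``. The example is ours and is only one of many such
APIs.

.. code-block:: python

    import io

    target = bytearray(4)
    count = io.BytesIO(b"data").readinto(target)  # fills the buffer in place
    print(count, target)

    try:
        # ``bytes`` implements the buffer protocol but is always read-only.
        io.BytesIO(b"data").readinto(bytes(4))
    except TypeError as exc:
        print("rejected:", exc)

An annotation that says only "implements the buffer protocol" accepts both
arguments, so the second call would not be caught by a type checker.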


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.