QUERY: difference between `namedtuple`s and objects produced by `_make_tuple_bunch`? #22450

deanm0000 · 2025-01-31T16:08:39Z

Describe your issue.

Shapiro returns a namedtuple

scipy/scipy/stats/_morestats.py

Line 1892 in e3c6a05

ShapiroResult = namedtuple('ShapiroResult', ('statistic', 'pvalue'))

whereas wilcoxon does this

scipy/scipy/stats/_morestats.py

Lines 3452 to 3456 in e3c6a05

    
def wilcoxon_result_unpacker(res):
    if hasattr(res, 'zstatistic'):
        return res.statistic, res.pvalue, res.zstatistic
    else:
        return res.statistic, res.pvalue

which happens after it goes through the process of being made a namedtuple in WilcoxonResult.

This issue came up from a polars user because polars treats a namedtuple differently from a regular tuple, which led to some confusion.

Is there a reason I'm not seeing for this inconsistency or was it just unintentional? (I'm guessing someone who doesn't use namedtuples did the unpacking)

Reproducing Code Example

NA

Error message

NA

SciPy/NumPy/Python version and system information

e3c6a05 branch

mdhaber · 2025-01-31T16:40:32Z

Like shapiro, wilcoxon also returns a namedtuple-like object.

from scipy.stats import wilcoxon
res = wilcoxon([1, 2, 3])
res
# WilcoxonResult(statistic=0.0, pvalue=0.25)
res.statistic  # 0.0 
res.pvalue  # 0.25
statistic, pvalue = res
statistic  # 0.0 
pvalue  # 0.25

See _make_tuple_bunch. This was needed to add an additional attribute (zstatistic; see gh-2625 and gh-15632) in a way that would not break backward compatibility.

The manual unpacking that you showed (wilcoxon_result_unpacker) is separate, and needed during the process of dealing with nan_policy='omit', etc. It does not affect the type of the object returned by the public function.
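As a rough illustration of that idea, here is a minimal sketch of a tuple subclass that keeps its length at two while carrying an extra attribute. It is not SciPy's actual `_make_tuple_bunch`; the class name `ResultSketch` and the value passed for `zstatistic` are made up for the example.

```python
class ResultSketch(tuple):
    """Hypothetical stand-in for a tuple bunch; not SciPy's implementation."""

    def __new__(cls, statistic, pvalue, **extra_fields):
        # Only the two original fields become tuple elements.
        return super().__new__(cls, (statistic, pvalue))

    def __init__(self, statistic, pvalue, **extra_fields):
        # Extra fields live in the instance __dict__, outside the tuple.
        self.__dict__.update(extra_fields)

    # Named access to the tuple elements.
    statistic = property(lambda self: self[0])
    pvalue = property(lambda self: self[1])


res = ResultSketch(0.0, 0.25, zstatistic=-1.5)  # made-up values
statistic, pvalue = res   # existing two-value unpacking keeps working
extra = res.zstatistic    # the new value is reachable by attribute only
```

Because the extra value never enters the tuple, `len(res)` stays 2 and older calling code is unaffected.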

Please provide an MRE of the problem you're experiencing.

phobson · 2025-01-31T18:29:16Z

Hey @mdhaber, this originates from an issue I brought up in the polars discord where namedtuples from scipy.stats and the Wilcoxon "tuple bunches" get unpacked differently.

This is certainly something that I can work around, but here's an MRE that illustrates what I'm seeing via polars.

I generated this example with scipy 1.15.1

import numpy
import polars as pl
import polars.selectors as cs
from scipy import stats

# setup dataframe of fake water quality data
N = 300
rs = numpy.random.RandomState(37)
df = pl.DataFrame({
    "state": rs.choice(["OR", "WA"], size=N),
    "landuse": rs.choice(["RES", "COM"], size=N),
    "pollutant": rs.choice(["Cu", "Pb"], size=N),
    "infl": rs.lognormal( 0.0, 1.25, size=N),
    "effl": rs.lognormal(-0.5, 2.00, size=N),
})

shapiro = (
    df.with_columns(obs=pl.struct("infl", "effl"))
      .group_by(pl.col("state"), pl.col("landuse"), pl.col("pollutant"))
      .agg(cs.by_name("infl", "effl").map_batches(stats.shapiro, returns_scalar=True))
)

And that gives something that looks like this (truncated -- note the struct columns):

┌───────┬─────────┬───────────┬──────────────────────┬───────────────────────┐
│ state ┆ landuse ┆ pollutant ┆ infl                 ┆ effl                  │
│ ---   ┆ ---     ┆ ---       ┆ ---                  ┆ ---                   │
│ str   ┆ str     ┆ str       ┆ struct[2]            ┆ struct[2]             │
╞═══════╪═════════╪═══════════╪══════════════════════╪═══════════════════════╡
│ OR    ┆ RES     ┆ Cu        ┆ {0.817417,0.00003}   ┆ {0.655478,4.5343e-8}  │
│ OR    ┆ RES     ┆ Pb        ┆ {0.62145,3.7383e-8}  ┆ {0.569183,8.0773e-9}  │
└───────┴─────────┴───────────┴──────────────────────┴───────────────────────┘

Compare that with the Wilcoxon test:

wilcoxon = (
    df.with_columns(obs=pl.struct("infl", "effl"))
      .group_by(pl.col("state"), pl.col("landuse"), pl.col("pollutant"))
      .agg(
         stat=pl.col("obs").map_batches(
              lambda g: stats.wilcoxon(g.struct.field("infl"), g.struct.field("effl")),
              returns_scalar=True
         )
      )
)

Here, note the list column:

┌───────┬─────────┬───────────┬───────────────────┐
│ state ┆ landuse ┆ pollutant ┆ stat              │
│ ---   ┆ ---     ┆ ---       ┆ ---               │
│ str   ┆ str     ┆ str       ┆ list[f64]         │
╞═══════╪═════════╪═══════════╪═══════════════════╡
│ OR    ┆ RES     ┆ Cu        ┆ [319.0, 0.633098] │
│ OR    ┆ RES     ┆ Pb        ┆ [274.0, 0.697783] │
└───────┴─────────┴───────────┴───────────────────┘

That difference might not seem like much, but with namedtuples getting converted to structs via polars, you can do some really neat things:

shapiro.select(pl.all(), pl.col("infl").struct.unnest())

┌───────┬─────────┬───────────┬─────────────────────┬──────────────────────┬───────────┬───────────┐
│ state ┆ landuse ┆ pollutant ┆ infl                ┆ effl                 ┆ statistic ┆ pvalue    │
│ ---   ┆ ---     ┆ ---       ┆ ---                 ┆ ---                  ┆ ---       ┆ ---       │
│ str   ┆ str     ┆ str       ┆ struct[2]           ┆ struct[2]            ┆ f64       ┆ f64       │
╞═══════╪═════════╪═══════════╪═════════════════════╪══════════════════════╪═══════════╪═══════════╡
│ OR    ┆ RES     ┆ Cu        ┆ {0.817417,0.00003}  ┆ {0.655478,4.5343e-8} ┆ 0.817417  ┆ 0.00003   │
│ OR    ┆ RES     ┆ Pb        ┆ {0.62145,3.7383e-8} ┆ {0.569183,8.0773e-9} ┆ 0.62145   ┆ 3.7383e-8 │
└───────┴─────────┴───────────┴─────────────────────┴──────────────────────┴───────────┴───────────┘

or

shapiro.filter(pl.col("infl").struct.field("statistic") > 0.7)

┌───────┬─────────┬───────────┬──────────────────────┬───────────────────────┐
│ state ┆ landuse ┆ pollutant ┆ infl                 ┆ effl                  │
│ ---   ┆ ---     ┆ ---       ┆ ---                  ┆ ---                   │
│ str   ┆ str     ┆ str       ┆ struct[2]            ┆ struct[2]             │
╞═══════╪═════════╪═══════════╪══════════════════════╪═══════════════════════╡
│ OR    ┆ RES     ┆ Cu        ┆ {0.817417,0.00003}   ┆ {0.655478,4.5343e-8}  │
│ WA    ┆ RES     ┆ Cu        ┆ {0.75436,0.000001}   ┆ {0.463176,1.2552e-10} │
│ WA    ┆ COM     ┆ Pb        ┆ {0.708217,1.3313e-7} ┆ {0.539097,5.0447e-10} │
└───────┴─────────┴───────────┴──────────────────────┴───────────────────────┘

When the Wilcoxon results get unpacked as a list, the equivalent queries aren't as tidy/readable, IMO.

To be clear, this isn't a huge deal. I can write a little wrapper for this. But it did catch me off guard that Wilcoxon was the only scipy.stats function that behaved this way (that I've come across so far).

My first & very naive thought would be that if the z-statistic is not generated, it exists in a standard namedtuple as None. I'm sure I'm missing something as to why that's trickier than it sounds.

mdhaber · 2025-01-31T18:59:51Z

When wilcoxon was created, it returned a two-arg tuple without zstatistic. We can't add a third element to a standard tuple (named or not) without a backward-incompatible change.
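To make the compatibility concern concrete, here is a small sketch using only `collections.namedtuple`. The names `OldResult` and `NewResult` are hypothetical; the point is that code written against a two-field result stops working if a third element is added, even when that element is `None`.

```python
from collections import namedtuple

# Hypothetical result types: the original two-field shape and a
# three-field shape with zstatistic appended as a tuple element.
OldResult = namedtuple("OldResult", ["statistic", "pvalue"])
NewResult = namedtuple("NewResult", ["statistic", "pvalue", "zstatistic"])

# Code written against the original return value.
statistic, pvalue = OldResult(0.0, 0.25)

# The same line fails once the tuple grows, even if the new field is None.
try:
    statistic, pvalue = NewResult(0.0, 0.25, None)
except ValueError as exc:
    error = str(exc)
```

Here `error` reports that there are too many values to unpack, which is the break that keeping the extra field out of the tuple avoids.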

IIUC (from what you described), polars has special handling for genuine namedtuples that allows the results to look better.

If polars is looking at whether objects are namedtuples, can it look at whether the objects behave like namedtuples instead?
If polars is already looking at whether the objects behave like namedtuples, what attribute/method are our Bunches missing?

_make_tuple_bunch is used in many places, though, so it's puzzling that you have only found it in wilcoxon (if you have tried several stats functions). What happens with ttest_1samp? That also uses _make_tuple_bunch.

nickodell · 2025-01-31T20:03:00Z

> If polars is already looking at whether the objects behave like namedtuples, what attribute/method are our Bunches missing?

If I'm understanding the Polars code correctly, it's missing _field_defaults and _replace.

Here's where the feature was added: pola-rs/polars#5057

Here's where this check is defined. The check has changed slightly, but none of the new checks implicate _make_tuple_bunch. https://github.com/pola-rs/polars/blob/b6e7ef8c1f26693346117785ca3e4cd8a52a394a/py-polars/polars/_utils/construction/utils.py#L50
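For readers who don't want to follow the link, a duck-typed check of this kind might look roughly like the sketch below. This is an assumption based on the attribute names mentioned in this thread, not a copy of the Polars function; `looks_like_namedtuple` and `BunchLike` are hypothetical names.

```python
from collections import namedtuple

# Attribute names taken from the discussion above; the function itself is
# a hypothetical illustration, not the Polars implementation.
NAMEDTUPLE_MARKERS = ("_fields", "_field_defaults", "_replace")


def looks_like_namedtuple(cls):
    """Return True if cls is a tuple subclass exposing the marker attributes."""
    return (
        isinstance(cls, type)
        and issubclass(cls, tuple)
        and all(hasattr(cls, name) for name in NAMEDTUPLE_MARKERS)
    )


Genuine = namedtuple("Genuine", ["statistic", "pvalue"])


class BunchLike(tuple):
    """Tuple subclass that has _fields but lacks the other two markers."""
    _fields = ("statistic", "pvalue")


genuine_ok = looks_like_namedtuple(Genuine)
bunch_ok = looks_like_namedtuple(BunchLike)
```

Under this sketch a real namedtuple class passes, while a tuple subclass that only defines `_fields` does not.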

As an experiment, I tried adding _field_defaults and _replace to the class dict in _make_tuple_bunch(), with both set to None, and this causes Polars to detect them as namedtuples.

# Note: This code is adapted from CPython:Lib/collections/__init__.py
def _make_tuple_bunch(typename, field_names, extra_field_names=None,
                      module=None):
    """
    Create a namedtuple-like class with additional attributes.

    This function creates a subclass of tuple that acts like a namedtuple
    and that has additional attributes.

    The additional attributes are listed in `extra_field_names`.  The
    values assigned to these attributes are not part of the tuple.

    The reason this function exists is to allow functions in SciPy
    that currently return a tuple or a namedtuple to return objects
    that have additional attributes, while maintaining backwards
    compatibility.

    This should only be used to enhance *existing* functions in SciPy.
    New functions are free to create objects as return values without
    having to maintain backwards compatibility with an old tuple or
    namedtuple return value.

    Parameters
    ----------
    typename : str
        The name of the type.
    field_names : list of str
        List of names of the values to be stored in the tuple. These names
        will also be attributes of instances, so the values in the tuple
        can be accessed by indexing or as attributes.  At least one name
        is required.  See the Notes for additional restrictions.
    extra_field_names : list of str, optional
        List of names of values that will be stored as attributes of the
        object.  See the notes for additional restrictions.

    Returns
    -------
    cls : type
        The new class.

    Notes
    -----
    There are restrictions on the names that may be used in `field_names`
    and `extra_field_names`:

    * The names must be unique--no duplicates allowed.
    * The names must be valid Python identifiers, and must not begin with
      an underscore.
    * The names must not be Python keywords (e.g. 'def', 'and', etc., are
      not allowed).

    Examples
    --------
    >>> from scipy._lib._bunch import _make_tuple_bunch

    Create a class that acts like a namedtuple with length 2 (with field
    names `x` and `y`) that will also have the attributes `w` and `beta`:

    >>> Result = _make_tuple_bunch('Result', ['x', 'y'], ['w', 'beta'])

    `Result` is the new class.  We call it with keyword arguments to create
    a new instance with given values.

    >>> result1 = Result(x=1, y=2, w=99, beta=0.5)
    >>> result1
    Result(x=1, y=2, w=99, beta=0.5)

    `result1` acts like a tuple of length 2:

    >>> len(result1)
    2
    >>> result1[:]
    (1, 2)

    The values assigned when the instance was created are available as
    attributes:

    >>> result1.y
    2
    >>> result1.beta
    0.5
    """
    if len(field_names) == 0:
        raise ValueError('field_names must contain at least one name')

    if extra_field_names is None:
        extra_field_names = []
    _validate_names(typename, field_names, extra_field_names)

    typename = _sys.intern(str(typename))
    field_names = tuple(map(_sys.intern, field_names))
    extra_field_names = tuple(map(_sys.intern, extra_field_names))

    all_names = field_names + extra_field_names
    arg_list = ', '.join(field_names)
    full_list = ', '.join(all_names)
    repr_fmt = ''.join(('(',
                        ', '.join(f'{name}=%({name})r' for name in all_names),
                        ')'))
    tuple_new = tuple.__new__
    _dict, _tuple, _zip = dict, tuple, zip

    # Create all the named tuple methods to be added to the class namespace

    s = f"""\
def __new__(_cls, {arg_list}, **extra_fields):
    return _tuple_new(_cls, ({arg_list},))

def __init__(self, {arg_list}, **extra_fields):
    for key in self._extra_fields:
        if key not in extra_fields:
            raise TypeError("missing keyword argument '%s'" % (key,))
    for key, val in extra_fields.items():
        if key not in self._extra_fields:
            raise TypeError("unexpected keyword argument '%s'" % (key,))
        self.__dict__[key] = val

def __setattr__(self, key, val):
    if key in {repr(field_names)}:
        raise AttributeError("can't set attribute %r of class %r"
                             % (key, self.__class__.__name__))
    else:
        self.__dict__[key] = val
"""
    del arg_list
    namespace = {'_tuple_new': tuple_new,
                 '__builtins__': dict(TypeError=TypeError,
                                      AttributeError=AttributeError),
                 '__name__': f'namedtuple_{typename}'}
    exec(s, namespace)
    __new__ = namespace['__new__']
    __new__.__doc__ = f'Create new instance of {typename}({full_list})'
    __init__ = namespace['__init__']
    __init__.__doc__ = f'Instantiate instance of {typename}({full_list})'
    __setattr__ = namespace['__setattr__']

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + repr_fmt % self._asdict()

    def _asdict(self):
        'Return a new dict which maps field names to their values.'
        out = _dict(_zip(self._fields, self))
        out.update(self.__dict__)
        return out

    def __getnewargs_ex__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return _tuple(self), self.__dict__

    # Modify function metadata to help with introspection and debugging
    for method in (__new__, __repr__, _asdict, __getnewargs_ex__):
        method.__qualname__ = f'{typename}.{method.__name__}'

    # Build-up the class namespace dictionary
    # and use type() to build the result class
    class_namespace = {
        '__doc__': f'{typename}({full_list})',
        '_fields': field_names,
        '__new__': __new__,
        '__init__': __init__,
        '__repr__': __repr__,
        '__setattr__': __setattr__,
        '_asdict': _asdict,
        '_extra_fields': extra_field_names,
        '__getnewargs_ex__': __getnewargs_ex__,
        '_field_defaults': None,
        '_replace': None,
    }
    for index, name in enumerate(field_names):

        def _get(self, index=index):
            return self[index]
        class_namespace[name] = property(_get)
    for name in extra_field_names:

        def _get(self, name=name):
            return self.__dict__[name]
        class_namespace[name] = property(_get)

    result = type(typename, (tuple,), class_namespace)

    # For pickling to work, the __module__ variable needs to be set to the
    # frame where the named tuple is created.  Bypass this step in environments
    # where sys._getframe is not defined (Jython for example) or sys._getframe
    # is not defined for arguments greater than 0 (IronPython), or where the
    # user has specified a particular module.
    if module is None:
        try:
            module = _sys._getframe(1).f_globals.get('__name__', '__main__')
        except (AttributeError, ValueError):
            pass
    if module is not None:
        result.__module__ = module
        __new__.__module__ = module

    return result

>>> WilcoxonResult = _make_tuple_bunch('WilcoxonResult', ['statistic', 'pvalue'])
>>> pl._utils.construction.utils.is_namedtuple(WilcoxonResult)
True

Polars also unpacks this, if you write a wrapper that converts this tuple bunch:

def statswrapper(func):
    def inner(*args, **kwargs):
        res = func(*args, **kwargs)
        return WilcoxonResult(res.statistic, res.pvalue)
    return inner

wilcoxon_wrapped = statswrapper(stats.wilcoxon)

wilcoxon = (
    df.with_columns(obs=pl.struct("infl", "effl"))
      .group_by(pl.col("state"), pl.col("landuse"), pl.col("pollutant"))
      .agg(
         stat=pl.col("obs").map_batches(
#               lambda g: stats.wilcoxon(g.struct.field("infl"), g.struct.field("effl")),
              lambda g: wilcoxon_wrapped(g.struct.field("infl"), g.struct.field("effl")),
              returns_scalar=True
         )
      )
)
wilcoxon

Output:

shape: (8, 4)
┌───────┬─────────┬───────────┬──────────────────┐
│ state ┆ landuse ┆ pollutant ┆ stat             │
│ ---   ┆ ---     ┆ ---       ┆ ---              │
│ str   ┆ str     ┆ str       ┆ struct[2]        │
╞═══════╪═════════╪═══════════╪══════════════════╡
│ WA    ┆ COM     ┆ Cu        ┆ {461.0,0.5309}   │
│ WA    ┆ RES     ┆ Pb        ┆ {201.0,0.160046} │
│ WA    ┆ COM     ┆ Pb        ┆ {408.0,0.984094} │
│ WA    ┆ RES     ┆ Cu        ┆ {269.0,0.144384} │
│ OR    ┆ RES     ┆ Cu        ┆ {319.0,0.633098} │
│ OR    ┆ COM     ┆ Pb        ┆ {331.0,0.4185}   │
│ OR    ┆ COM     ┆ Cu        ┆ {237.0,0.309202} │
│ OR    ┆ RES     ┆ Pb        ┆ {274.0,0.697783} │
└───────┴─────────┴───────────┴──────────────────┘
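A more general variant of the wrapper above could rebuild any tuple-bunch result as a genuine `collections.namedtuple`, taking the field names from `_asdict()` instead of hard-coding them. This is only a sketch: `returns_namedtuple` and `FakeResult` are hypothetical names, and `FakeResult` stands in for a SciPy result so that the example runs without SciPy.

```python
from collections import namedtuple
from functools import lru_cache, wraps


@lru_cache(maxsize=None)
def _namedtuple_class(typename, field_names):
    # Cache so repeated calls reuse one class per (name, fields) pair.
    return namedtuple(typename, field_names)


def returns_namedtuple(func):
    """Hypothetical decorator: rebuild func's result as a genuine namedtuple."""
    @wraps(func)
    def inner(*args, **kwargs):
        res = func(*args, **kwargs)
        values = res._asdict()  # assumed to include any extra attributes
        cls = _namedtuple_class(type(res).__name__, tuple(values))
        return cls(**values)
    return inner


class FakeResult(tuple):
    """Stand-in for a tuple bunch so the sketch runs without SciPy."""

    def _asdict(self):
        return {"statistic": self[0], "pvalue": self[1], "zstatistic": -0.5}


@returns_namedtuple
def fake_test():
    return FakeResult((319.0, 0.63))  # made-up values


out = fake_test()
```

Note that the rebuilt object has one tuple element per key of `_asdict()`, so extra attributes such as `zstatistic` become real fields and the length changes. That is convenient for building a struct column, but it is not a drop-in replacement for code that unpacks two values.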

mdhaber · 2025-01-31T20:07:23Z

I don't have a problem with adding these attributes if it helps, but I don't understand how it's OK for them to be None if polars is going to look for their presence. Does it really just look and not attempt to use them?

deanm0000 · 2025-01-31T20:30:46Z

I would assume it's because neither isinstance(x, namedtuple) nor issubclass(x, namedtuple) works, so they fell back on some spec that says a namedtuple will have all of those attributes, even if polars isn't going to use them.
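The reason those checks don't work is that `collections.namedtuple` is a factory function rather than a base class, so the classes it creates inherit directly from `tuple`. A short sketch (the `Point` type is just an example):

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

# namedtuple is a function, so it cannot be the second argument of isinstance.
try:
    isinstance(p, namedtuple)
except TypeError:
    isinstance_failed = True
else:
    isinstance_failed = False

# The generated class derives from tuple only, so attribute-based
# detection (for example, looking for _fields) is the usual fallback.
bases = Point.__bases__
has_fields = hasattr(Point, "_fields")
```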

nickodell · 2025-01-31T20:33:32Z

> I don't have a problem with adding these attributes if it helps, but I don't understand how it's OK for them to be None if polars is going to look for their presence.

Oh, I wasn't suggesting that we actually set them to None. I was thinking we'd set them to some sensible value. (Who knows who else is inspecting _field_defaults? :) ) I'm just trying to find the very minimum thing that Polars considers a namedtuple.
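As a sketch of what "sensible" values could look like (an assumption for illustration, not necessarily what any SciPy PR implements), `_field_defaults` could be an empty dict and `_replace` could build a new instance from `_asdict()`. The class name `BunchSketch` is hypothetical.

```python
class BunchSketch(tuple):
    """Hypothetical tuple bunch with namedtuple-style helpers (illustration only)."""

    _fields = ("statistic", "pvalue")
    _extra_fields = ("zstatistic",)
    _field_defaults = {}  # no field has a default value

    def __new__(cls, statistic, pvalue, **extra_fields):
        return super().__new__(cls, (statistic, pvalue))

    def __init__(self, statistic, pvalue, **extra_fields):
        # Extra fields are stored as plain instance attributes.
        self.__dict__.update(extra_fields)

    def _asdict(self):
        out = dict(zip(self._fields, self))
        out.update(self.__dict__)
        return out

    def _replace(self, **kwargs):
        # Return a new instance with the given fields swapped out.
        values = self._asdict()
        unknown = set(kwargs) - set(values)
        if unknown:
            raise TypeError(f"unexpected field names: {sorted(unknown)}")
        values.update(kwargs)
        return type(self)(**values)


res = BunchSketch(319.0, 0.63, zstatistic=-0.5)  # made-up values
updated = res._replace(pvalue=0.05)
```

The original `res` is left untouched, and `updated` keeps the extra `zstatistic` attribute.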

> Does it really just look and not attempt to use them?

Experimentally, it doesn't seem to.

Also, I searched their codebase for _field_defaults and _replace. The namedtuple check is the only place that uses _field_defaults. The only place in their codebase that uses namedtuple._replace() is an unrelated piece of code that calls dis.Instruction._replace(). Searches: 1 2

By the way, another option, besides pretending to be a namedtuple, would be to pretend to be a dataclass, as those get similar treatment from Polars. Source
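For comparison, a dataclass-based result might look like the sketch below. The class name `ResultAsDataclass` is hypothetical, and the `__iter__` method is an assumption added so that two-value unpacking still works; unlike a tuple bunch, such an object is not a `tuple`, so indexing and `isinstance(res, tuple)` checks in existing code would no longer work.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Optional


@dataclass(frozen=True)
class ResultAsDataclass:
    """Hypothetical dataclass-style result (illustration only)."""

    statistic: float
    pvalue: float
    zstatistic: Optional[float] = None

    def __iter__(self):
        # Yield only the two original values so `a, b = res` keeps working.
        yield self.statistic
        yield self.pvalue


res = ResultAsDataclass(319.0, 0.633098)
statistic, pvalue = res
names = [f.name for f in fields(res)]
detected = is_dataclass(res)
```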

mdhaber · 2025-01-31T21:43:12Z

> Oh, I wasn't suggesting that we actually set them to None.

Sure, I was just surprised that None worked, and yeah, I guess that's because it's not actually being used.

Well, I wouldn't mind it if these looked more like either dataclasses or namedtuples. Hopefully this would only take a short, non-invasive PR, in which case I'd be happy to review it.

mdhaber · 2025-02-09T04:24:47Z

Closed by gh-22494.

deanm0000 added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Jan 31, 2025

j-bowhay added the scipy.stats label Jan 31, 2025

mdhaber changed the title ~~BUG: Inconsistent return type between shapiro and wilcoxon~~ QUERY: different details in the implementations of shapiro and wilcoxon? Jan 31, 2025

mdhaber changed the title ~~QUERY: different details in the implementations of shapiro and wilcoxon?~~ QUERY: difference between nametuples and objects produced by _make_tuple_bunch? Jan 31, 2025

mdhaber changed the title ~~QUERY: difference between nametuples and objects produced by _make_tuple_bunch?~~ QUERY: difference between namedtuples and objects produced by _make_tuple_bunch? Jan 31, 2025

mdhaber removed the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Jan 31, 2025

lucascolley added the query A question or suggestion that requires further information label Feb 1, 2025

nickodell mentioned this issue Feb 8, 2025

ENH: _lib._make_tuple_bunch: pretend to be namedtuple even more #22494

Merged

mdhaber closed this as completed Feb 9, 2025

lucascolley added this to the 1.16.0 milestone Feb 9, 2025
