QUERY: difference between namedtuples and objects produced by _make_tuple_bunch?
#22450
Comments
Like
from scipy.stats import wilcoxon
res = wilcoxon([1, 2, 3])
res
# WilcoxonResult(statistic=0.0, pvalue=0.25)
res.statistic  # 0.0
res.pvalue  # 0.25
statistic, pvalue = res
statistic  # 0.0
pvalue  # 0.25
See The manual unpacking that you showed ( Please provide an MRE of the problem you're experiencing.
Hey @mdhaber, this originates from an issue I brought up in the polars discord where namedtuples from This is certainly something that I can work around, but here's an MRE that illustrates what I'm seeing via polars. I generated this example with scipy 1.15.1:
import numpy
import polars as pl
import polars.selectors as cs
from scipy import stats
# setup dataframe of fake water quality data
N = 300
rs = numpy.random.RandomState(37)
df = pl.DataFrame({
    "state": rs.choice(["OR", "WA"], size=N),
    "landuse": rs.choice(["RES", "COM"], size=N),
    "pollutant": rs.choice(["Cu", "Pb"], size=N),
    "infl": rs.lognormal( 0.0, 1.25, size=N),
    "effl": rs.lognormal(-0.5, 2.00, size=N),
})
shapiro = (
    df.with_columns(obs=pl.struct("infl", "effl"))
    .group_by(pl.col("state"), pl.col("landuse"), pl.col("pollutant"))
    .agg(cs.by_name("infl", "effl").map_batches(stats.shapiro, returns_scalar=True))
)
And that gives something that looks like this (truncated -- note the
Compare that with the Wilcoxon test:
wilcoxon = (
    df.with_columns(obs=pl.struct("infl", "effl"))
    .group_by(pl.col("state"), pl.col("landuse"), pl.col("pollutant"))
    .agg(
        stat=pl.col("obs").map_batches(
            lambda g: stats.wilcoxon(g.struct.field("infl"), g.struct.field("effl")),
            returns_scalar=True
        )
    )
)
Here note the
That difference might not seem like much, but with namedtuples getting converted to structs via polars, you can do some really neat things:
shapiro.select(pl.all(), pl.col("infl").struct.unnest())
or
shapiro.filter(pl.col("infl").struct.field("statistic") > 0.7)
When the Wilcoxon results get unpacked as a list, the equivalent queries aren't as tidy/readable, IMO. To be clear, this isn't a huge deal. I can write a little wrapper for this. But it did catch me off guard that Wilcoxon was the only My first & very naive thought would be that if the z-statistic is not generated, it exists in a standard
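The "little wrapper" mentioned above could look something like the sketch below. It assumes the result exposes `statistic` and `pvalue` attributes, as the `WilcoxonResult` shown earlier in this thread does. The names `PlainResult`, `as_namedtuple`, `_FakeResult`, and `fake_test` are made up for illustration, and `fake_test` only stands in for a SciPy function so the snippet runs without SciPy installed.

```python
from collections import namedtuple
from functools import wraps

# Hypothetical names; none of these are part of SciPy or Polars.
PlainResult = namedtuple("PlainResult", ["statistic", "pvalue"])


def as_namedtuple(func):
    """Wrap `func` so its result is repackaged as a genuine namedtuple."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        # Assumption: the result has `statistic` and `pvalue` attributes.
        return PlainResult(res.statistic, res.pvalue)
    return wrapper


class _FakeResult(tuple):
    """Stand-in for a tuple-like result that is not a real namedtuple."""
    statistic = property(lambda self: self[0])
    pvalue = property(lambda self: self[1])


def fake_test(x, y):
    # Placeholder for something like scipy.stats.wilcoxon; values are arbitrary.
    return _FakeResult((0.0, 0.25))


wrapped = as_namedtuple(fake_test)
res = wrapped([1, 2, 3], [2, 3, 5])
print(res)  # PlainResult(statistic=0.0, pvalue=0.25)
```

With SciPy available, `as_namedtuple(stats.wilcoxon)` could presumably be called inside the `map_batches` lambda in place of `stats.wilcoxon`, though that is untested here.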
When IIUC (from what you described), polars has special handling for genuine namedtuples that allows the results to look better.
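One way a library can tell a "genuine" namedtuple from some other tuple subclass is by duck typing on the attributes that `collections.namedtuple` adds. The following is only a sketch of that idea, not Polars's actual check; the function name `looks_like_namedtuple`, the `Bunch` class, and the exact set of attributes tested are assumptions.

```python
from collections import namedtuple

# Attributes that collections.namedtuple defines on every generated class.
NAMEDTUPLE_ATTRS = ("_fields", "_field_defaults", "_replace")


def looks_like_namedtuple(obj):
    """Return True if `obj` is a tuple carrying the usual namedtuple attributes."""
    return isinstance(obj, tuple) and all(
        hasattr(obj, attr) for attr in NAMEDTUPLE_ATTRS
    )


Point = namedtuple("Point", ["x", "y"])


class Bunch(tuple):
    """A tuple subclass that defines `_fields` but nothing else."""
    _fields = ("x", "y")


print(looks_like_namedtuple(Point(1, 2)))    # True
print(looks_like_namedtuple(Bunch((1, 2))))  # False
print(looks_like_namedtuple((1, 2)))         # False
```

Under a check like this, a tuple subclass that lacks any one of the expected attributes falls through to plain-tuple handling, even if it unpacks and prints just like a namedtuple.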
If I'm understanding the Polars code correctly, it's missing Here's where the feature was added: pola-rs/polars#5057 Here's where this check is defined. The check has changed slightly, but none of the new checks implicate As an experiment, I tried adding
# Note: This code is adapted from CPython:Lib/collections/__init__.py
def _make_tuple_bunch(typename, field_names, extra_field_names=None,
                      module=None):
    """
    Create a namedtuple-like class with additional attributes.

    This function creates a subclass of tuple that acts like a namedtuple
    and that has additional attributes.

    The additional attributes are listed in `extra_field_names`. The
    values assigned to these attributes are not part of the tuple.

    The reason this function exists is to allow functions in SciPy
    that currently return a tuple or a namedtuple to returned objects
    that have additional attributes, while maintaining backwards
    compatibility.

    This should only be used to enhance *existing* functions in SciPy.
    New functions are free to create objects as return values without
    having to maintain backwards compatibility with an old tuple or
    namedtuple return value.

    Parameters
    ----------
    typename : str
        The name of the type.
    field_names : list of str
        List of names of the values to be stored in the tuple. These names
        will also be attributes of instances, so the values in the tuple
        can be accessed by indexing or as attributes. At least one name
        is required. See the Notes for additional restrictions.
    extra_field_names : list of str, optional
        List of names of values that will be stored as attributes of the
        object. See the notes for additional restrictions.

    Returns
    -------
    cls : type
        The new class.

    Notes
    -----
    There are restrictions on the names that may be used in `field_names`
    and `extra_field_names`:

    * The names must be unique--no duplicates allowed.
    * The names must be valid Python identifiers, and must not begin with
      an underscore.
    * The names must not be Python keywords (e.g. 'def', 'and', etc., are
      not allowed).

    Examples
    --------
    >>> from scipy._lib._bunch import _make_tuple_bunch

    Create a class that acts like a namedtuple with length 2 (with field
    names `x` and `y`) that will also have the attributes `w` and `beta`:

    >>> Result = _make_tuple_bunch('Result', ['x', 'y'], ['w', 'beta'])

    `Result` is the new class. We call it with keyword arguments to create
    a new instance with given values.

    >>> result1 = Result(x=1, y=2, w=99, beta=0.5)
    >>> result1
    Result(x=1, y=2, w=99, beta=0.5)

    `result1` acts like a tuple of length 2:

    >>> len(result1)
    2
    >>> result1[:]
    (1, 2)

    The values assigned when the instance was created are available as
    attributes:

    >>> result1.y
    2
    >>> result1.beta
    0.5
    """
    if len(field_names) == 0:
        raise ValueError('field_names must contain at least one name')

    if extra_field_names is None:
        extra_field_names = []
    _validate_names(typename, field_names, extra_field_names)

    typename = _sys.intern(str(typename))
    field_names = tuple(map(_sys.intern, field_names))
    extra_field_names = tuple(map(_sys.intern, extra_field_names))

    all_names = field_names + extra_field_names
    arg_list = ', '.join(field_names)
    full_list = ', '.join(all_names)
    repr_fmt = ''.join(('(',
                        ', '.join(f'{name}=%({name})r' for name in all_names),
                        ')'))
    tuple_new = tuple.__new__
    _dict, _tuple, _zip = dict, tuple, zip

    # Create all the named tuple methods to be added to the class namespace

    s = f"""\
def __new__(_cls, {arg_list}, **extra_fields):
    return _tuple_new(_cls, ({arg_list},))

def __init__(self, {arg_list}, **extra_fields):
    for key in self._extra_fields:
        if key not in extra_fields:
            raise TypeError("missing keyword argument '%s'" % (key,))
    for key, val in extra_fields.items():
        if key not in self._extra_fields:
            raise TypeError("unexpected keyword argument '%s'" % (key,))
        self.__dict__[key] = val

def __setattr__(self, key, val):
    if key in {repr(field_names)}:
        raise AttributeError("can't set attribute %r of class %r"
                             % (key, self.__class__.__name__))
    else:
        self.__dict__[key] = val
"""
    del arg_list
    namespace = {'_tuple_new': tuple_new,
                 '__builtins__': dict(TypeError=TypeError,
                                      AttributeError=AttributeError),
                 '__name__': f'namedtuple_{typename}'}
    exec(s, namespace)
    __new__ = namespace['__new__']
    __new__.__doc__ = f'Create new instance of {typename}({full_list})'
    __init__ = namespace['__init__']
    __init__.__doc__ = f'Instantiate instance of {typename}({full_list})'
    __setattr__ = namespace['__setattr__']

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + repr_fmt % self._asdict()

    def _asdict(self):
        'Return a new dict which maps field names to their values.'
        out = _dict(_zip(self._fields, self))
        out.update(self.__dict__)
        return out

    def __getnewargs_ex__(self):
        'Return self as a plain tuple. Used by copy and pickle.'
        return _tuple(self), self.__dict__

    # Modify function metadata to help with introspection and debugging
    for method in (__new__, __repr__, _asdict, __getnewargs_ex__):
        method.__qualname__ = f'{typename}.{method.__name__}'

    # Build-up the class namespace dictionary
    # and use type() to build the result class
    class_namespace = {
        '__doc__': f'{typename}({full_list})',
        '_fields': field_names,
        '__new__': __new__,
        '__init__': __init__,
        '__repr__': __repr__,
        '__setattr__': __setattr__,
        '_asdict': _asdict,
        '_extra_fields': extra_field_names,
        '__getnewargs_ex__': __getnewargs_ex__,
        '_field_defaults': None,
        '_replace': None,
    }
    for index, name in enumerate(field_names):

        def _get(self, index=index):
            return self[index]
        class_namespace[name] = property(_get)
    for name in extra_field_names:

        def _get(self, name=name):
            return self.__dict__[name]
        class_namespace[name] = property(_get)

    result = type(typename, (tuple,), class_namespace)

    # For pickling to work, the __module__ variable needs to be set to the
    # frame where the named tuple is created. Bypass this step in environments
    # where sys._getframe is not defined (Jython for example) or sys._getframe
    # is not defined for arguments greater than 0 (IronPython), or where the
    # user has specified a particular module.
    if module is None:
        try:
            module = _sys._getframe(1).f_globals.get('__name__', '__main__')
        except (AttributeError, ValueError):
            pass
    if module is not None:
        result.__module__ = module
        __new__.__module__ = module

    return result
Polars also unpacks this, if you write a wrapper that converts this tuple bunch:
Output:
I don't have a problem with adding these attributes if it helps, but I don't understand how it's OK for them to be None if
I would assume it's because neither
Oh, I wasn't suggesting that we actually set them to None. I was thinking we'd set them to some sensible value. (Who knows who else is inspecting
Experimentally, it doesn't seem to. Also, I searched their codebase for By the way, another option, besides pretending to be a namedtuple, would be to pretend to be a dataclass, as those get similar treatment from Polars. Source
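For comparison, the dataclass route might look roughly like the sketch below. `ExampleResult` is a hypothetical class, not anything SciPy defines, and adding `__iter__` so that `statistic, pvalue = res` keeps working is just one assumption about how backwards compatibility could be approached.

```python
from dataclasses import asdict, dataclass, fields, is_dataclass


@dataclass(frozen=True)
class ExampleResult:  # hypothetical name, for illustration only
    statistic: float
    pvalue: float

    def __iter__(self):
        # Yield field values in declaration order so unpacking still works.
        return (getattr(self, f.name) for f in fields(self))


res = ExampleResult(statistic=0.0, pvalue=0.25)
statistic, pvalue = res
print(is_dataclass(res))  # True
print(asdict(res))        # {'statistic': 0.0, 'pvalue': 0.25}
```

Unlike a tuple subclass, this object does not support `res[0]` or `len(res)`, so it would not be a drop-in replacement for existing tuple-style results.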
Sure, I was just surprised that Well, I wouldn't mind it if these looked more like either dataclasses or namedtuples. Hopefully this would only take a short, non-invasive PR, in which case I'd be happy to review it.
Closed by gh-22494.
Describe your issue.
Shapiro returns a namedtuple
scipy/scipy/stats/_morestats.py
Line 1892 in e3c6a05
whereas wilcoxon does this
scipy/scipy/stats/_morestats.py
Lines 3452 to 3456 in e3c6a05
which happens after it goes through the process of being made a namedtuple in WilcoxonResult.
This issue came up from a polars user because polars treats the namedtuple differently than a regular tuple leading to some confusion.
Is there a reason I'm not seeing for this inconsistency or was it just unintentional? (I'm guessing someone who doesn't use namedtuples did the unpacking)
Reproducing Code Example
NA
Error message
SciPy/NumPy/Python version and system information