Add allow_sets-kwarg to is_list_like #23065

Merged: 21 commits, merged on Oct 18, 2018. The diff below shows changes from 19 commits.
doc/source/whatsnew/v0.24.0.txt (2 additions, 0 deletions)
@@ -198,6 +198,8 @@ Other Enhancements
 - :meth:`round`, :meth:`ceil`, and :meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
 - :class:`Resampler` now is iterable like :class:`GroupBy` (:issue:`15314`).
 - :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`Resampler.quantile` (:issue:`15023`).
+- :meth:`pandas.core.dtypes.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``,
+  all instances of ``set`` will not be considered "list-like" anymore (:issue:`23061`)
 - :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
 - New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
 - Compatibility with Matplotlib 3.0 (:issue:`22790`).
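The implementation in this PR tests `isinstance(obj, Set)` with `Set` aliased to `collections.abc.Set`, so "set" in this entry covers more than the built-in `set` type. A small stdlib-only check (it does not call pandas) of which common objects satisfy that `isinstance` test:

```python
from collections.abc import Set

# Objects that an isinstance(obj, Set) check would treat as "sets".
candidates = {
    "set": {1, "a"},
    "frozenset": frozenset({1, "a"}),
    "dict.keys()": {"a": 1}.keys(),
    "dict.items()": {"a": 1}.items(),
    "dict.values()": {"a": 1}.values(),
    "dict": {"a": 1},
    "list": [1, "a"],
}

for name, obj in candidates.items():
    print(f"{name:14} is a collections.abc.Set: {isinstance(obj, Set)}")
```

Given the check in the diff, dictionary key and item views are therefore also rejected when ``allow_sets=False``, while ``dict`` itself and ``dict.values()`` are not.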
pandas/compat/__init__.py (2 additions, 0 deletions)
@@ -141,6 +141,7 @@ def lfilter(*args, **kwargs):
     Mapping = collections.abc.Mapping
     Sequence = collections.abc.Sequence
     Sized = collections.abc.Sized
+    Set = collections.abc.Set
 
 else:
     # Python 2
@@ -201,6 +202,7 @@ def get_range_parameters(data):
     Mapping = collections.Mapping
     Sequence = collections.Sequence
     Sized = collections.Sized
+    Set = collections.Set
 
 if PY2:
     def iteritems(obj, **kw):
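The two `Set = ...` lines give `pandas.compat` one name that resolves to the set ABC on both Python 2 and 3. A minimal sketch of the same idea, using a try/except import instead of the explicit `PY2`/`PY3` branches the module uses; `is_set_like` is a hypothetical helper for illustration only:

```python
try:
    # Python 3.3+: the ABCs live in collections.abc
    # (the old aliases in `collections` were removed in Python 3.10).
    from collections.abc import Set
except ImportError:  # Python 2 fallback
    from collections import Set


def is_set_like(obj):
    """Return True for set, frozenset and any other collections.abc.Set."""
    return isinstance(obj, Set)
```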
pandas/core/dtypes/common.py (4 additions, 4 deletions)
@@ -16,10 +16,10 @@
     ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex, ABCIndexClass,
     ABCDateOffset)
 from pandas.core.dtypes.inference import ( # noqa:F401
-    is_bool, is_integer, is_hashable, is_iterator, is_float,
-    is_dict_like, is_scalar, is_string_like, is_list_like, is_number,
-    is_file_like, is_re, is_re_compilable, is_sequence, is_nested_list_like,
-    is_named_tuple, is_array_like, is_decimal, is_complex, is_interval)
+    is_bool, is_integer, is_float, is_number, is_decimal, is_complex,
+    is_re, is_re_compilable, is_dict_like, is_string_like, is_file_like,
+    is_list_like, is_nested_list_like, is_sequence, is_named_tuple,
+    is_hashable, is_iterator, is_array_like, is_scalar, is_interval)

Contributor Author: This is somewhat of an artefact of the version with is_ordered_list_like, where I tried to group these methods by similarity (i.e. scalar dtypes, regexes, containers), but I decided to keep it because I think it helps. Can revert that part of course.

Contributor: yes, on any change, pls try to do the minimal changeset. This will lessen reviewer burden and make things go faster.

Contributor Author: "yes, please try to do minimal changeset [next time]" or "yes please revert"?

Contributor: Fine as is for now.

Contributor: ok for now, but generally pls don't change unrelated things.

 
 _POSSIBLY_CAST_DTYPES = {np.dtype(t).name
                          for t in ['O', 'int8', 'uint8', 'int16', 'uint16',
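Because the reordering in common.py is meant to be behaviour-neutral, a reviewer can check mechanically that the same names are still re-exported. A throwaway script over the removed and added import lines of this hunk (the strings are copied from the diff; nothing here imports pandas):

```python
# Names copied from the removed (-) and added (+) import lines of the hunk.
old = """is_bool, is_integer, is_hashable, is_iterator, is_float,
is_dict_like, is_scalar, is_string_like, is_list_like, is_number,
is_file_like, is_re, is_re_compilable, is_sequence, is_nested_list_like,
is_named_tuple, is_array_like, is_decimal, is_complex, is_interval"""

new = """is_bool, is_integer, is_float, is_number, is_decimal, is_complex,
is_re, is_re_compilable, is_dict_like, is_string_like, is_file_like,
is_list_like, is_nested_list_like, is_sequence, is_named_tuple,
is_hashable, is_iterator, is_array_like, is_scalar, is_interval"""


def names(block):
    # Split on commas, strip whitespace and newlines, drop empty fragments.
    return [n.strip() for n in block.split(",") if n.strip()]


old_names, new_names = names(old), names(new)
print(len(old_names), len(new_names))        # 20 20
print(set(old_names) == set(new_names))      # True
print(len(set(new_names)) == len(new_names))  # no duplicates: True
```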
pandas/core/dtypes/inference.py (13 additions, 5 deletions)
@@ -5,7 +5,7 @@
 from numbers import Number
 from pandas import compat
 from pandas.compat import (PY2, string_types, text_type,
-                           string_and_binary_types, re_type)
+                           string_and_binary_types, re_type, Set)
 from pandas._libs import lib
 
 is_bool = lib.is_bool
@@ -247,7 +247,7 @@ def is_re_compilable(obj):
     return True
 
 
-def is_list_like(obj):
+def is_list_like(obj, allow_sets=True):
     """
     Check if the object is list-like.
 
@@ -259,6 +259,10 @@ def is_list_like(obj):
     Parameters
     ----------
     obj : The object to check.
+    allow_sets : boolean, default True
+        If this parameter is False, sets will not be considered list-like

Contributor: add a versionadded tag

+
+        .. versionadded:: 0.24.0
 
     Returns
     -------
@@ -283,11 +287,15 @@ def is_list_like(obj):
     False
     """
 
-    return (isinstance(obj, compat.Iterable) and
+    return (isinstance(obj, compat.Iterable)
             # we do not count strings/unicode/bytes as list-like
-            not isinstance(obj, string_and_binary_types) and
+            and not isinstance(obj, string_and_binary_types)

Contributor: this is not correct, leave the "and" where it was

Contributor Author: PEP8 is clear about this (https://www.python.org/dev/peps/pep-0008/#should-a-line-break-before-or-after-a-binary-operator)

Binary operators (and "and" is one) should come after the line break, i.e. at the start of the continuation line. It's also more readable.

Member: Yes, changing this is in principle fine; we have been following that PEP8 rule recently (typically we only want such changes on lines that are already touched by the PR, but since you are already touching the function a few lines below, I would say it is fine).

Note that this is a recent change in PEP8, so you will see many places in the code that do it differently.

+
             # exclude zero-dimensional numpy arrays, effectively scalars
-            not (isinstance(obj, np.ndarray) and obj.ndim == 0))

Contributor Author: Aside from adding the kwarg everywhere, this is the only substantial change of this PR.

+            and not (isinstance(obj, np.ndarray) and obj.ndim == 0)
+
+            # exclude sets if allow_sets is False

Contributor: blank line before comments

Contributor: style nit: I don't like the blank lines inside an if condition.

But nothing that needs to be changed now.

+            and not (allow_sets is False and isinstance(obj, Set)))
 
 
 def is_array_like(obj):
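The new return expression chains four conditions, written in the break-before-operator style discussed above. A stdlib-only sketch of the same logic for experimentation; it assumes duck typing on an `ndim` attribute in place of the `np.ndarray` check so it runs without NumPy, and `is_list_like_sketch` is a hypothetical name, not the pandas function:

```python
from collections.abc import Iterable, Set


def is_list_like_sketch(obj, allow_sets=True):
    return (isinstance(obj, Iterable)
            # strings and bytes are iterable but not "list-like"
            and not isinstance(obj, (str, bytes))
            # assumption: any object with ndim == 0 counts as a scalar,
            # standing in for the zero-dimensional np.ndarray check
            and getattr(obj, "ndim", None) != 0
            # only an explicit False excludes sets, as with `allow_sets is False`
            and not (allow_sets is False and isinstance(obj, Set)))
```

Because the condition is `allow_sets is False` rather than `not allow_sets`, only a literal `False` excludes sets; a falsy value such as `0` or `None` leaves them list-like.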
pandas/tests/dtypes/test_inference.py (21 additions, 11 deletions)
@@ -64,20 +64,30 @@ def __getitem__(self):
 
 
 @pytest.mark.parametrize(
-    "ll",
-    [
-        [], [1], (1, ), (1, 2), {'a': 1},
-        {1, 'a'}, Series([1]),
-        Series([]), Series(['a']).str,
-        np.array([2])])
-def test_is_list_like_passes(ll):
-    assert inference.is_list_like(ll)
+    "obj, expected",
+    list(zip([
+        [], [1], tuple(), (1, ), (1, 2), {'a': 1}, {1, 'a'}, np.array([2]),
+        Series([1]), Series([]), Series(['a']).str, Index([]), Index([1]),
+        DataFrame(), DataFrame([[1]]), iter([1, 2]), (x for x in [1, 2]),
+        np.ndarray((2,) * 2), np.ndarray((2,) * 3), np.ndarray((2,) * 4)
+    ], [True] * 30))
+    + list(zip([1, '2', object(), str, np.array(2)], [False] * 10)))
+def test_is_list_like(obj, expected):
+    assert inference.is_list_like(obj) == expected
 
 
 @pytest.mark.parametrize(

Contributor: I would prefer to have 2 tests total to avoid the duplication of the args here (IOW 1 for allow_sets=True and 1 for False).

Contributor Author: not sure if my solution is what you had in mind, but I gave it a shot

Contributor: I didn't see the earlier version, but I don't think this is what Jeff had in mind. If we want to de-duplicate the arguments, you would need a fixture giving them:

@pytest.fixture(params=...)
def maybe_list_like(request):
    return request.param

Each of the params would be a tuple like ([], True), ('2', False), and I guess something like ({1, 'a'}, None) for set-likes (or another sentinel such as 'maybe').

Then we would have two tests. In the first we do

obj, expected = ...
if expected is None:
    expected = True

assert is_list_like(obj) is expected

and in the second

if expected is None:
    expected = False

assert is_list_like(obj, allow_sets=False) is expected

Contributor: yeah, @TomAugspurger's suggestion is good here. The issue is that we can't list the args twice.
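The fixture idea boils down to one shared table of cases plus a sentinel for set-likes, resolved differently in each test. A pytest-free sketch so it runs on its own; in the real suite the table would presumably feed `@pytest.fixture(params=...)`. `CASES`, `SET_LIKE`, `list_like_stub` and the two `check_*` functions are hypothetical names, and the stub is a simplified stand-in for pandas' `is_list_like` with no NumPy handling:

```python
from collections.abc import Iterable, Set

SET_LIKE = None  # sentinel: list-like only when sets are allowed

# One shared table of (object, expected) pairs; no argument is listed twice.
CASES = [
    ([], True),
    ((1, 2), True),
    ({"a": 1}, True),
    ("2", False),
    (1, False),
    ({1, "a"}, SET_LIKE),
    (frozenset({1, "a"}), SET_LIKE),
]


def list_like_stub(obj, allow_sets=True):
    # Simplified stand-in for is_list_like (strings, bytes and sets only).
    return (isinstance(obj, Iterable)
            and not isinstance(obj, (str, bytes))
            and not (allow_sets is False and isinstance(obj, Set)))


def check_allow_sets(obj, expected):
    if expected is SET_LIKE:
        expected = True
    assert list_like_stub(obj) is expected


def check_disallow_sets(obj, expected):
    if expected is SET_LIKE:
        expected = False
    assert list_like_stub(obj, allow_sets=False) is expected


for obj, expected in CASES:
    check_allow_sets(obj, expected)
    check_disallow_sets(obj, expected)
```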

-    "ll", [1, '2', object(), str, np.array(2)])
-def test_is_list_like_fails(ll):
-    assert not inference.is_list_like(ll)
+    "obj, expected",
+    list(zip([
+        [], [1], tuple(), (1, ), (1, 2), {'a': 1}, np.array([2]),
+        Series([1]), Series([]), Series(['a']).str, Index([]), Index([1]),
+        DataFrame(), DataFrame([[1]]), iter([1, 2]), (x for x in [1, 2]),
+        np.ndarray((2,) * 2), np.ndarray((2,) * 3), np.ndarray((2,) * 4)
+    ], [True] * 30))
+    + list(zip([1, '2', object(), str, np.array(2),
+                {1, 'a'}, frozenset({1, 'a'})], [False] * 10)))
+def test_is_list_like_disallow_sets(obj, expected):
+    assert inference.is_list_like(obj, allow_sets=False) == expected
 
 
 def test_is_array_like():
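Both new parametrizations pair each object with `[True] * 30` or `[False] * 10`. This works because `zip` stops at its shortest input, so the padding lists only need to be at least as long as the object lists. A stdlib illustration with placeholder objects:

```python
objs_true = [[], [1], (1, 2)]
objs_false = [1, "2"]

# zip() stops at the shortest iterable, so the oversized [True] * 30
# and [False] * 10 are silently truncated to match the object lists.
params = list(zip(objs_true, [True] * 30)) + list(zip(objs_false, [False] * 10))
print(params)
# [([], True), ([1], True), ((1, 2), True), (1, False), ('2', False)]
```

The flip side is that if an object list ever grew past 30 (or 10) entries, the extra objects would be dropped without any error; a comprehension such as `[(obj, True) for obj in objs_true]` avoids the magic lengths altogether.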