numpy array captures 'in' statement when it shouldn't (Trac #1433) #2031

thouis · 2012-10-19T21:02:13Z

Original ticket http://projects.scipy.org/numpy/ticket/1433 on 2010-03-16 by trac user graik, assigned to unknown.

The following code used to work perfectly well with older numpy versions:

  >>> from numpy import arange
  >>> a = arange(10)
  >>> a in (0, None)
  False

Now it gives an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This is unexpected. For example, my library contains several places like this:

  from numpy import arange, zeros

  def calculate(a=None):
      if a in (0, None):
          a = zeros(10)
      ...

  calculate(a=arange(10))

That is, I use 'in' as a quick check of whether the argument is one of a couple of invalid placeholder values.
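One way to keep this kind of placeholder check without ever handing an array to `==` is to rule out arrays before testing membership. A minimal sketch, assuming the only array-valued arguments are NumPy `ndarray` instances; `is_placeholder` is a hypothetical helper, not a NumPy API:

```python
import numpy as np

def is_placeholder(a, placeholders=(0, None)):
    """Return True if `a` is one of the placeholder values.

    Arrays are rejected up front, so `a in placeholders` never has to
    reduce an elementwise comparison to a single bool.
    """
    if isinstance(a, np.ndarray):
        return False  # a real array is never treated as a placeholder
    return a in placeholders

def calculate(a=None):
    # Same shape as the library code above, but safe for array arguments.
    if is_placeholder(a):
        a = np.zeros(10)
    return a
```

With this helper, `calculate(a=np.arange(10))` returns the array unchanged instead of raising, while `calculate()` and `calculate(0)` still fall back to `np.zeros(10)`.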

Apparently, the numpy array code now intercepts the 'in' operator via __contains__ and expects an array on the other side (not even a list will do). This is not logical. While there might be some reason for treating the 'array in array' situation differently, this should not break standard Python behavior.

Plain Python lists do accept this code:

  >>> l = [1, 2, 3]
  >>> l in (None, 0)
  False

I would expect numpy arrays to do the same (as they did before). At the very least, the new __contains__ should check for non-array arguments and fall back to the standard behavior rather than raising an exception.

Thanks in advance!
Greetings
Raik

thouis · 2012-10-19T21:02:13Z

@pv wrote on 2010-09-05

That has not worked in any NumPy version:

>>> import numpy as np
>>> np.__version__
'0.9.2'
>>> a = np.arange(10); a in (0, None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Perhaps you mean it used to work in Numeric. However, even there it did not do what you would expect:

>>> import Numeric
>>> a = Numeric.arange(10)
>>> a in (0, None)
True

The reason __contains__ is ambiguous is that equality on arrays is defined to return a boolean array of elementwise comparisons, and it is ambiguous whether that result should be reduced to a single boolean via all() or via any().
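The difference from plain Python sequences can be seen directly: comparing a list with `0` yields a single `bool`, while comparing an array yields one `bool` per element, which then has to be reduced explicitly. A short illustration using only standard NumPy calls:

```python
import numpy as np

lst = [1, 2, 3]
a = np.arange(10)

print(lst == 0)            # a single bool: False
eq = (a == 0)              # elementwise: one bool per element
print(eq.shape)            # (10,)
print(eq.any(), eq.all())  # True False: the two reductions disagree

try:
    bool(eq)               # NumPy refuses to pick a reduction for you
except ValueError as exc:
    print("ValueError:", exc)
```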

It is not possible to special-case for __contains__ due to the way Python implements it.
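Concretely, in `a in (0, None)` it is the tuple, not the array, that performs the membership test: `tuple.__contains__` compares each item with `==` (after an identity check), so `ndarray.__contains__` is never called. The sketch below makes this visible with a hypothetical ndarray subclass, `Tracking`, that counts calls to its own `__contains__`:

```python
import numpy as np

class Tracking(np.ndarray):
    """Hypothetical ndarray subclass that counts calls to __contains__."""
    contains_calls = 0

    def __contains__(self, item):
        Tracking.contains_calls += 1
        return super().__contains__(item)

a = np.arange(10).view(Tracking)

# The tuple runs the membership test: it evaluates `0 == a`, gets back an
# elementwise array, and asking for that array's truth value raises.
try:
    a in (0, None)
except ValueError:
    print("tuple membership raised ValueError")
print(Tracking.contains_calls)  # 0: the array's __contains__ was never used

# The identity check comes before ==, so the array itself is found without error.
print(a in (a, None))  # True

# Only `x in a` (array on the right) dispatches to the array's __contains__.
print(3 in a)                   # True
print(Tracking.contains_calls)  # 1
```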

thouis · 2012-10-19T21:02:13Z

trac user graik wrote on 2010-09-05

Thanks for having a look at this, though I am afraid I disagree. I haven't checked every version of numpy or Numeric, but we have used this construction in a large Python library for many years (http://biskit.sf.net), first with Numeric and then with numpy. It could well be that the later Numeric versions were already broken or modified; we kept using some version 23.x because everything later became increasingly unstable. I also made a big leap in numpy versions, from a very early one to the latest.

Anyway, this is un-Pythonic behavior. It seems, then, that numpy's equality operation should be fixed. Python data types have an expected behavior: if compared to some other (incompatible) data type, they simply return False. This allows many common and elegant shortcuts. For example, another very frequent pattern in our library was this (assigning a default value if None is given):

  a = None
  b = a or zeros( 10 )

instead of:

  a = None
  b = zeros( 10 )
  if a is not None:
      b = a

It works perfectly fine with all Python data types. But because of numpy's new equality behavior, we had to remove all these constructs. This new ValueError is quite annoying; it should only be raised if we are actually comparing arrays.
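The `a or zeros(10)` shortcut fails for a closely related reason: `or` calls `bool(a)`, which is ambiguous for an array with more than one element. A minimal sketch of a default-argument pattern that avoids truth-testing the argument, assuming `None` is the only value that should trigger the default (`with_default` is a hypothetical name):

```python
import numpy as np

def with_default(a=None):
    # Test identity against None instead of calling bool(a); this behaves
    # the same way for arrays, lists and scalars.
    return np.zeros(10) if a is None else a

print(with_default())              # ten zeros
print(with_default(np.arange(3)))  # [0 1 2]

try:
    np.arange(3) or np.zeros(10)   # the `or` shortcut needs bool(array)
except ValueError:
    print("`or` raised ValueError")
```

Unlike `a or zeros(10)`, this keeps falsy but valid inputs such as `0` or an empty list, which is usually what a "default if None" check intends.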

Greetings,
Raik

thouis · 2012-10-19T21:02:13Z

@pv wrote on 2010-09-05

Well, this design decision has been in place since the beginning of Numpy, and revisiting it at this point requires reconsidering the reasons why it was put there in the first place. The correct forum for that discussion is the mailing list:

http://mail.scipy.org/mailman/listinfo/numpy-discussion

Here are some related discussions:

The main points to note are:

- some_array == 0 must return an array (otherwise, essentially all NumPy-using code will break).
- bool(some_array) == some_array.all() cannot be used because of 'if a != b:', which would then be true only when every element differs.
- bool(some_array) == some_array.any() cannot be used since it leads, e.g., to the issue with __contains__.
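These three points can be checked directly. The sketch below uses plain NumPy, with arrays chosen only for illustration, to show why neither reduction is a safe default for `bool(array)`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 4])

# Point 1: comparison is elementwise and returns an array, not a bool.
ne = (a != b)
print(ne)  # [False False  True]

# Point 2: if bool() meant .all(), `if a != b:` would be False here,
# even though the arrays clearly differ.
print(ne.all())  # False

# Point 3: if bool() meant .any(), `x in (0, None)` would be True for any
# array that merely contains a 0 somewhere.
x = np.arange(10)
print((x == 0).any())  # True
```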

Since there is no obvious solution, raising a ValueError was judged to be the best choice.

Note that the behavior of Numeric was not a "feature", which was reconsidered in Numpy. I don't know exactly in which version it was introduced,

thouis · 2012-10-19T21:02:13Z

@pv wrote on 2010-09-05

Ok, the last paragraph was incomplete: """Note that the behavior of Numeric was a 'feature', which was reconsidered in Numpy. I don't know exactly in which version it was introduced, but at least it is present in the 24.X series."""

thouis closed this as completed Oct 19, 2012

jtratner mentioned this issue Apr 27, 2014

Adjust Bool evaluation for empty DataFrames to match PEP8 pandas-dev/pandas#6964

Closed
