Returning an empty subset as the best subset for feature selection #110

Closed
ahcantao opened this issue May 9, 2018 · 6 comments


ahcantao commented May 9, 2018

  • PySwarms version: 0.1.9
  • Python version: 3.5.4
  • Operating System: Windows 10

Description

The code returns an empty subset of attributes as the best subset. It even reports the cost associated with the all-zeros subset. A minor change to the example code is enough to reproduce this output.

There is a check in the code that is supposed to avoid empty subsets, but the optimizer still reports an empty subset as the best one:

if np.count_nonzero(m) == 0:
    # if the particle's subset is all zeros, fall back to the original set of attributes
    X_subset = X
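
For context, that check lives inside the notebook's per-particle objective. A condensed sketch of that function (assuming roughly the notebook's setup with a logistic-regression classifier; exact parameter values may differ):

import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_classification

# Toy data along the lines of the notebook: 100 samples, 15 features
X, y = make_classification(n_samples=100, n_features=15, n_classes=3,
                           n_informative=4, n_redundant=1, n_repeated=2,
                           random_state=1)
classifier = linear_model.LogisticRegression()

def f_per_particle(m, alpha):
    """Compute the cost of one particle's binary feature mask m."""
    total_features = X.shape[1]
    # The guard in question: an all-zeros mask falls back to the full feature set
    if np.count_nonzero(m) == 0:
        X_subset = X
    else:
        X_subset = X[:, m == 1]
    classifier.fit(X_subset, y)
    P = (classifier.predict(X_subset) == y).mean()
    # Weighted trade-off between classification error and fraction of features kept
    return alpha * (1.0 - P) + (1.0 - alpha) * (X_subset.shape[1] / total_features)

The guard only changes how the cost of an all-zeros particle is computed; the reported best position can still be the all-zeros mask itself, which is what the output further down shows.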

Describe what you were trying to get done.
Return a non-empty subset of attributes (a binary mask containing at least one element equal to 1).

What I Did

I ran the "feature_subset_selection" notebook (https://github.com/ljvmiranda921/pyswarms/blob/master/examples/feature_subset_selection.ipynb) up to the "# Perform optimization" cell. Everything runs fine, but after one minor change, reducing the number of iterations from 1000 to 10, it returns an empty subset as the best subset most of the time.

The only line changed was the following:

cost, pos = optimizer.optimize(f, print_step=1, iters=10, verbose=2)
print(cost,pos)

and the output was:

INFO:pyswarms.discrete.binary:Iteration 1/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 2/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 3/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 4/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 5/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 6/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 7/10, cost: 0.2376
INFO:pyswarms.discrete.binary:Iteration 8/10, cost: 0.2376
INFO:pyswarms.discrete.binary:Iteration 9/10, cost: 0.2376
INFO:pyswarms.discrete.binary:Iteration 10/10, cost: 0.2376
INFO:pyswarms.discrete.binary:================================
Optimization finished!
Final cost: 0.2376
Best value: [ 0.000000 0.000000 0.000000 ...]

0.2376 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
ljvmiranda921 (Owner) commented

Hi @ahcantao , sorry for the late reply.

Let me check on this real quick this weekend. Thanks for reporting!

ahcantao (Author) commented

@ljvmiranda921 Check whether it is possible to avoid creating a subset containing only zeros. If it is, then as a future update you could add an extra parameter specifying the minimum number of 1's in the subset, e.g. minOnes = 3, which would return a subset containing 3 or more attributes out of the full set (e.g. [1 1 1 0 0 0 0 0 0 0]). A rough sketch of how this could be enforced in the objective is below.
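
In the meantime, something along these lines might work as a workaround (a rough sketch building on the notebook's objective above; minOnes is hypothetical and not a PySwarms parameter):

import numpy as np

MIN_ONES = 3  # hypothetical minimum number of selected attributes

def f_per_particle_min_ones(m, alpha):
    """Like f_per_particle above, but penalize masks with fewer than MIN_ONES
    selected features so they can never win as the best subset."""
    if np.count_nonzero(m) < MIN_ONES:
        return 1.0  # worst possible value of this objective, which lies in [0, 1]
    X_subset = X[:, m == 1]  # X, y, and classifier as in the sketch above
    classifier.fit(X_subset, y)
    P = (classifier.predict(X_subset) == y).mean()
    return alpha * (1.0 - P) + (1.0 - alpha) * (X_subset.shape[1] / X.shape[1])

A parameter inside BinaryPSO itself would be cleaner, but a penalty like this at least keeps near-empty masks from being reported as the best position.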

ljvmiranda921 (Owner) commented

Thanks for your input. I'm currently in the process of updating PySwarms' backend (in the development branch, not in master). I'll probably incorporate your idea of setting a minimum number of attributes from the full set. Let's see where it goes.

ljvmiranda921 (Owner) commented

Note to self: make another PR to test this. This should be solved using the new backend implementation but further testing is needed.

ljvmiranda921 pushed a commit that referenced this issue Jun 7, 2018
Add pyswarms.backend module

This commit adds the `backend` module for PySwarms. It enables users to define
their own optimization loop using primitives provided by this module
(white-box approach). In addition, this commit also updates the existing
implementations of GlobalBestPSO, LocalBestPSO, and BinaryPSO to use
these backend primitives.

Some major updates:
1. Topologies are introduced to segregate computation of best_cost/pos and velocity
or position updates.
2. A Swarms class is implemented to serve as a DataClass for data storage.
3. SwarmBase is now called SwarmOptimizer (more semantically meaningful)
4. The user can now set his/her initial positions.
5. Remove py27 compatibility

Additional tasks:
1. Documentation and new notebook examples
2. Check feature selection example (#110 )

Signed-off-by: Lester James V. Miranda <[email protected]>
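
For reference, the "white-box approach" mentioned in the commit means driving the optimization loop yourself with the backend primitives. A minimal sketch of such a loop (names like create_swarm, compute_pbest, and fx.sphere follow the newer backend API and may differ between releases):

import numpy as np
import pyswarms.backend as P
from pyswarms.backend.topology import Star
from pyswarms.utils.functions import single_obj as fx

options = {'c1': 0.5, 'c2': 0.3, 'w': 0.9}
topology = Star()  # the Star topology reproduces global-best behaviour
swarm = P.create_swarm(n_particles=50, dimensions=2, options=options)

for i in range(100):
    # Evaluate the objective at the current and personal-best positions
    swarm.current_cost = fx.sphere(swarm.position)
    swarm.pbest_cost = fx.sphere(swarm.pbest_pos)
    swarm.pbest_pos, swarm.pbest_cost = P.compute_pbest(swarm)

    # Update the global best, then move the particles
    if np.min(swarm.current_cost) < swarm.best_cost:
        swarm.best_pos, swarm.best_cost = topology.compute_gbest(swarm)
    swarm.velocity = topology.compute_velocity(swarm)
    swarm.position = topology.compute_position(swarm)

print('Best cost:', swarm.best_cost)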
@ljvmiranda921 ljvmiranda921 self-assigned this Jun 10, 2018
@ljvmiranda921 ljvmiranda921 added the bug Bugs, bug fixes, etc. label Jun 12, 2018
ljvmiranda921 (Owner) commented

This should be solved by #145

bhaskatripathi commented

I am still facing this error with the best subset for feature selection.

Perform optimization

cost, pos = optimizer.optimize(f, iters=700, fast=False)

The optimizer returns the following output for pos:

  1. I have 30 features and 903 rows.
  2. pos is an n-dimensional array of shape (903, 30).
  3. X_selected_features = X[:, pos==1] raises "IndexError: too many indices for array", since it expects a one-dimensional array such as [0 0 0 0 1 1 0]. Instead, I am getting:
    array([[1, 1, 1, ..., 1, 1, 1],
    [1, 1, 1, ..., 1, 1, 1],
    [1, 1, 1, ..., 1, 1, 1],
    ...,
    [1, 1, 1, ..., 1, 1, 1],
    [1, 1, 1, ..., 1, 1, 1],
    [1, 1, 1, ..., 1, 1, 1]])
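
For comparison, with dimensions set to the number of features the shapes should come out like this (a sketch; the option values simply mirror the notebook, and k must not exceed n_particles):

import numpy as np
import pyswarms as ps

n_features = 30  # dimensions should be the number of features, not the number of rows

# f must accept the whole swarm, shape (n_particles, n_features),
# and return one cost per particle, shape (n_particles,)
options = {'c1': 0.5, 'c2': 0.5, 'w': 0.9, 'k': 30, 'p': 2}
optimizer = ps.discrete.BinaryPSO(n_particles=30, dimensions=n_features, options=options)
cost, pos = optimizer.optimize(f, iters=700)

assert pos.shape == (n_features,)      # a single 1-D binary mask
X_selected_features = X[:, pos == 1]   # boolean column selection now works

If pos comes back with shape (903, 30), it is worth double-checking how dimensions and the objective's return shape are set up.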
