Returning an empty subset as the best subset for feature selection #110

Closed
ahcantao opened this issue May 9, 2018 · 6 comments


ahcantao commented May 9, 2018

  • PySwarms version: 0.1.9
  • Python version: 3.5.4
  • Operating System: Windows 10

Description

The code returns an empty subset of attributes as the best subset. It even reports the cost associated with the all-zeros subset. A minor change to the example code is enough to reproduce this output.

There is a check in the code that is supposed to avoid empty subsets, but the optimizer still reports an empty subset as the best one:

if np.count_nonzero(m) == 0:
    # if the particle's subset is all zeros, fall back to the original set of attributes
    X_subset = X
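
For context, that check lives inside the notebook's per-particle objective. A condensed sketch of that function (assuming roughly the notebook's setup with a logistic-regression classifier; exact parameter values may differ):

import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_classification

# Toy data along the lines of the notebook: 100 samples, 15 features
X, y = make_classification(n_samples=100, n_features=15, n_classes=3,
                           n_informative=4, n_redundant=1, n_repeated=2,
                           random_state=1)
classifier = linear_model.LogisticRegression()

def f_per_particle(m, alpha):
    """Compute the cost of one particle's binary feature mask m."""
    total_features = X.shape[1]
    # The guard in question: an all-zeros mask falls back to the full feature set
    if np.count_nonzero(m) == 0:
        X_subset = X
    else:
        X_subset = X[:, m == 1]
    classifier.fit(X_subset, y)
    P = (classifier.predict(X_subset) == y).mean()
    # Weighted trade-off between classification error and fraction of features kept
    return alpha * (1.0 - P) + (1.0 - alpha) * (X_subset.shape[1] / total_features)

The guard only changes how the cost of an all-zeros particle is computed; the reported best position can still be the all-zeros mask itself, which is what the output further down shows.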

Describe what you were trying to get done.
Return a non-empty subset of attributes (a binary mask containing at least one element equal to 1).

What I Did

I ran the "feature_subset_selection" notebook (https://github.com/ljvmiranda921/pyswarms/blob/master/examples/feature_subset_selection.ipynb) up to the "# Perform optimization" cell. Everything runs fine, but after one minor change, reducing the number of iterations from 1000 to 10, it returns an empty subset as the best subset most of the time.

The only line changed was the following:

cost, pos = optimizer.optimize(f, print_step=1, iters=10, verbose=2)
print(cost,pos)

and the output was:

INFO:pyswarms.discrete.binary:Iteration 1/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 2/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 3/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 4/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 5/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 6/10, cost: 0.2696
INFO:pyswarms.discrete.binary:Iteration 7/10, cost: 0.2376
INFO:pyswarms.discrete.binary:Iteration 8/10, cost: 0.2376
INFO:pyswarms.discrete.binary:Iteration 9/10, cost: 0.2376
INFO:pyswarms.discrete.binary:Iteration 10/10, cost: 0.2376
INFO:pyswarms.discrete.binary:================================
Optimization finished!
Final cost: 0.2376
Best value: [ 0.000000 0.000000 0.000000 ...]

0.2376 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
ljvmiranda921 (Owner) commented

Hi @ahcantao , sorry for the late reply.

Let me check on this real quick this weekend. Thanks for reporting!

ahcantao (Author) commented

@ljvmiranda921 Check whether it is possible to avoid creating a subset containing only zeros. If it is, then as a future update you could add an extra parameter specifying the minimum number of 1's in the subset, e.g. minOnes = 3, which would return a subset containing 3 or more attributes out of the full set (e.g. [1 1 1 0 0 0 0 0 0 0]). A rough sketch of how this could be enforced in the objective is below.
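
In the meantime, something along these lines might work as a workaround (a rough sketch building on the notebook's objective above; minOnes is hypothetical and not a PySwarms parameter):

import numpy as np

MIN_ONES = 3  # hypothetical minimum number of selected attributes

def f_per_particle_min_ones(m, alpha):
    """Like f_per_particle above, but penalize masks with fewer than MIN_ONES
    selected features so they can never win as the best subset."""
    if np.count_nonzero(m) < MIN_ONES:
        return 1.0  # worst possible value of this objective, which lies in [0, 1]
    X_subset = X[:, m == 1]  # X, y, and classifier as in the sketch above
    classifier.fit(X_subset, y)
    P = (classifier.predict(X_subset) == y).mean()
    return alpha * (1.0 - P) + (1.0 - alpha) * (X_subset.shape[1] / X.shape[1])

A parameter inside BinaryPSO itself would be cleaner, but a penalty like this at least keeps near-empty masks from being reported as the best position.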

ljvmiranda921 (Owner) commented

Thanks for your input. I'm currently in the process of updating PySwarms' backend (in the development branch, not in master). I'll probably incorporate your idea of setting a minimum number of attributes from the full set. Let's see where it goes.

ljvmiranda921 (Owner) commented

Note to self: make another PR to test this. This should be solved using the new backend implementation but further testing is needed.

ljvmiranda921 pushed a commit that referenced this issue Jun 7, 2018
Add pyswarms.backend module

This commit adds the `backend` module for PySwarms. It enables users to define
their own optimization loop using primitives provided by this module
(white-box approach). In addition, this commit also updates the existing
implementations of GlobalBestPSO, LocalBestPSO, and BinaryPSO to use
these backend primitives.

Some major updates:
1. Topologies are introduced to segregate computation of best_cost/pos and velocity
or position updates.
2. A Swarms class is implemented to serve as a DataClass for data storage.
3. SwarmBase is now called SwarmOptimizer (more semantically meaningful)
4. The user can now set his/her initial positions.
5. Remove py27 compatibility

Additional tasks:
1. Documentation and new notebook examples
2. Check feature selection example (#110 )

Signed-off-by: Lester James V. Miranda <[email protected]>
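
For reference, the "white-box approach" mentioned in the commit means driving the optimization loop yourself with the backend primitives. A minimal sketch of such a loop (names like create_swarm, compute_pbest, and fx.sphere follow the newer backend API and may differ between releases):

import numpy as np
import pyswarms.backend as P
from pyswarms.backend.topology import Star
from pyswarms.utils.functions import single_obj as fx

options = {'c1': 0.5, 'c2': 0.3, 'w': 0.9}
topology = Star()  # the Star topology reproduces global-best behaviour
swarm = P.create_swarm(n_particles=50, dimensions=2, options=options)

for i in range(100):
    # Evaluate the objective at the current and personal-best positions
    swarm.current_cost = fx.sphere(swarm.position)
    swarm.pbest_cost = fx.sphere(swarm.pbest_pos)
    swarm.pbest_pos, swarm.pbest_cost = P.compute_pbest(swarm)

    # Update the global best, then move the particles
    if np.min(swarm.current_cost) < swarm.best_cost:
        swarm.best_pos, swarm.best_cost = topology.compute_gbest(swarm)
    swarm.velocity = topology.compute_velocity(swarm)
    swarm.position = topology.compute_position(swarm)

print('Best cost:', swarm.best_cost)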
@ljvmiranda921 ljvmiranda921 self-assigned this Jun 10, 2018
@ljvmiranda921 ljvmiranda921 added the bug Bugs, bug fixes, etc. label Jun 12, 2018
ljvmiranda921 (Owner) commented

This should be solved by #145

bhaskatripathi commented

I am still facing this error with the best subset for feature selection.

Perform optimization

cost, pos = optimizer.optimize(f, iters=700, fast=False)

The optimizer returns the following output for pos:

  1. I have 30 features and 903 rows.
  2. pos is an n-dimensional array of shape (903, 30).
  3. X_selected_features = X[:, pos==1] raises "IndexError: too many indices for array", since it expects a one-dimensional array such as [0 0 0 0 1 1 0]. Instead, I am getting:
    array([[1, 1, 1, ..., 1, 1, 1],
    [1, 1, 1, ..., 1, 1, 1],
    [1, 1, 1, ..., 1, 1, 1],
    ...,
    [1, 1, 1, ..., 1, 1, 1],
    [1, 1, 1, ..., 1, 1, 1],
    [1, 1, 1, ..., 1, 1, 1]])
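
For comparison, with dimensions set to the number of features the shapes should come out like this (a sketch; the option values simply mirror the notebook, and k must not exceed n_particles):

import numpy as np
import pyswarms as ps

n_features = 30  # dimensions should be the number of features, not the number of rows

# f must accept the whole swarm, shape (n_particles, n_features),
# and return one cost per particle, shape (n_particles,)
options = {'c1': 0.5, 'c2': 0.5, 'w': 0.9, 'k': 30, 'p': 2}
optimizer = ps.discrete.BinaryPSO(n_particles=30, dimensions=n_features, options=options)
cost, pos = optimizer.optimize(f, iters=700)

assert pos.shape == (n_features,)      # a single 1-D binary mask
X_selected_features = X[:, pos == 1]   # boolean column selection now works

If pos comes back with shape (903, 30), it is worth double-checking how dimensions and the objective's return shape are set up.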
