
Several improvements to NeuralNetBinaryClassifier #515

Merged: 9 commits into master from feature/improvements-to-binary-classifier, Sep 27, 2019

Conversation

BenjaminBossan
Collaborator

* more fit runs because of flaky output (fixes #514)
* add a classes_ attribute for things like CalibratedClassifierCV (fixes #465)
* make y_proba 2-dim, which works better with certain sklearn metrics (fixes #442)
* automatically squeeze the output if the module returns (n, 1) shape (fixes #502)

This could break backwards compatibility if someone relies on y_proba being 1-d.
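The shape change can be sketched with plain NumPy (values here are illustrative, not skorch's actual implementation):

```python
import numpy as np

# Hypothetical 1-d positive-class probabilities, as the old
# predict_proba returned them (illustrative values).
y_proba_old = np.array([0.2, 0.9, 0.6])

# The new behavior returns an (n, 2) array with P(class 0) and
# P(class 1) as columns, which sklearn metrics generally expect.
y_proba = np.stack([1 - y_proba_old, y_proba_old], axis=1)

assert y_proba.shape == (3, 2)
# The "old" 1-d y_proba is the second column:
assert np.allclose(y_proba[:, 1], y_proba_old)
```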
@BenjaminBossan
Collaborator Author

@ZaydH Since you worked with NeuralNetBinaryClassifier and ran into trouble with it, I would like to hear your opinion on this PR.

@@ -196,10 +196,10 @@ def test_predict_predict_proba(self, net, data, threshold):
        net.fit(X, y)

        y_pred_proba = net.predict_proba(X)
-       assert y_pred_proba.ndim == 1
+       assert y_pred_proba.shape[1] == 2
Member

Nit

Suggested change
assert y_pred_proba.shape[1] == 2
assert y_pred_proba.shape == (X.shape[0], 2)

    # pylint: disable=signature-differs
    def check_data(self, X, y):
        super().check_data(X, y)
        if get_dim(y) != 1:
            raise ValueError("The target data should be 1-dimensional.")

    def infer(self, x, **fit_params):
        y_infer = super().infer(x, **fit_params)
        if y_infer.dim() < 2:
Member

If we want to be really direct:

Suggested change
if y_infer.dim() < 2:
if y_infer.dim() == 1:

CHANGES.md Outdated

### Changed

- Improve numerical stability when using `NLLLoss` in `NeuralNetClassifer` (#491)
- `NeuralNetBinaryClassifier.predict_proba` now returns a 2-dim array; to access the "old" `y_proba`, take `y_proba[:, 1]`
Member

Add PR number?


    def test_with_calibrated_classifier_cv(self, net_fit, data):
        from sklearn.calibration import CalibratedClassifierCV
        cccv = CalibratedClassifierCV(net_fit, cv=3)
Member

Do we need the fitted module here?

If we care a lot about test run time, we could set cv=2?

Collaborator Author

Changed to cv=2. The net has to be initialized for this to work.
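For context, CalibratedClassifierCV relies on the base estimator exposing a classes_ attribute to map predict_proba columns back to labels, which is what this PR adds. A minimal NumPy sketch of that column-to-label mapping (names and values are illustrative):

```python
import numpy as np

# classes_ is what the PR adds to NeuralNetBinaryClassifier so that
# wrappers like CalibratedClassifierCV can interpret predict_proba.
y_train = np.array([0, 1, 1, 0, 1])
classes_ = np.unique(y_train)   # the sorted unique labels

# Column j of an (n, 2) predict_proba corresponds to classes_[j]:
proba = np.array([[0.7, 0.3],
                  [0.1, 0.9]])
predicted = classes_[proba.argmax(axis=1)]

assert classes_.tolist() == [0, 1]
assert predicted.tolist() == [0, 1]
```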

@@ -166,7 +166,7 @@ def test_not_fitted_raises(self, net_cls, module_cls, data, method):
               "before using this method.")
        assert exc.value.args[0] == msg

-    @flaky(max_runs=3)
+    @flaky(max_runs=5)
Member

That is getting pretty flaky indeed.

Collaborator Author

I ran a grid search (yay) to find better parameters. Hopefully the test is less flaky now.

@BenjaminBossan
Collaborator Author

@thomasjpfan Pls review again

@thomasjpfan (Member) left a comment

LGTM

    # pylint: disable=signature-differs
    def check_data(self, X, y):
        super().check_data(X, y)
        if get_dim(y) != 1:
            raise ValueError("The target data should be 1-dimensional.")

    def infer(self, x, **fit_params):
Member

I think a docstring that documents the automatic squeezing of module outputs would be helpful here (i.e., that it is done and why it is done).
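The squeezing being discussed can be sketched as follows (a NumPy analogue of the idea, not skorch's actual infer code):

```python
import numpy as np

def squeeze_module_output(y_infer):
    """Sketch of the automatic squeezing: if the module returns shape
    (n, 1), reduce it to (n,) so the binary loss and thresholding logic
    see a 1-d output; anything already 1-d passes through unchanged.
    (Hypothetical helper, for illustration only.)"""
    if y_infer.ndim == 2 and y_infer.shape[1] == 1:
        return y_infer.reshape(-1)
    return y_infer

assert squeeze_module_output(np.zeros((4, 1))).shape == (4,)
assert squeeze_module_output(np.zeros(4)).shape == (4,)
```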

skorch/tests/test_classifier.py (outdated, resolved)
BenjaminBossan and others added 4 commits September 17, 2019 21:11
* add a docstring
* correct error message when output dim > 2
* also works when more than 1 output is returned
@BenjaminBossan
Collaborator Author

@ottonemo I made the requested changes, plus a few more minor ones. Please review again.

@ottonemo ottonemo merged commit 0581bc6 into master Sep 27, 2019
@BenjaminBossan BenjaminBossan deleted the feature/improvements-to-binary-classifier branch October 13, 2019 11:39