
MAINT, BUG, TST: incremental API cleaning #406

Merged
17 commits merged into dask:master on Oct 19, 2018

Conversation

@stsievert (Member) commented Oct 15, 2018

This PR

  • resolves a bug where IncrementalSearchCV would get stuck in an infinite loop when decay_rate=0.
  • defines patience to be the number of calls to partial_fit (see the sketch below).
  • implements a test for tol.
  • adds to the patience test.

This PR depends on #404. A clean diff can be found at stsievert#2.
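
A minimal sketch of the new patience semantics (a hypothetical helper, not dask-ml's actual code): patience now counts partial_fit calls, so the plateau window covers the scores observed over the last patience partial_fit calls, and a model stops once none of them improves on the preceding score by at least tol.

def _plateau(scores, patience, tol, calls_per_score=1):
    # `patience` counts partial_fit calls; each score entry follows
    # `calls_per_score` partial_fit calls, so the plateau window covers
    # the last `patience // calls_per_score` scores.
    window = patience // calls_per_score
    if len(scores) <= window:
        return False  # not enough history yet to detect a plateau
    baseline = scores[-window - 1]
    return max(scores[-window:]) < baseline + tol

With the default of one score call per partial_fit call, this coincides with the old definition (patience as the number of score calls); the two only diverge when scoring happens less often than fitting.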

* Rename history_results_ => history_
* Provide the complete model history, and make it public
  (otherwise boilerplate is needed to build model_history_ from history_
  by looping over the records and grouping them into a dict, {model_id: hist};
  see the sketch after this list)
This mirrors scikit-learn's cv_results_, with one important distinction: this
implementation only tests on one training set. This means there's a
`test_score` key, not `mean_test_score` or `test_score0`.
Before, BaseIncrementalSearchCV assumed _additional_calls
returned one model and returned that to the user.

Now, BaseIncrementalSearchCV chooses the model with the highest
score returned by _additional_calls.

This matters when doing a random search, or when `max_iter` is hit (see the sketch after this list).
* MAINT: cleaner separation with _adapt and _stop_on_plateau functions
  (separates the complex adaptive algorithm from stopping on a plateau,
   and allows overriding _adapt for other adaptive algorithms
   that want to stop on plateau)
* TST: implement tests for patience and tolerance parameters
* MAINT: define "patience" to be the number of partial_fit calls, not
  the number of score calls
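
For illustration, here is roughly the boilerplate that a public model_history_ removes, plus a sketch of the new selection behavior (the `search` object and the exact record keys are assumptions based on the description above, not the library's exact code):

from collections import defaultdict

# history_ is a flat list with one record per partial_fit/score call;
# group it by model to recover each model's trajectory:
model_history = defaultdict(list)
for record in search.history_:
    model_history[record["model_id"]].append(record)

# BaseIncrementalSearchCV now picks the model whose final score is
# highest, instead of assuming _additional_calls returns one survivor:
best_id = max(model_history, key=lambda m: model_history[m][-1]["score"])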
@TomAugspurger (Member)

Merged master.

@TomAugspurger (Member)

How should we proceed here? I'd like to do a release sooner rather than later, but I'm not sure how much time we'll have to review (let alone work on) this PR in the short term.

Personally, I think that if we do decide to change the default search strategy, we'll be able to do it with a deprecation cycle (if we deem that necessary). So my main concern with releasing master as-is is that the default in master might be "bad" and turn people off this estimator. I'm not especially worried about that, though, so my vote would be to release soon and update this after the release.

@stsievert (Member, Author)

(if we deem that necessary).

There aren't many user-facing changes in this PR, so I think this update could be a point release. It improves the implementation: decay_rate=0 is now allowed, and patience is defined to be what the user expects (though the two definitions are currently the same thing with default params).

The largest user-facing change is the bug this PR fixes: it removes an infinite loop when decay_rate=0.

@TomAugspurger (Member)

I think that infinite loop was fixed in #373

@stsievert (Member, Author)

I think that infinite loop was fixed in #373

I don't think so. This bug is specific to decay_rate, and involves the property that 1 / x**0 == 1 for all x > 0.
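
Concretely, a sketch of the schedule's shape (not dask-ml's exact code): each round the number of surviving models is scaled by something like 1 / time_step ** decay_rate, so with decay_rate=0 that factor is identically 1, the candidate set never shrinks, and the fitting loop has no way to terminate.

decay_rate = 0
n_models = 10
for time_step in range(1, 6):
    factor = 1 / time_step ** decay_rate  # == 1 for every time_step > 0
    n_models = max(1, int(n_models * factor))
assert n_models == 10  # nothing was ever culled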

@TomAugspurger (Member) commented Oct 16, 2018 via email

@TomAugspurger (Member)

I don't think this PR fixes the decay_rate=0 bug. AFAICT, test_search_basic isn't really being run now, since the test exits as soon as the test helper _test_search_basic is called. I believe you need to await it.

diff --git a/tests/model_selection/test_incremental.py b/tests/model_selection/test_incremental.py
index 5d828b7..c52b06a 100644
--- a/tests/model_selection/test_incremental.py
+++ b/tests/model_selection/test_incremental.py
@@ -196,9 +196,10 @@ def test_explicit(c, s, a, b):
 @gen_cluster(client=True)
 def test_search_basic(c, s, a, b):
     for decay_rate in {0, 1}:
-        _test_search_basic(decay_rate, c, s, a, b)
+        yield _test_search_basic(decay_rate, c, s, a, b)
 
 
+@gen.coroutine
 def _test_search_basic(decay_rate, c, s, a, b):
     X, y = make_classification(n_samples=1000, n_features=5, chunks=(100, 5))
     model = SGDClassifier(tol=1e-3, loss="log", penalty="elasticnet")
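
For context, a minimal tornado sketch (names are illustrative) of why the yield matters: calling a coroutine helper without yielding it just returns a Future that nothing ever drives to completion, so any code past the helper's first yield point never runs.

from tornado import gen

@gen.coroutine
def _helper():
    yield gen.sleep(0.1)          # stand-in for async cluster work
    raise gen.Return("finished")  # only reached if the caller yields

@gen.coroutine
def test_body():
    _helper()                 # BUG: Future discarded; helper never finishes
    result = yield _helper()  # correct: runs the coroutine to completion
    assert result == "finished"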

@TomAugspurger (Member)

Alrighty, I had a look and I think these changes look good. I like thinking about patience in terms of the number of partial_fit calls.

Planning to merge this evening.

@TomAugspurger merged commit 0860ae6 into dask:master on Oct 19, 2018
@TomAugspurger (Member)

Thanks @stsievert!
