API: Incremental search improvements #370
Conversation
My main motivation for not using the CV suffix was that it doesn't support the …
Hm... My motivation for changing to … I've made some other changes too, summarized in #370 (comment).
I think that this is probably the sort of thing we should discuss before anyone invests significant work in the renaming. My guess is that we might cycle through a dozen names before finding the right one. I agree that incremental isn't great, but I also think that decay probably isn't ideal either.
On the topic of the name, Adaptive is certainly descriptive, but is it too general? About CV, I don't have a strong opinion either way. The thing that pushed me to remove it was:

```
In [26]: KFold(1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-f1793b6f886b> in <module>()
----> 1 KFold(1)

~/Envs/pandas-dev/lib/python3.6/site-packages/scikit-learn/sklearn/model_selection/_split.py in __init__(self, n_splits, shuffle, random_state)
    417     def __init__(self, n_splits=3, shuffle=False,
    418                  random_state=None):
--> 419         super(KFold, self).__init__(n_splits, shuffle, random_state)
    420
    421     def _iter_test_indices(self, X, y=None, groups=None):

~/Envs/pandas-dev/lib/python3.6/site-packages/scikit-learn/sklearn/model_selection/_split.py in __init__(self, n_splits, shuffle, random_state)
    282                 "k-fold cross-validation requires at least one"
    283                 " train/test split by setting n_splits=2 or more,"
--> 284                 " got n_splits={0}.".format(n_splits))
    285
    286         if not isinstance(shuffle, bool):

ValueError: k-fold cross-validation requires at least one train/test split by setting n_splits=2 or more, got n_splits=1.
```
(force-pushed from 86c2182 to 8531fc2)
I've updated this (and the notes at #370 (comment)):
I've added … This required some changes to always get the most recent score (the one with the highest partial fit call) from … This PR certainly needs feedback and another review before merge. I've added the WIP label pending another review by me.
Sorry, missed this notification earlier.
I think that was already being done.
idle question: do we know that partial_fit_score is monotonically increasing for a given model through history? I don't think we can assume that later = higher (not saying that you do. Haven't looked at the code yet).
My reasons for not implementing cv_results originally were …

So going back to me not wanting …
```python
best_index += len(v)
if k == best_model_id:
    break
for h in hist2:
```
Should this just be done by `_fit`?
It should be but isn't (only surviving `model_id`s are in `info`). I'll make that change.
I've made the change, which makes `info` have the same data as `history`, but in a different format. This breaks a test:
```python
assert len(models) == len(info) == 1
```
The relevant bits of my diff are:

```diff
- assert len(info) == len(models) == 1
+ assert len(info) > len(models) == 1
+ assert set(models.keys()).issubset(set(info.keys()))
+
+ calls = {k: [h["partial_fit_calls"] for h in hist] for k, hist in info.items()}
+ for k, call in calls.items():
+     assert (np.diff(call) >= 1).all()
```
This change provides some ease-of-use and another test. Do we want to make this change?
Can you post what `info` looks like now? Breaking tests is fine for now, if we think it's a better data format.
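For reference, here is a hypothetical illustration of the proposed `info` format implied by the discussion above (the keys and values are assumptions, not the exact output):

```python
# Hypothetical `info`: every model_id maps to its full history (a list of
# dicts), not just the latest entry. Keys follow the discussion in this PR.
info = {
    0: [
        {"model_id": 0, "partial_fit_calls": 1, "score": 0.61, "params": {"alpha": 0.1}},
        {"model_id": 0, "partial_fit_calls": 2, "score": 0.64, "params": {"alpha": 0.1}},
    ],
    1: [
        {"model_id": 1, "partial_fit_calls": 1, "score": 0.58, "params": {"alpha": 1.0}},
    ],
}
```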
""" list of dicts to dict of lists. Assumes same keys in all dicts. | ||
https://stackoverflow.com/questions/5558418/list-of-dicts-to-from-dict-of-lists | ||
""" | ||
return {k: [dic[k] for dic in LD] for k in LD[0]} |
style nit: easier to inline below, especially if we're using it only once.
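A minimal sketch of what inlining might look like, assuming `LD` is a non-empty list of dicts with identical keys (the variable names here are hypothetical):

```python
LD = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

# Inline the list-of-dicts -> dict-of-lists conversion at the single call site.
dict_of_lists = {k: [d[k] for d in LD] for k in LD[0]}
# {'a': [1, 3], 'b': [2, 4]}
```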
I do not think … Something like the following would give `cv`-style behavior:

```python
search = AdaptiveSearchCV()
datasets = [train_test_split(X, y) for _ in range(cv)]
scores = [search.fit(X1, y1).score(X2, y2)
          for X1, X2, y1, y2 in datasets]
```

though I don't think this should be implemented right now (one train/test set is used in practice).
Correct – monotonically increasing scores is a very bad assumption (though that's almost the case when …). I think we should choose …
The protection against … Some of this will be fixed by #370 (comment), but not all of it.
Agreed, though it's good to know that this could be done in the future.

IIUC, at that point …
@TomAugspurger and I talked about this PR, and generated a couple more TODOs: …
What was the motivation behind this name? Were there any alternatives proposed?
Many :) Some considered were IncrementalSearch (the same), DecaySearch, ExponentialSearch, ExponentialDecaySearch, TimeInverseDecaySearch, and probably others. We want to convey the most important aspect of this strategy: the decaying number of models trained as we move through time. TimeDecaySearch seemed to strike a reasonable balance between accuracy and length. As for CV, the train-test split we do internally is a form of cross validation, so it's best to include the suffix.
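As a rough illustration of that internal split (a sketch only; the estimator's actual splitting logic and default test fraction may differ):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.uniform(size=(100, 4))
y = np.random.randint(2, size=100)

# One internal hold-out split (a basic form of cross validation); 0.15 is illustrative.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)
# Each candidate model is trained with partial_fit on (X_train, y_train) and
# scored on (X_test, y_test).
```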
Note that this is also currently what backs RandomSearch, which doesn't significantly decrease the number of models over time (except when they seem to plateau).
(force-pushed from 2007c95 to 8eb1582)
```
The required level of improvement to consider stopping training on
that model. The most recent score must be at most ``tol`` better
than all of the previous ``patience`` scores for that model.
Increasing ``tol`` will tend to reduce training time, at the cost
of worse models.
```
```
Setting this value to be negative allows for some noise tolerance in …
```
I think this doesn't sit with the previous paragraph (maybe the previous paragraph is wrong).

It sounds like `tol` is influencing the question "should we stop?", something like `if current_score - previous_score > tol: stop`. Setting this to negative would mean we'll consider stopping on a model that performed worse than the previous one, while with a positive value the final model must always be better than the previous `patience` scores (until we hit `max_iter`).

I also can't tell what we mean by "more accurate estimator", or why that would be. I can imagine it being because we're not overfitting as much, but I can also imagine other explanations.

Looking at the code, we say `if all(current_score < old + tol for old in patience)`. So a negative tol makes that condition more likely to be true, i.e. we continue training even when our score has dipped. So I think the first paragraph explaining `tol` is incorrect?
I set it negative to stop training when the scores start to decrease. I'd expect to stop when performance starts to degrade, not when performance doesn't increase enough.

I guess my expectations are wrong; PyTorch's `ReduceLROnPlateau` performs its action when `all(old > best + threshold for old in plateau)` for some positive `threshold`. That is, its action is performed when models haven't improved by `threshold` in `patience` epochs.
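A minimal sketch of the stopping rule being discussed, based on my reading of the condition quoted above (names and structure are illustrative, not the library's exact code):

```python
def should_stop(scores, patience, tol):
    """Stop a model when none of its last `patience` scores beat the
    current score by more than `tol`."""
    if len(scores) < patience + 1:
        return False
    current = scores[-1]
    previous = scores[-patience - 1:-1]  # the `patience` scores before the current one
    return all(current < old + tol for old in previous)

# With tol=0 and patience=2, training stops once the latest score fails to
# beat either of the two scores before it.
print(should_stop([0.60, 0.62, 0.61, 0.605], patience=2, tol=0.0))  # True
```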
TODO:

- default `tol=1e-4` (PyTorch's `ReduceLROnPlateau` `threshold`).
```
estimators and hyperparameter combinations) and repeatedly calls the underlying estimator's
``partial_fit`` method with batches of data.

These model selection algorithms are *adaptive*: they decide to keep training
```
I think we should split these two ideas a bit more cleanly in the docs:

- Adaptive: using previous fit information to decide which models to train next
- Incremental: training on large datasets by incrementally passing batches to `partial_fit`.

As written, this paragraph implies that both of these are necessary, when you could imagine Incremental without Adaptive, and Adaptive without Incremental.

I'm going back a bit on what we discussed yesterday. Our implementation requires Incremental (it only works with underlying estimators that implement `partial_fit`). So to me that's the most important of the two, and if we're only going to highlight one (e.g. the section title) it should be incremental.
I agree with you; I'm also rethinking naming. I agree that Incremental is more important to highlight than Adaptive; all adaptive algorithms must be incremental, but not all incremental algorithms are adaptive.

TODO:

- rename `BaseAdaptiveSearchCV` => `BaseIncrementalSearchCV`
I've done some more thinking on naming. `TimeDecaySearchCV` does two things:

- stop training on plateau (one very basic adaptive scheme)
- a fancier adaptive scheme (i.e., `inverse`/`topk`).

I also think we should rename `TimeDecay*`; that really only conveys that fewer models are trained as time progresses, which is also conveyed with `Adaptive`.

TODO:

- separate "stop on plateau" and adaptive methods in `TimeDecaySearchCV`
- rename `TimeDecaySearchCV` => `InverseTimeDecaySearchCV`
```
* ``model_id``
* ``partial_fit_calls``
* ``params``
* ``param_{key}`` for every ``key`` in the parameters for that model.
```
Rather than "parameters for that model", should it be "hyperparameters in params
"?
Or "where key
is every key in params
"?
Scikit-learn gives an example `cv_results_` in their docs, and doesn't address this generally.
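For illustration, a hypothetical `cv_results_` built from the keys listed above plus the single `test_score` key mentioned elsewhere in this PR (the values and array types here are assumptions):

```python
import numpy as np

cv_results_ = {
    "model_id": np.array([0, 1]),
    "params": np.array([{"alpha": 0.1}, {"alpha": 1.0}]),
    "param_alpha": np.array([0.1, 1.0]),
    "partial_fit_calls": np.array([8, 3]),
    "test_score": np.array([0.72, 0.55]),  # one train/test split, so no mean_test_score
}
```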
```
best_score_ : float
    Best score achieved on the hold out data, where "best" means "highest
    score after a models final partial fit call".
```
missing apostrophe in "model's".
Is the explanation "highest final score on the validation set" clearer?
I avoided that because I wanted to define "final". I've changed it to:

> where "best" means "the highest score on the hold out set after a model's last partial fit call."
```python
bad = set(info) - set(best)
self._all_models.update(set(info.keys()))
instructions = self._adapt(info)
bad = set(self._all_models) - set(instructions)
```
I'm getting a bug with this sometimes. This commit attempts to resolve it. On occasion, I get a `KeyError` in `_incremental.py` with `speculative.pop(ident)` at

```python
model = speculative.pop(ident)
```

which I take to mean some models are being killed early. This happens when `bad == set()` at

```python
bad = set(models) - set(instructions)
```
@mrocklin any thoughts here? I've seen this occasionally too, but haven't been able to debug it successfully.
No thoughts off-hand. Is anyone able to produce a minimal example by any chance?
```
@@ -256,6 +256,11 @@ def get_futures(partial_fit_calls):

models = {k: client.submit(operator.getitem, v, 0) for k, v in models.items()}
yield wait(models)
info = {}
```
Slight preference for:

```python
info = defaultdict(list)
for h in history:
    info[h["model_id"]].append(h)
info = dict(info)
```
```diff
@@ -599,20 +614,94 @@ def score(self, X, y=None):
     return self.scorer_(self.best_estimator_, X, y)


-class IncrementalSearch(BaseIncrementalSearch):
+class AdaptiveStopOnPlateauSearchCV(BaseIncrementalSearchCV):
```
Inconsistent with the name in the `__init__.py`.
-1 on this name. I think that it is far too long and technical. I think that it would scare non-technical users.
`AdaptiveStopOnPlateauSearchCV` is private (or at least that's what I intend). It is intended for use by other developers who have an adaptive algorithm they'd like to implement and also want stop on plateau (e.g., #221).

Here's the structure I intended (see the sketch below):

- `BaseIncrementalSearchCV`: private.
  - Use: base class for all incremental searches
- `AdaptiveStopOnPlateauSearchCV`: private, inherits from `BaseIncrementalSearchCV`.
  - Use: allow other devs to easily create adaptive algorithms that stop on plateau (by overriding `_adapt`)
- `InverseTimeDecaySearchCV`: public, inherits from `AdaptiveStopOnPlateauSearchCV`.
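A minimal sketch of that intended hierarchy (the `_adapt` hook's signature and return value are assumptions for illustration, not the final API):

```python
class BaseIncrementalSearchCV:
    """Private: base class for all incremental searches."""


class AdaptiveStopOnPlateauSearchCV(BaseIncrementalSearchCV):
    """Private: adds stop-on-plateau; other adaptive algorithms override ``_adapt``."""

    def _adapt(self, info):
        # Hypothetical default: request one more partial_fit call for every
        # surviving model; the stop-on-plateau logic would then filter this.
        return {model_id: 1 for model_id in info}


class InverseTimeDecaySearchCV(AdaptiveStopOnPlateauSearchCV):
    """Public: trains fewer models as partial_fit calls accumulate."""

    def _adapt(self, info):
        # A fancier, time-decaying adaptive scheme would go here (sketch only).
        ...
```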
I think that something like the implementation of `IncrementalSearch` in master should be public. I think that `InverseTimeDecaySearchCV` is also probably an overly-technical name.

I recommend that we open an issue where we can discuss name changes to what is in master today and future restructurings. In general I recommend doing this before investing much work.
Indeed. The changes to `info` and `cv_results_`, etc. are useful in their own right. Those shouldn't be held up by a naming discussion.
```
@@ -15,7 +15,7 @@ The underlying estimator will need to be able to train on each cross-validation
See :ref:`hyperparameter.drop-in` for more.

If your data is large and the underlying estimator implements ``partial_fit``, you can
Dask-ML's :ref:`*incremental* hyperparameter optimizers <hyperparameter.incremental>`.
```
I'm starting to doubt the "two kinds of hyperparameter optimization estimators" I laid out on line 6. After our conversation the other day, it sounds like there are (at least) 3.

- Static search: All combinations to be tried are specified up front. The full dataset (or a CV split of the full dataset) is used on each training call.
- Adaptive search: Regardless of exactly which data is used (CV split of the full dataset, or a batch), the salient feature of adaptive search is that models are prioritized according to past performance.
- Incremental search: Regardless of how models are chosen for training (static or adaptive), the salient feature of incremental search is that models are trained on batches of data passed to `partial_fit`. The underlying estimator must implement `partial_fit`.

Our current implementation is a mix of 2 and 3. Does that framing accord with others' thoughts? Can we think of a clear way to explain that? The docs below do a decent job I think. This line, though, I think emphasizes adaptive too much given the framing above (you have a large dataset).
Maybe a clearer way of explaining it: there are two dimensions to searches:

- Does the search use `partial_fit` or `fit`?
- Is the search passive or adaptive? That is, does it use previous evaluations to select which models to train further?

Your categories fall pretty nicely into this (maybe renaming "static" to "passive").
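A rough sketch of how existing and proposed searchers might fall along those two dimensions (the placements are my reading of this discussion, not a settled taxonomy):

```python
# (passive/adaptive, fit/partial_fit) -> example searcher
search_styles = {
    ("passive", "fit"): "sklearn GridSearchCV / RandomizedSearchCV",
    ("passive", "partial_fit"): "IncrementalSearchCV(..., decay_rate=0)",
    ("adaptive", "partial_fit"): "IncrementalSearchCV(..., decay_rate=1); Hyperband (#221)",
    ("adaptive", "fit"): "successive-halving-style searches (none here yet)",
}
```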
FWIW, that would require force pushing commits, which makes reviewing somewhat harder since it breaks the “new changes” UI.
> how painful would it be for you to separate this PR into a few PRs, each of which addresses only a single issue?
>
> I think it'd be easier for me to have separate commits, one for each issue (e.g., "API: add cv_results_ to IncrementalSearchCV"). I think this would be easy to review: the diffs would be small, and would have a good message. I think that'd also work better with GitHub's UI (which defaults to showing diff from master?).
>
> Creating separate commits will be required for both choices, so let's see how it works then move to separate PRs if need be.
(force-pushed from bf4df4f to 1cba727)
Whoops, already force pushed. Is this a problem on your end? I'm not sure what "new changes UI" refers to. I could open another PR with these changes, and revert this PR back to its previous state; would that help?
```python
not self.patience
or next_time_step - current_time_step < self.scores_per_fit
):
    next_time_step += 1
```
Thanks for catching this. It looks like this could probably use a test.
```diff
-search = IncrementalSearchCV(model, params, n_initial_parameters=20, max_iter=10)
+search = IncrementalSearchCV(
+    model, params, n_initial_parameters=20, max_iter=10, decay_rate=0
+)
```
Why this change? Why not also engage decay rate here?
Because for this dataset and model, I had to make a change to the tests if I included `decay_rate=1`:

```diff
- assert (
-     search.cv_results_["test_score"][search.best_index_]
-     >= search.cv_results_["test_score"]
- ).all()
- search.cv_results_["rank_test_score"][search.best_index_] == 1
+ assert (
+     search.cv_results_["test_score"][search.best_index_]
+     >= np.percentile(search.cv_results_["test_score"], 90)
+ )
+ search.cv_results_["rank_test_score"][search.best_index_] <= 3
```

(same change as in #370 (comment))
I interpret this change as meaning "random searches work best", which makes sense because the model/dataset are simple and not performant. Plus, this test is aimed at testing basic elements of `IncrementalSearchCV` but also relies on the adaptive algorithm working well.
That is, without `decay_rate=0`, this search wouldn't return the highest scoring model (with this small dataset and simple model, the search is too adaptive).
```python
if k == best_model_id:
    break

return results.models[best_model_id], best_index
```
I'm a bit confused about the need for `_get_best` generally. Why do we need to search for the best result throughout history? Is this information that `_fit` could just give us directly? I suspect that we're spending time copying how sklearn does things when it might not be the best approach overall. Thoughts?
> Why do we need to search for the best result throughout history

We don't. The current implementation more or less does

```python
score = {k: v[-1]["score"] for k, v in results.info.items()}
best_model_id = max(score, key=score.get)
```

The previous implementation picked the best model ID before searching through history.

> Is this information that _fit could just give us directly?

I refactored `_fit` to return information that's very useful in calculating that information, but not that information (the "best" model ID?) directly.

In this PR, `_fit` gives us model history separated by model ID for every model (not only the final models `_additional_calls` selects) in `results.info`. This is really useful: it allows getting the final scores for every model with

```python
info, models, history = fit(...)
final_scores = {k: v[-1]["score"] for k, v in info.items()}
best_model_id = max(final_scores, key=final_scores.get)
```
and change it in the docs too

* Rename history_results_ => history_
* Provide complete model history, and make it public (otherwise boilerplate is needed to formulate model_history_ from history_, looping over items in history and putting them in a dict, {model_id: hist})

This mirrors scikit-learn's cv_results_, with one important distinction: this implementation only tests on 1 training set. This means that there's a `test_score` key, not `mean_test_score` or `test_score0`.

Before, BaseIncrementalSearchCV assumed _additional_calls returned one model and returned that to the user. Now, BaseIncrementalSearchCV chooses the model with the highest score returned by _additional_calls. This matters if a random search is desired, or if `max_iter` is hit.

* MAINT: cleaner separation with _adapt and _stop_on_plateau functions (separates the complex adaptive algorithm from stopping on plateau, and allows overwriting _adapt for other adaptive algorithms that want to stop on plateau)
* TST: implement tests for patience and tolerance parameters
* MAINT: define "patience" to be the number of partial_fit calls, not the number of score calls

DOC: Change warning note to "this class", not "IncrementalSearch"
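The "boilerplate" the commit above refers to is roughly the following (a sketch only; `search` and the record keys are assumptions based on this PR):

```python
from collections import defaultdict

# Rebuild a per-model history, {model_id: [record, ...]}, from the flat
# ``history_`` list; the commit makes this grouping available directly.
model_history = defaultdict(list)
for record in search.history_:
    model_history[record["model_id"]].append(record)
model_history_ = dict(model_history)
```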
```python
instructions = self._adapt(info)

out = self._stop_on_plateau(info, instructions)
return out
```
I think that I would prefer to keep things as a single method until a concrete need arises to separate them.

While I appreciate the desire to split things apart into smaller chunks, I find this adds a cost of indirection, which makes code a bit harder to review and maintain in the future. I find that code that is finely split often feels better when first writing it, but ages poorly when you have to quickly understand what is going on after a year of not looking at it.

There is obviously a length limit to this (functions of several hundred lines are unpleasant), but I don't think that the `_additional_calls` function is yet there.

I haven't yet reviewed the other parts of this commit that change the behavior of `_additional_calls`, partly because it's hard to separate those changes from the refactor. If possible I'd prefer that we remove the refactor so that it's easier to review and discuss the algorithmic changes.
> I think that I would prefer to keep things as a single method until a concrete need arises to separate them.

I have a need for `_adapt` in #221. I'll make the change there, and move it back to one function in this commit.
I appreciate it. I think that including this change over there will probably help to motivate it.
```
@@ -817,6 +818,24 @@ class IncrementalSearchCV(BaseIncrementalSearchCV):
>>> search = IncrementalSearchCV(model, params, random_state=0,
...                              n_initial_parameters=1000,
...                              patience=20, max_iter=100)

``tol`` and ``patience`` are for stopping training on a "plateau", or when
```
Is the term `plateau` likely to be well understood by novice users? If yes then we should remove the quotes. If no then we might want to consider avoiding the term and instead just describe it as you do after using it. I might even start off this paragraph with:

> Often when training we get to a situation where additional work leads to little to no gain. In these cases the `tol` and `patience` parameters can help us stop work early. ...
```
``tol`` and ``patience`` are for stopping training on a "plateau", or when
the hold-out score flattens and/or falls. For example, setting ``tol=0``
and ``patience=2`` will dictate that scores will always increase until
```
We're not dictating that scores will increase (we don't determine that). We might need to change the language here.
(force-pushed from eb9c608 to 8d7478f)
This is where commit-by-commit reviewing becomes challenging. I'm going to ask that you rebase to keep reviewable chunks atomic. (Separate PRs would also be welcome, of course.)
I see. I'll try to split this into 3 separate PRs: …

How does that sound? Should the PRs be finer grained?
Superseded by #404, #405, #406. Dependency tree: …
This adds "CV" to
BaseIncrementalSearch
and it's children.BaseIncrementalSearchCV
does do some (very basic) cross validation.This PR
API: renameIncrementalSearch
toTimeDecaySearchCV
*Search
to*SearchCV
history_results_
tohistory_
cv_results_
toBaseAdaptiveSearchCV
._fit
, add all model history toinfo
IncrementalSearchCV
passive by default (or default todecay_rate=0
)MAINT: adds privateAdaptiveStopOnPlateauSearchCV
to all easier implementation of adaptive algorithm that want to stop on plateau (e.g., ENH: Hyperband implementation #221).DOC: clarify use of "adaptive" and "incremental" in docsI also tried to come up with a better name forIncrementalSearchCV
: it seems like a very general name (but I didn't think about it too hard).
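Finally, a hedged sketch of the usage this PR moves toward (class and parameter names follow the PR description and may change; the estimator, data, and hyperparameter grid are placeholders):

```python
import numpy as np
from dask.distributed import Client
from sklearn.linear_model import SGDClassifier
from dask_ml.model_selection import IncrementalSearchCV  # the name under discussion

client = Client()  # incremental searches run on a Dask scheduler

X = np.random.uniform(size=(1000, 5))
y = np.random.randint(2, size=1000)
model = SGDClassifier(tol=1e-3)
params = {"alpha": np.logspace(-4, 0, num=10)}

# decay_rate=0 keeps the search passive (random-search-like);
# decay_rate > 0 trains fewer and fewer models as time goes on.
search = IncrementalSearchCV(model, params, n_initial_parameters=20, decay_rate=0)
search.fit(X, y, classes=[0, 1])
print(search.best_score_)
print(search.cv_results_["test_score"])
```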