API: Incremental search improvements #370

Closed
wants to merge 12 commits

Conversation

stsievert
Member

@stsievert stsievert commented Sep 19, 2018

This adds "CV" to BaseIncrementalSearch and its children. BaseIncrementalSearchCV does do some (very basic) cross validation.

This PR

  • API: rename IncrementalSearch to TimeDecaySearchCV
  • API: renames *Search to *SearchCV
  • API: renames history_results_ to history_
  • API: adds cv_results_ to BaseAdaptiveSearchCV.
  • API: in _fit, add all model history to info
  • API: make IncrementalSearchCV passive by default (or default to decay_rate=0)
  • MAINT: selects best model by highest score at last partial fit call if adaptive algorithm selects that model
  • MAINT: adds private AdaptiveStopOnPlateauSearchCV to allow easier implementation of adaptive algorithms that want to stop on plateau (e.g., ENH: Hyperband implementation #221).
  • DOC: clarify the definition of "best" (which we also discussed to be the highest final score) in the docstring
  • DOC: Add "Attributes" section to AdaptiveSearchCV in docstring
  • DOC: clarify use of "adaptive" and "incremental" in docs

I also tried to come up with a better name for IncrementalSearchCV: it seems like a very general name (but I didn't think about it too hard).
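For concreteness, here's a minimal usage sketch of the renamed API described above. The import path, toy data, and estimator are illustrative assumptions on my part, not part of this PR; only the names IncrementalSearchCV, decay_rate, cv_results_, history_, and best_estimator_ come from the changes listed above.

from dask.distributed import Client
import dask.array as da
import numpy as np
from sklearn.linear_model import SGDClassifier
from dask_ml.model_selection import IncrementalSearchCV  # assumed import path

client = Client(processes=False)  # incremental search runs on the distributed scheduler

X = da.random.random((1000, 5), chunks=100)
y = (X.sum(axis=1) > 2.5).astype(int)

params = {"alpha": np.logspace(-4, 0, num=10)}
search = IncrementalSearchCV(SGDClassifier(), params, n_initial_parameters=10,
                             max_iter=10, decay_rate=0)  # decay_rate=0 => passive search
search.fit(X, y, classes=[0, 1])

search.best_estimator_  # model with the highest score after its last partial_fit call
search.cv_results_      # per-parameter-combination results (single hold-out split)
search.history_         # per-call records: model_id, partial_fit_calls, score, ...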

@stsievert stsievert changed the title from "MAINT: *Search => *SearchCV" to "API: *Search => *SearchCV" on Sep 19, 2018
@TomAugspurger
Member

My main motivation for not using the CV suffix was that it doesn't support the cv parameter found in RandomizedSearchCV and GridSearchCV.

@stsievert stsievert changed the title from "API: *Search => *SearchCV" to "API: Incremental changes" on Sep 19, 2018
@stsievert stsievert changed the title from "API: Incremental changes" to "API: Incremental search naming changes" on Sep 19, 2018
@stsievert
Member Author

My main motivation for not using the CV suffix was that it doesn't support the cv parameter found in RandomizedSearchCV and GridSearchCV.

Hm... My motivation for changing to *SearchCV was that it does a search with a validation set. It does not score models from the training set.

I've made some other changes too, summarized in #370 (comment).

@mrocklin
Member

I think that this is probably the sort of thing we should discuss before anyone invests significant work in the renaming. My guess is that we might cycle through a dozen names before finding the right one.

I agree that incremental isn't great, but I think that decay is probably not ideal either.

@TomAugspurger
Member

On the base of the name: Adaptive is certainly descriptive, but is it too general?

About CV, I don't have a strong opinion either way. The thing that pushed me to remove it was GridSearchCV(..., cv=1) raising an error.

In [26]: KFold(1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-f1793b6f886b> in <module>()
----> 1 KFold(1)

~/Envs/pandas-dev/lib/python3.6/site-packages/scikit-learn/sklearn/model_selection/_split.py in __init__(self, n_splits, shuffle, random_state)
    417     def __init__(self, n_splits=3, shuffle=False,
    418                  random_state=None):
--> 419         super(KFold, self).__init__(n_splits, shuffle, random_state)
    420
    421     def _iter_test_indices(self, X, y=None, groups=None):

~/Envs/pandas-dev/lib/python3.6/site-packages/scikit-learn/sklearn/model_selection/_split.py in __init__(self, n_splits, shuffle, random_state)
    282                 "k-fold cross-validation requires at least one"
    283                 " train/test split by setting n_splits=2 or more,"
--> 284                 " got n_splits={0}.".format(n_splits))
    285
    286         if not isinstance(shuffle, bool):

ValueError: k-fold cross-validation requires at least one train/test split by setting n_splits=2 or more, got n_splits=1.

@stsievert stsievert force-pushed the rename-patience branch 5 times, most recently from 86c2182 to 8531fc2 on September 20, 2018
@stsievert stsievert changed the title from "API: Incremental search naming changes" to "WIP: API: Incremental search naming changes" on Sep 20, 2018
@stsievert
Member Author

I've updated this (and the notes at #370 (comment)):

  • renamed history_results_ to history_
  • selected the best model by highest score at last partial fit call
  • added cv_results_ to BaseAdaptiveSearchCV.

I've added cv_results_ because this is what I expect as a user: various stats after each model finished training. @TomAugspurger can you expand on history_results_?

This required some changes to always get the most recent score (with the highest partial fit call) from history_. I think these changes could be cleaned up.
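As a sketch of the kind of boilerplate involved (the record keys model_id, partial_fit_calls, and score are taken from this thread; the helper name is hypothetical):

def latest_scores(history):
    """Map each model_id to its score at the highest partial_fit_calls seen."""
    latest = {}
    for record in history:
        key = record["model_id"]
        if key not in latest or record["partial_fit_calls"] > latest[key]["partial_fit_calls"]:
            latest[key] = record
    return {k: v["score"] for k, v in latest.items()}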

This PR certainly needs feedback and another review before merge. I've added the WIP label pending another review by me.

@TomAugspurger
Member

Sorry, missed this notification earlier.

history_results_ -> history_ seems good. The idea was to mirror cv_results_ but I suppose scikit-learn needed to differentiate from the cv parameter. We don't have that concern with history.

selected the best model by highest score at last partial fit call

I think that was already being done.

This required some changes to always get the most recent score (with the highest partial fit call) from history_.

idle question: do we know that partial_fit_score is monotonically increasing for a given model through history? I don't think we can assume that later = higher (not saying that you do. Haven't looked at the code yet).

@TomAugspurger
Member

My reasons for not implementing cv_results originally were

  1. simplicity
  2. us not supporting the cv parameter.

So, going back to my not wanting the CV suffix because we don't support the cv parameter: do you have thoughts, @stsievert? Is Search expressive enough on its own to convey that we're doing hyperparameter optimization?

best_index += len(v)
if k == best_model_id:
break
for h in hist2:
Member

Should this just be done by _fit?

Member Author

It should be but isn't (only surviving model_ids are in info). I'll make that change.

Member Author

I've made the change, which makes info have the same data as history, but in a different format. This breaks a test:

assert len(models) == len(info) == 1

The relevant bits of my diff are

- assert len(info) == len(models) == 1
+ assert len(info) > len(models) == 1
+ assert set(models.keys()).issubset(set(info.keys()))
+
+ calls = {k: [h["partial_fit_calls"] for h in hist] for k, hist in info.items()}
+ for k, call in calls.items():
+     assert (np.diff(call) >= 1).all()

This change provides some ease-of-use and another test. Do we want to make this change?

Member

Can you post what info looks like now? Breaking tests is fine for now, if we think it's a better data format.

""" list of dicts to dict of lists. Assumes same keys in all dicts.
https://stackoverflow.com/questions/5558418/list-of-dicts-to-from-dict-of-lists
"""
return {k: [dic[k] for dic in LD] for k in LD[0]}
Member

style nit: easier to inline below, especially if we're using it only once.
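For reference, the inlined form would be a single comprehension (records here is a hypothetical stand-in for whatever list of dicts is being converted):

records = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
columns = {k: [d[k] for d in records] for k in records[0]}  # {'a': [1, 3], 'b': [2, 4]}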

@stsievert
Member Author

So, going back to my not wanting the CV suffix because we don't support the cv parameter: do you have thoughts, @stsievert? Is Search expressive enough on its own to convey that we're doing hyperparameter optimization?

I do not think Search is expressive enough; to me *Search indicates "train and test on the same data" and *SearchCV indicates "train and test on different data". This class does perform (basic) cross validation because it creates separate datasets to test and train on. This also underpins the naming of cv_results_.

The cv parameter specifies the amount of cross validation for a particular cross validation scheme, KFold (and does not specify whether or not to use cross validation). This class could also vary the amount of cross validation with something like

from sklearn.model_selection import train_test_split

search = AdaptiveSearchCV()  # sketch; stands in for one of these incremental searches
datasets = [train_test_split(X, y) for _ in range(cv)]  # cv independent train/test splits
scores = [search.fit(X1, y1).score(X2, y2)
          for X1, X2, y1, y2 in datasets]

though I don't think this should be implemented right now (one train/test set is used in practice).

idle question: do we know that partial_fit_score is monotonically increasing for a given model through history? I don't think we can assume that later = higher (not saying that you do. Haven't looked at the code yet).

Correct – monotonically increasing scores is a very bad assumption (though that's almost the case when patience=1 and tol=0).

I think we should choose best_estimator_ based on the maximum final score, not the maximum score after any partial_fit call. Two cases where reporting the maximum score on any partial_fit breaks down:

  • maybe an estimator receives a high score on a particular partial_fit call because the same data is fed to partial_fit and score.
  • maybe a particular estimator overfits the data. At one point, this model could have had the best score of any model. However, it could converge to the lowest possible (finite) score.

The protection against the score decreasing too much is to stop training on a plateau, which is controlled by the patience and tol parameters (though allowing some decrease in score may be required to reach a better solution).

Some of this will be fixed by #370 (comment), but not all of it.

selected the best model by highest score at last partial fit call

I think that was already being done.

best_model_id was selected to be the model returned by _fit without looking at any scores https://github.com/dask/dask-ml/pull/370/files#diff-0a54ded74c8013caf0588e9a5c879e99L487.

@TomAugspurger
Member

though I don't think this should be implemented right now (one train/test set is used in practice).

Agreed, though it's good to know that this could be done in the future.

best_model_id was selected to be the model returned by _fit without looking at any scores https://github.com/dask/dask-ml/pull/370/files#diff-0a54ded74c8013caf0588e9a5c879e99L487.

IIUC, at that point info had just a single element, though I may be mistaken. But this may be moot given your changes to info.

@stsievert
Member Author

stsievert commented Sep 28, 2018

@TomAugspurger and I talked about this PR, and generated a couple more TODOs:

  • API: DecaySearchCV => TimeDecaySearchCV (related: time_inverse_decay)
  • API: set param__* keys in cv_results_
  • API: keep the name AdaptiveSearchCV
  • DOC: clarify the definition of "best" (which we also discussed to be the highest final score)
  • DOC: document how to choose patience and tol (probably by example)
  • DOC: Add "Attributes" section to AdaptiveSearchCV

@mrocklin
Member

TimeDecaySearchCV

What was the motivation behind this name? Were there any alternatives proposed?

@TomAugspurger
Member

Many :) Some considered were IncrementalSearch (the same), DecaySearch, ExponentialSearch, ExponentialDecaySearch, TimeInverseDecaySearch, probably others.

We want to convey the most important aspect of this strategy, the decaying number of models trained as we move through time. TimeDecaySearch seemed to strike a reasonable balance between accuracy and length.

As for CV, the train-test split we do internally is a form of cross validation, so it's best to include the suffix.

@mrocklin
Member

Note that this is also currently what backs RandomSearch, which doesn't significantly decrease the number of models over time (except when they seem to plateau).

@stsievert stsievert changed the title from "WIP: API: Incremental search naming changes" to "API: Incremental search naming changes" on Sep 29, 2018
The required level of improvement to consider stopping training on
that model. The most recent score must be at most ``tol`` better
than all of the previous ``patience`` scores for that model.
Increasing ``tol`` will tend to reduce training time, at the cost
of worse models.

Setting this value to be negative allows for some noise tolerance in
Member

I think this doesn't sit well with the previous paragraph (maybe the previous paragraph is wrong).

It sounds like tol is influencing the question "should we stop?". Something like if current_score - previous_score > tol: stop. Setting this to negative would mean we'll consider stopping on a model that performed worse than the previous one, while with a positive tol the final model must always be better than the previous patience scores (until we hit max_iter).

I also can't tell what we mean by "more accurate estimator", or why that would be. I can imagine it being because we're not overfitting as much, but I can also imagine other explanations.

Looking at the code, we say if all(current_score < old + tol for old in patience). So a negative tol makes that condition more likely to be true, i.e. we continue training even when our score has dipped. So I think the first paragraph explaining tol is incorrect?
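To make the sign of tol concrete, here's a toy sketch that treats the check quoted above as the condition for stopping a model (the function name and bookkeeping are illustrative, not the actual dask-ml code):

def should_stop(scores, patience, tol):
    """Stop when the newest score beats none of the previous `patience` scores by more than `tol`."""
    if len(scores) <= patience:
        return False
    current = scores[-1]
    previous = scores[-1 - patience:-1]
    return all(current < old + tol for old in previous)

# tol > 0: small gains still count as a plateau, so training stops sooner.
# tol < 0: the score has to drop below every recent score by |tol| before training stops.
should_stop([0.70, 0.71, 0.712, 0.713], patience=2, tol=0.01)   # True
should_stop([0.70, 0.71, 0.712, 0.713], patience=2, tol=-0.01)  # False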

Member Author

@stsievert stsievert Sep 29, 2018

I set it negative to stop training when the score starts to decrease. I'd expect to stop when performance starts to degrade, not when performance doesn't increase enough.

I guess my expectations are wrong; PyTorch's ReduceLROnPlateau performs its action when all(old > best + threshold for old in plateau) for some positive threshold. That is, its action is performed when models haven't improved by threshold in patience epochs.

TODO:

  • default tol=1e-4 (PyTorch's ReduceLROnPlateau threshold).
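For reference, a minimal sketch of the PyTorch scheduler being compared against (torch.optim.lr_scheduler.ReduceLROnPlateau, where threshold plays the role tol plays here; the toy model and loop are illustrative):

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", patience=2, threshold=1e-4
)

for epoch in range(20):
    val_score = 0.5  # stand-in for a real hold-out score computed each epoch
    scheduler.step(val_score)  # reduces lr once the score plateaus for `patience` epochs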

estimators and hyperparameter combinations) and repeatedly calls the underlying estimator's
``partial_fit`` method with batches of data.

These model selection algorithms are *adaptive*: they decide to keep training
Member

I think we should split these two ideas a bit more cleanly in the docs:

  1. Adaptive: Using previous fit information to decide which models to train next
  2. Incremental: Training on large datasets by incrementally passing batches to partial_fit.

As written, this paragraph implies that both of these are necessary, when you could imagine Incremental without Adaptive, and Adaptive without Incremental.

I'm going back a bit on what we discussed yesterday. Our implementation requires Incremental (it only works with underlying estimators that implement partial_fit). So to me that's the most important of the two, and if we're only going to highlight one (e.g. the section title) it should be incremental.

Member Author

@stsievert stsievert Sep 29, 2018

I agree with you; I'm also rethinking naming. I agree that Incremental is more important to highlight than Adaptive; all adaptive algorithms must be incremental, but not all incremental algorithms are adaptive.

TODO:

  • rename BaseAdaptiveSearchCV => BaseIncrementalSearchCV

Member Author

@stsievert stsievert Sep 29, 2018

I've done some more thinking on naming. TimeDecaySearchCV does two things:

  1. stops training on plateau (one very basic adaptive scheme)
  2. runs a fancier adaptive scheme (i.e., inverse/topk).

I also think we should rename TimeDecay*; that really only conveys that fewer models are trained as time progresses, which is also conveyed with Adaptive.

TODO:

  • separate "stop on plateau" and adaptive methods in TimeDecaySearchCV
  • rename TimeDecaySearchCV => InverseTimeDecaySearchCV

* ``model_id``
* ``partial_fit_calls``
* ``params``
* ``param_{key}`` for every ``key`` in the parameters for that model.
Member

Rather than "parameters for that model", should it be "hyperparameters in params"?

Member Author

Or "where key is every key in params"?

Member Author

Scikit-learn gives an example cv_results_ in its docs, and doesn't address this generally.


best_score_ : float
Best score achieved on the hold out data, where "best" means "highest
score after a models final partial fit call".
Member

missing apostrophe in "model's".

Member

Is the explanation "highest final score on the validation set" clearer?

Member Author

I avoided that because I wanted to define "final". I've changed it to

where "best" means "the highest score on the hold out set after a model's last partial fit call."

bad = set(info) - set(best)
self._all_models.update(set(info.keys()))
instructions = self._adapt(info)
bad = set(self._all_models) - set(instructions)
Member Author

I'm getting a bug with this sometimes. This commit attempts to resolve it. On occasion, I get a KeyError in _incremental.py with speculative.pop(ident) at

model = speculative.pop(ident)

which I take to mean some models are being killed early. This happens when bad == set() at

bad = set(models) - set(instructions)

Member

@mrocklin any thoughts here? I've seen this occasionally too, but haven't been able to debug it successfully.

Member

No thoughts off-hand. Is anyone able to produce a minimal example by any chance?

@@ -256,6 +256,11 @@ def get_futures(partial_fit_calls):

models = {k: client.submit(operator.getitem, v, 0) for k, v in models.items()}
yield wait(models)
info = {}
Member

Slight preference for

from collections import defaultdict

info = defaultdict(list)
for h in history:
    info[h["model_id"]].append(h)

info = dict(info)

@@ -599,20 +614,94 @@ def score(self, X, y=None):
return self.scorer_(self.best_estimator_, X, y)


class IncrementalSearch(BaseIncrementalSearch):
class AdaptiveStopOnPlateauSearchCV(BaseIncrementalSearchCV):
Member

Inconsistent with the name in the __init__.py.

Member

-1 on this name. I think that it is far too long and technical. I think that it would scare non-technical users.

Member Author

AdaptiveStopOnPlateauSearchCV is private (or at least that's what I intend). It is intended for use by other developers who have an adaptive algorithm they'd like to implement and would also like it to stop on plateau (e.g., #221).

Here's the structure I intended (a code sketch follows the list):

  • BaseIncrementalSearchCV: private.
    • Use: base class for all incremental searches
  • AdaptiveStopOnPlateauSearchCV: private, inherits from BaseIncrementalSearchCV
    • Use: allow other devs to easily create adaptive algorithms that stop on plateau (by overriding _adapt)
  • InverseTimeDecaySearchCV: public, inherits from AdaptiveStopOnPlateauSearchCV
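A hypothetical sketch of that layering (the _adapt hook and its return format are my assumptions from this thread, not a final API):

class BaseIncrementalSearchCV:
    """Private: shared machinery (train, score, bookkeeping) for incremental searches."""


class AdaptiveStopOnPlateauSearchCV(BaseIncrementalSearchCV):
    """Private: adds stop-on-plateau; other adaptive algorithms override _adapt."""

    def _adapt(self, info):
        # Passive default: ask for one more partial_fit call for every model.
        return {model_id: 1 for model_id in info}


class InverseTimeDecaySearchCV(AdaptiveStopOnPlateauSearchCV):
    """Public: decays the number of surviving models as training progresses."""

    def _adapt(self, info):
        instructions = super()._adapt(info)
        # ... replace with the inverse time decay / top-k scheme ...
        return instructions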

Member

I think that something like the implementation of IncrementalSearch in master should be public. I think that InverseTimeDecaySearchCV is also probably an overly-technical name.

I recommend that we open an issue where we can discuss name changes to what is in master today and future restructurings. In general I recommend doing this before investing much work.

Member

Indeed. The changes to info and cv_results_, etc. are useful in their own right. Those shouldn't be held up by a naming discussion.

@@ -15,7 +15,7 @@ The underlying estimator will need to be able to train on each cross-validation
See :ref:`hyperparameter.drop-in` for more.

If your data is large and the underlying estimator implements ``partial_fit``, you can use
Dask-ML's :ref:`*incremental* hyperparameter optimizers <hyperparameter.incremental>`.
Member

I'm starting to doubt the "two kinds of hyperparameter optimization estimators" I laid out on line 6.

After our conversation the other day, it sounds like there are (at least) 3.

  1. Static search: All combinations to be tried are specified up front. The full dataset (or a CV split of the full dataset) is used on each training call.
  2. Adaptive search: Regardless of exactly which data is used (CV split of full dataset, or a batch), the salient feature of adaptive search is that models are prioritized according to past performance.
  3. Incremental search: Regardless of how models are chosen for training (static or adaptive), the salient feature of Incremental search is that models are trained on batches of data passed to partial_fit. The underlying estimator must implement partial_fit.

Our current implementation is a mix of 2 and 3. Does that framing accord with others' thoughts? Can we think of a clear way to explain that? The docs below do a decent job, I think. This line, though, I think emphasizes adaptive too much given the framing above (you have a large dataset).

Member Author

Maybe a clearer way of explaining it:

There are two dimensions to searches:

  1. Does the search use partial_fit or fit?
  2. Is the search passive or adaptive? That is, does it use previous evaluations to select which models to train further?

Your categories fall pretty nicely into this (maybe renaming "static" to "passive").

@TomAugspurger
Member

TomAugspurger commented Oct 13, 2018 via email

@stsievert
Member Author

that would require force pushing commits, which makes reviewing somewhat harder since it breaks the “new changes” UI.

Whoops, I already force pushed. Is this a problem on your end? I'm not sure what "new changes UI" refers to. I could open another PR with these changes, and revert this PR back to its previous state; would that help?

@mrocklin
Member

I'm ok reviewing things commit by commit, but I'd prefer that they be atomic, so for example you would want to merge 0ac521f and 5299f1b into a single commit (there are probably other cases of this)

not self.patience
or next_time_step - current_time_step < self.scores_per_fit
):
next_time_step += 1
Member

Thanks for catching this. It looks like this could probably use a test.

search = IncrementalSearchCV(model, params, n_initial_parameters=20, max_iter=10)
search = IncrementalSearchCV(
model, params, n_initial_parameters=20, max_iter=10, decay_rate=0
)
Member

Why this change? Why not also engage decay rate here?

Member Author

@stsievert stsievert Oct 13, 2018

Because for this dataset and model, I had to make a change to the tests if I included decay_rate=1:

- assert (
- search.cv_results_["test_score"][search.best_index_]
-         >= search.cv_results_["test_score"]
- ).all()
- search.cv_results_["rank_test_score"][search.best_index_] == 1
+ assert (
+ search.cv_results_["test_score"][search.best_index_]
+          >= np.percentile(search.cv_results_["test_score"], 90)
+ )
+ search.cv_results_["rank_test_score"][search.best_index_] <= 3

(same change as in #370 (comment))

I interpret this change as meaning "random searches work best", which makes sense because the model/dataset are simple and not performant. Plus, this test is aimed at testing basic elements of IncrementalSearchCV but also relies on the adaptive algorithm working well.

Member Author

That is, without decay_rate=0, this search wouldn't return the highest scoring model (with this small dataset and simple model, the search is too adaptive).

if k == best_model_id:
break

return results.models[best_model_id], best_index
Member

I'm a bit confused about the need for _get_best generally. Why do we need to search for the best result throughout history? Is this information that _fit could just give us directly? I suspect that we're spending time copying how sklearn does things when it might not be the best approach overall. Thoughts?

Member Author

Why do we need to search for the best result throughout history

We don't. The current implementation more or less does

score = {k: v[-1]["score"] for k, v in results.info.items()}
best_model_id = max(score, key=score.get)

The previous implementation picked the best model ID before searching through history.

Is this information that _fit could just give us directly?

I refactored _fit to return information that's very useful for calculating that (the "best" model ID), but it does not return it directly.

In this PR, _fit gives us model history separated by model ID for every model (not only the final models _additional_calls selects) in results.info. This is really useful: it allows getting the final scores for every model with

info, models, history = fit(...)
final_scores = {k: v[-1]["score"] for k, v in info.items()}
best_model_id = max(final_scores, key=final_scores.get)

* Rename history_results_ => history_
* Provide complete model history, and make it public
  (otherwise boilerplate needed to formulate model_history_ from history_,
  looping over items in history and putting in dict, {model_id: hist})
This mirrors scikit-learn's cv_results_, with one important distinction: this implementation only uses one train/test split.

This means that there's a `test_score` key, not `mean_test_score`,
or `test_score0`.
Before, BaseIncrementalSearchCV assumed _additional_calls
returned one model and returned that to the user.

Now, BaseIncrementalSearchCV chooses the model with the highest
score returned by _additional_calls.

This matters if a random search is desired, or if `max_iter` is hit.
* MAINT: cleaner separation with _adapt and _stop_on_plateau functions
  (separates complex adaptive algorithm and stopping on plateau,
   and allows for overwriting _adapt for other adaptive algorithms
   that want to stop on plateau)
* TST: implement tests for patience and tolerance parameters
* MAINT: define "patience" to be the number of partial_fit calls, not
  the number of score calls
DOC: Change warning note to "this class", not "IncrementalSearch"
instructions = self._adapt(info)

out = self._stop_on_plateau(info, instructions)
return out
Member

I think that I would prefer to keep things as a single method until a concrete need arises to separate them.

While I appreciate the desire to split things apart into smaller chunks I find this adds a cost of indirection, which makes code a bit harder to review and maintain in the future. I find that code that is finely split often feels better when first writing it, but ages poorly when you have to quickly understand what is going on after a year of not looking at it.
There is obviously a length limit to this (functions of several hundred lines are unpleasant), but I don't think that the _additional_calls function is yet there.

I haven't yet reviewed the other parts of this commit that change the behavior of _additional_calls, partly because it's hard to separate those changes from the refactor. If possible I'd prefer that we remove the refactor so that it's easier to review and discuss the algorithmic changes.

Member Author

I think that I would prefer to keep things as a single method until a concrete need arises to separate them.

I have a need for _adapt in #221. I'll make the change there, and move it back to one function in this commit.

Member

I appreciate it. I think that including this change over there will probably help to motivate it.

@@ -817,6 +818,24 @@ class IncrementalSearchCV(BaseIncrementalSearchCV):
>>> search = IncrementalSearchCV(model, params, random_state=0,
... n_initial_parameters=1000,
... patience=20, max_iter=100)

``tol`` and ``patience`` are for stopping training on a "plateau", or when
Member

Is the term plateau likely to be well understood by novice users? If yes then we should remove the quotes. If no then we might want to consider avoiding the term and instead just describe it as you do after using it. I might even start off this paragraph with

"Often when training we get to a situation where additional work leads to little to no gain. In these cases the tol and patience parameters can help us stop work early. ..."


``tol`` and ``patience`` are for stopping training on a "plateau", or when
the hold-out score flattens and/or falls. For example, setting ``tol=0``
and ``patience=2`` will dictate that scores will always increase until
Member

We're not dictating that scores will increase (we don't determine that). We might need to change the language here.

@mrocklin
Member

This is where commit-by-commit reviewing becomes challenging. I'm going to ask that you rebase to keep reviewable chunks atomic. (separate PRs would also be welcome of course).

@stsievert
Member Author

This is where commit-by-commit reviewing becomes challenging. I'm going to ask that you rebase to keep reviewable chunks atomic. (separate PRs would also be welcome of course).

I see. I'll try to split this into 3 separate PRs:

  1. API improvements (cv_results_, renaming, model_history_, etc).
  2. Documentation changes
  3. Bugs fixes/cleaning stop on plateau

How does that sound? Should the PRs be finer grained?

@stsievert
Member Author

Superseded by #404, #405, #406. Dependency tree:
