
Fix the link in dask documentation #479

Closed
wants to merge 160 commits into from

Conversation

MichaelSchroter

Hi All,

I have found a page similar to the one behind the missing link in #478. You can find it here.

Hope this is of some value.

Thanks

Michael

mrocklin and others added 30 commits September 10, 2018 08:11
Currently builds are using an older theme with some errors
* TST: Try numba RC

* Remove RC
* Remove remaining notebooks

* Updated examples
* DOC: Added IncrementalSearch to the api docs
* Adds pip upgrade to CI

* Set max version number for testpath

* Format with new release 18.9b0 of black

* Add LogisticRegression solver to fix docs build

* Removes filterwarnings from setup.cfg
* Support dataframes for k-means

Fixes #390
* Support dataframes in _partial.py::fit/predict

Previously these functions would fail on dask dataframes.
Now they coerce the input to dask arrays, and predict also converts the result back for dataframe inputs.
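A minimal sketch of the coercion described above (the helper name, the dtype, and the round-trip back to a dataframe are illustrative, not the actual dask-ml internals):

```python
import dask.array as da
import dask.dataframe as dd

def _predict(estimator, X):
    # Hypothetical sketch: coerce a dask DataFrame to a dask array
    # before calling the wrapped estimator, then convert predictions
    # back so dataframe inputs round-trip.
    is_frame = isinstance(X, dd.DataFrame)
    if is_frame:
        X = X.to_dask_array(lengths=True)  # compute concrete chunk sizes
    result = X.map_blocks(estimator.predict, dtype="int64", drop_axis=1)
    if is_frame:
        result = dd.from_dask_array(result)
    return result
```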
* Don't use auto chunking with unknown chunk sizes

* add test
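A hedged sketch of the guard this change implies; in dask, unknown chunk sizes appear as NaN in `.chunks`, and automatic rechunking cannot be applied to them (the function names here are illustrative):

```python
import math
import dask.array as da

def _has_unknown_chunks(x):
    # Unknown chunk sizes (e.g. from a filtered dask DataFrame)
    # show up as NaN entries in x.chunks.
    return any(math.isnan(c) for dim in x.chunks for c in dim)

def maybe_rechunk(x):
    # Only request "auto" chunking when sizes are known;
    # rechunking to "auto" raises on NaN chunk sizes.
    if _has_unknown_chunks(x):
        return x
    return x.rechunk({0: "auto"})
```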
Previously we would pass around Estimator.predict methods.  These methods are
opaque to the serialization heuristics dask.distributed uses to decide what
should move and how to serialize it.

Now we pass around bare functions that take in estimators as parameters.

* switch out transform as well
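The refactor described above, as a minimal sketch (names are hypothetical):

```python
def _predict(part, estimator):
    # A module-level function: dask.distributed can inspect and
    # serialize the estimator argument on its own, unlike a bound
    # method such as `estimator.predict`.
    return estimator.predict(part)

# before: X.map_blocks(estimator.predict, dtype=..., drop_axis=1)
# after:  X.map_blocks(_predict, estimator=estimator, dtype=..., drop_axis=1)
```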
* Allow compute=False in ParallelPostFit.score

* cleanup tests
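Usage sketch for the new keyword (the data here is random filler, just to make the example self-contained):

```python
import dask.array as da
import numpy as np
from sklearn.linear_model import SGDClassifier
from dask_ml.wrappers import ParallelPostFit

X, y = np.random.rand(100, 5), np.random.randint(0, 2, 100)
clf = ParallelPostFit(SGDClassifier(tol=1e-3, max_iter=1000))
clf.fit(X, y)

dX = da.from_array(np.random.rand(100, 5), chunks=50)
dy = da.from_array(np.random.randint(0, 2, 100), chunks=50)

# compute=False returns a lazy dask object instead of a float,
# so scoring can be fused with other work in one compute() call.
lazy_score = clf.score(dX, dy, compute=False)
score = lazy_score.compute()
```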
* Rename history_results_ => history_
* Provide the complete model history, and make it public
  (otherwise users need boilerplate to build model_history_ from history_,
  looping over the records in history and grouping them into a dict, {model_id: hist})
This mirrors scikit-learn's cv_results_, with one important distinction: this implementation only tests on one training set.

This means that there's a `test_score` key, not `mean_test_score`
or `test_score0`.
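Illustrative sketch of the structures described above (the record fields are assumptions, not the exact schema):

```python
history_ = [
    {"model_id": 0, "partial_fit_calls": 1, "score": 0.71},
    {"model_id": 1, "partial_fit_calls": 1, "score": 0.69},
    {"model_id": 0, "partial_fit_calls": 2, "score": 0.80},
]

# model_history_ is just history_ grouped by model, saving users
# the boilerplate mentioned above:
model_history_ = {}
for record in history_:
    model_history_.setdefault(record["model_id"], []).append(record)

# With a single train/test split there is one score per model, hence
# a plain "test_score" key rather than "mean_test_score".
```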
Before, BaseIncrementalSearchCV assumed _additional_calls
returned one model and returned that to the user.

Now, BaseIncrementalSearchCV chooses the model with the highest
score returned by _additional_calls.

This matters when doing a random search, or when `max_iter` is hit.
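A minimal sketch of the new selection rule (the `scores` mapping is hypothetical):

```python
# One final score per model, e.g. collected from _additional_calls:
scores = {0: 0.80, 1: 0.69, 2: 0.84}

# Pick the best-scoring model rather than assuming a single survivor:
best_model_id = max(scores, key=scores.get)  # -> 2
```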
* MAINT: cleaner separation with _adapt and _stop_on_plateau functions
  (separates the complex adaptive algorithm from stopping on plateau,
   and allows overriding _adapt for other adaptive algorithms
   that want to stop on a plateau)
* TST: implement tests for patience and tolerance parameters
* MAINT: define "patience" to be the number of partial_fit calls, not
  the number of score calls
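A hedged sketch of plateau stopping under that definition of patience (the helper and its defaults are illustrative, not the dask-ml implementation):

```python
def _stop_on_plateau(scores, patience=10, tol=1e-3):
    # `scores` holds one score per partial_fit call, oldest first.
    # Stop when none of the last `patience` calls improved on the
    # best earlier score by more than `tol`.
    if len(scores) <= patience:
        return False
    best_before = max(scores[:-patience])
    return max(scores[-patience:]) < best_before + tol
```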
stsievert and others added 26 commits February 19, 2019 21:17
MAINT: add distributed as a dependency
Replace `da.atop`, which has been renamed to `da.blockwise` in newer dask releases
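The rename is mechanical; a small example of the renamed API (this transpose example mirrors the dask docs):

```python
import numpy as np
import dask.array as da

x = da.ones((4, 4), chunks=2)
# da.atop was renamed to da.blockwise; arguments still pair each
# array with an index string:
y = da.blockwise(np.transpose, "ji", x, "ij", dtype=x.dtype)
```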
Add CI job for oldest supported dependencies
* Add drop option to OneHotEncoder (see the usage sketch after this list)

* Update QuantileTransformer internals

* Fix commented out code

* Remove print lines in test

* Add sklearn version check for OneHotEncoder

* Add allowed tolerance for QuantileTransformer test

* Update OneHotEncoder drop sklearn version to 0.21.0

* Increase test data size for TestQuantileTransformer

* Increase QuantileTransformer test coverage

* Include transform in test
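Regarding the `drop` option added above: it mirrors scikit-learn >= 0.21 (hence the version check). A usage sketch with the scikit-learn estimator; dask-ml's encoder is assumed to accept the same keyword:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([["a"], ["b"], ["a"]])
enc = OneHotEncoder(drop="first")          # requires sklearn >= 0.21
enc.fit_transform(X).toarray()
# array([[0.], [1.], [0.]]) -- the "a" column is dropped
```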
* Update indexable() to just yield dask dataframes, as mentioned in issue #324
* Fix `high is out of bounds for int32` for k_means

Fixes #378
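Sketch of the failure mode behind #378 and the usual fix (the exact call site inside k_means is an assumption): on platforms where the default integer is 32-bit (e.g. Windows), `np.random.randint` with a large `high` raises this error unless a 64-bit dtype is requested.

```python
import numpy as np

rng = np.random.RandomState(0)
high = 2**40  # larger than int32 can hold

# rng.randint(0, high)  # raises on 32-bit-default platforms:
#                       # "high is out of bounds for int32"
idx = rng.randint(0, high, size=5, dtype="int64")  # explicit 64-bit dtype
```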
@TomAugspurger
Member

TomAugspurger commented Mar 11, 2019 via email

@TomAugspurger
Member

Superseded by #483
