Parallelization: switch from multiprocessing to joblib #137

Merged — 5 commits merged into pysal:master on Dec 4, 2023

Conversation

@Ziqi-Li (Member) commented on Dec 3, 2023

This PR replaces the previous multiprocessing.Pool-based parallelization with a joblib-based one. Users only need to specify the n_jobs parameter instead of passing a pool.

Example use:

GWR

```python
from mgwr.gwr import GWR
from mgwr.sel_bw import Sel_BW

# coords, y, X: coordinate array, response vector, and design matrix
n_jobs = 8  # run on 8 processes
bw = Sel_BW(coords, y, X, n_jobs=n_jobs).search()
results = GWR(coords, y, X, bw, n_jobs=n_jobs).fit()
```

MGWR

```python
from mgwr.gwr import MGWR
from mgwr.sel_bw import Sel_BW

n_jobs = -1  # use all processors
mgwr_selector = Sel_BW(coords, y, X, multi=True, n_jobs=n_jobs)
mgwr_bw = mgwr_selector.search()
mgwr_results = MGWR(coords, y, X, selector=mgwr_selector, n_jobs=n_jobs).fit()
```

The test suite is updated to reflect the new interface. The notebook example is also updated, and the performance can be compared here: joblib-based (new) vs. mp-based (old)
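For anyone who wants to reproduce the comparison locally, here is a rough timing sketch against the new interface (it assumes `coords`, `y`, and `X` are already loaded as in the examples above; the candidate `n_jobs` values are arbitrary, not from the notebook):

```python
import time

from mgwr.gwr import GWR
from mgwr.sel_bw import Sel_BW

# Select a bandwidth once, then time the GWR fit at several parallelism levels.
bw = Sel_BW(coords, y, X, n_jobs=1).search()
for n_jobs in (1, 2, 4, 8, -1):
    start = time.perf_counter()
    GWR(coords, y, X, bw, n_jobs=n_jobs).fit()
    print(f"n_jobs={n_jobs}: {time.perf_counter() - start:.2f}s")
```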

@jGaboardi added the enhancement (New feature or request) label on Dec 3, 2023
@TaylorOshan merged commit 524f346 into pysal:master on Dec 4, 2023
6 checks passed
@martinfleis (Member) left a comment:

Sorry for being late, but I made two comments in the code.

At least the one on the hard-breaking change would be good to resolve, IMHO.

```diff
@@ -86,6 +87,10 @@ class GWR(GLM):
     name_x  : list of strings
               Names of independent variables for use in output

+    n_jobs  : integer
+              The number of jobs (default 1) to run in parallel.
+              -1 means using all processors.
```
@martinfleis (Member): I know I am coming late to the party, but would you consider using -1 as the default? That is quite common across the ML world and it is what users generally expect.

Member: I was curious what others' opinions were on this... I tend to default to -1 personally, but joblib itself [indirectly] defaults to 1, and in esda we do both (join counts are conservative, G defaults to -1, etc.).

Member: I think scikit usually defaults to 1, so I don't think there's a standard expectation in the ML world.

Member: I don't mind either as long as it is documented (which it is). But for heavily parallelisable code like this one, I tend to prefer parallel execution by default.

Member Author (@Ziqi-Li): Yeah, I was only looking at scikit-learn and adapted to what they have. I actually personally prefer -1 as the default.

```diff
@@ -285,7 +291,7 @@ def _local_fit(self, i):
         return influ, resid, predy, betas.reshape(-1), w, Si, tr_STS_i, CCT

     def fit(self, ini_params=None, tol=1.0e-5, max_iter=20, solve='iwls',
-            lite=False, pool=None):
+            lite=False):
```
@martinfleis (Member): This is a hard-breaking change we should avoid. I suggest keeping the keyword and warning when it is not None.

@martinfleis (Member) commented:

Another note, looking at the notebook: the section "Effectivenss of n_jobs" has a typo in the title but, more importantly, can be misleading. The curve applies to a machine that likely has 8 effective cores and would look very different on one with more or fewer. I think it deserves at least a line of comment pointing this out.

@Ziqi-Li (Member Author) commented on Dec 4, 2023:

> Another note, looking at the notebook: the section "Effectivenss of n_jobs" has a typo in the title but, more importantly, can be misleading. The curve applies to a machine that likely has 8 effective cores and would look very different on one with more or fewer. I think it deserves at least a line of comment pointing this out.

Nice catch on the typo. Will fix it.

I think the curve looks like this because (1) the data is still small, and (2) the linear part of the computation limits it from further scaling. Here the curves are just showing that the parallelization works; the specific scalability depends on many factors.
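For background on why the curve flattens, this is the usual Amdahl's-law bound: if a fraction $p$ of the runtime is parallelisable and the remaining $1-p$ is serial (the "linear part" above), the speedup on $n$ cores is at most

$$S(n) = \frac{1}{(1-p) + p/n},$$

so the serial fraction caps the achievable speedup regardless of core count, while the core count determines where the curve stops improving.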

@martinfleis (Member) commented:

> The linear part of the computation limits it from further scaling

Only because of the number of available cores, no? If you ran the same code on a 32-core CPU, the minimum would be at 31-32, not 8 like here.

@Ziqi-Li (Member Author) commented on Dec 4, 2023:

> > The linear part of the computation limits it from further scaling
>
> Only because of the number of available cores, no? If you ran the same code on a 32-core CPU, the minimum would be at 31-32, not 8 like here.

I forgot to mention that I have 12 physical cores on the machine. Would you suggest adding a note in the notebook saying this is based on a 12-core machine?

@martinfleis (Member) commented:

> Would you suggest adding a note in the notebook saying this is based on a 12-core machine?

Yes, that is what I meant.

It is interesting that it does not scale linearly to 12 cores...
