Currently, the ProcessPoolExecutor in core._StatsForecast._forecast_parallel does not set the mp_context parameter and offers no direct way of doing so. On Linux, therefore, the default start method fork is always used. This can lead to unexpected lockups in programs that use threads, with no clean way to resolve them when they occur.
The polars library uses threads extensively and explicitly warns against fork in its docs. Its recent 1.14 release added an explicit warning which (for the above reason) is always shown when using polars alongside StatsForecast with n_jobs>=2. Python will switch away from fork as the default start method on Linux in Python 3.14. Here is a minimal example.
from datetime import date

import polars as pl
import statsforecast.core
import statsforecast.models

# Two short monthly series, "a" and "b", in long format.
df = pl.DataFrame(
    {
        "unique_id": ["a", "a", "a", "b", "b", "b"],
        "ds": (date(2000, i, 1) for i in (1, 2, 3, 1, 2, 3)),
        "y": range(6),
    },
    # The target column is "y"; cast it to float.
    schema_overrides={"y": pl.Float64},
)

# n_jobs >= 2 makes StatsForecast use a ProcessPoolExecutor.
sf = statsforecast.core.StatsForecast(
    models=[statsforecast.models.Naive()],
    freq="1mo",
    n_jobs=2,
)
sf.forecast(h=12, df=df)
generates
RuntimeWarning: Using fork() can cause Polars to deadlock in the child process.
In addition, using fork() with Python in general is a recipe for mysterious
deadlocks and crashes.
The most likely reason you are seeing this error is because you are using the
multiprocessing module on Linux, which uses fork() by default. This will be
fixed in Python 3.14. Until then, you want to use the "spawn" context instead.
See https://docs.pola.rs/user-guide/misc/multiprocessing/ for details.
on Linux.
Of course, one can hack around this, e.g.,
import concurrent.futures
import multiprocessing
from datetime import date
from unittest.mock import patch

import polars as pl
import statsforecast.core
import statsforecast.models

df = pl.DataFrame(
    {
        "unique_id": ["a", "a", "a", "b", "b", "b"],
        "ds": (date(2000, i, 1) for i in (1, 2, 3, 1, 2, 3)),
        "y": range(6),
    },
    # The target column is "y"; cast it to float.
    schema_overrides={"y": pl.Float64},
)


def generate_pool(n_jobs: int) -> concurrent.futures.ProcessPoolExecutor:
    # Same pool as statsforecast creates, but with an explicit "spawn" context.
    return concurrent.futures.ProcessPoolExecutor(
        max_workers=n_jobs, mp_context=multiprocessing.get_context("spawn")
    )


sf = statsforecast.core.StatsForecast(
    models=[statsforecast.models.Naive()],
    freq="1mo",
    n_jobs=2,
)

# Swap the executor class that statsforecast.core looks up at call time.
with patch("statsforecast.core.ProcessPoolExecutor", generate_pool):
    sf.forecast(h=12, df=df)
but that is clearly not optimal.
Setting the context to "spawn" has some overhead, and some users may want to stick with "fork"; hence, having a parameter would be nice.
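To make the request concrete, here is a minimal sketch of what such a parameter could look like. The names resolve_context, make_executor, and start_method are hypothetical and are not part of statsforecast; the sketch only relies on multiprocessing.get_context and the mp_context argument of ProcessPoolExecutor from the standard library.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def resolve_context(start_method: Optional[str]):
    """Return a multiprocessing context for start_method, or None.

    Hypothetical helper: None means "use the platform default",
    which keeps today's behavior (fork on Linux before Python 3.14).
    """
    if start_method is None:
        return None
    # Raises ValueError for a start method the platform does not support.
    return multiprocessing.get_context(start_method)


def make_executor(
    n_jobs: int, start_method: Optional[str] = None
) -> ProcessPoolExecutor:
    """Build the worker pool, optionally with an explicit start method.

    Hypothetical helper, not statsforecast API.
    """
    return ProcessPoolExecutor(
        max_workers=n_jobs, mp_context=resolve_context(start_method)
    )
```

With a default of None, existing callers would see no change, while users of thread-heavy libraries could opt in to "spawn" or "forkserver" explicitly.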
Use case
Using statsforecast alongside current polars versions.
Hey @christian-hnz, thanks for the detailed report. We only use dicts of numpy arrays in multiprocessing, so I don't think this would cause any problems with polars. Did you experience any issues (apart from polars' warning)?
Seems like you can set the start method in your notebook/script with multiprocessing.set_start_method. If you do that then our ProcessPoolExecutor should be able to pick that up.
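As a rough illustration of this suggestion, the sketch below selects "spawn" process-wide before any pool is created. The helper name use_spawn_globally is made up for this example; multiprocessing.set_start_method and multiprocessing.get_start_method are standard library calls. A ProcessPoolExecutor created afterwards without an mp_context argument should use the selected default.

```python
import multiprocessing


def use_spawn_globally() -> str:
    """Select "spawn" as the process-wide default start method.

    Returns the start method that is active afterwards.
    """
    try:
        # Allowed once per process, before the default context is fixed.
        multiprocessing.set_start_method("spawn")
    except RuntimeError:
        # The start method was already fixed elsewhere; leave it untouched.
        pass
    return multiprocessing.get_start_method()
```

In a script this would typically be called once under the `if __name__ == "__main__":` guard, before calling sf.forecast. Because set_start_method can only be applied once per process (unless force=True is passed), it is awkward for library code to call it on the user's behalf.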
No, I did not run into real issues using polars with statsforecast (and I've run quite heavy jobs, so I'm rather confident there is no real issue here). Also, a warnings.filterwarnings call is enough to deal with the newly added polars fork warning.
True, multiprocessing.set_start_method is an option, but not an optimal choice outside scripting.
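For reference, the warnings.filterwarnings approach mentioned above might look like the following sketch. The pattern is taken from the first line of the warning text quoted in the issue; the helper name silence_polars_fork_warning is made up, and the demonstration emits a stand-in warning so it runs without polars installed.

```python
import warnings

# Regex matched against the start of the warning message.
# The parentheses in "fork()" must be escaped.
FORK_WARNING_PATTERN = r"Using fork\(\) can cause Polars to deadlock"


def silence_polars_fork_warning() -> None:
    """Ignore only the polars fork warning, not other RuntimeWarnings."""
    warnings.filterwarnings(
        "ignore", message=FORK_WARNING_PATTERN, category=RuntimeWarning
    )


# Demonstration with stand-in warnings. catch_warnings restores the
# previous filters on exit, so the filter here is scoped to this block.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    silence_polars_fork_warning()
    warnings.warn(
        "Using fork() can cause Polars to deadlock in the child process.",
        RuntimeWarning,
    )
    warnings.warn("some other runtime warning", RuntimeWarning)

remaining = [str(w.message) for w in caught]
```

Only the unrelated warning is left in remaining. In real use, silence_polars_fork_warning() would be called once at startup, before the first parallel forecast.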