Commit b1290cb

Feat/historical_forecasts accept negative integer as start value (#1866)
* feat: historical_forecasts accept negative integer as start value
* fix: improved the negative start unit test
* fix: simplified the logic around exception raising
* fix: instead of adding capabilities to get_index_at_point, use a new argument in historical_forecasts. Updated exception accordingly
* test: updated tests accordingly
* doc: updated changelog
* test: added test for historical forecast on ts using a rangeindex starting with a negative value
* Apply suggestions from code review

  Co-authored-by: Dennis Bader <[email protected]>
* fix: changed the literal to 'positional_index' and 'value_index'
* feat: making the error messages more informative, adapted the tests accordingly
* feat: extending the new argument to backtest and gridsearch
* fix: import of Literal for python 3.8
* doc: updated changelog
* fix: shortened the literal for start_format, updated tests accordingly
* doc: updated start docstring
* test: limited the dependency on unittest in anticipation of the refactoring
* doc: updated changelog
* fix: fixed typo
* fix: fixed typo
* doc: copy start and start_format docstring from hist_fct to backtest and gridsearch
* Apply suggestions from code review

  Co-authored-by: Dennis Bader <[email protected]>

---------

Co-authored-by: Dennis Bader <[email protected]>
1 parent b69b8ca commit b1290cb

7 files changed (+243 -68 lines changed)

CHANGELOG.md

+4
@@ -10,6 +10,10 @@ but cannot always guarantee backwards compatibility. Changes that may **break co

  ### For users of the library:

+ **Improved**
+ - `TimeSeries` with a `RangeIndex` starting at a negative value are now supported by `historical_forecasts`. [#1866](https://github.com/unit8co/darts/pull/1866) by [Antoine Madrona](https://github.com/madtoinou).
+ - Added a new argument `start_format` to `historical_forecasts()`, `backtest()` and `gridsearch()` that allows using an integer `start` either as the index position or as the index value/label for `series` indexed with a `pd.RangeIndex`. [#1866](https://github.com/unit8co/darts/pull/1866) by [Antoine Madrona](https://github.com/madtoinou).
+
  **Fixed**
  - Fixed a bug in `TimeSeries.from_dataframe()` when using a pandas.DataFrame with `df.columns.name != None`. [#1938](https://github.com/unit8co/darts/pull/1938) by [Antoine Madrona](https://github.com/madtoinou).
  - Fixed a bug in `RegressionEnsembleModel.extreme_lags` when the forecasting models have only covariates lags. [#1942](https://github.com/unit8co/darts/pull/1942) by [Antoine Madrona](https://github.com/madtoinou).
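
Reviewer note: the changelog entries above hinge on one ambiguity, namely that on an integer index starting below zero, a negative integer can be read either as a position counted from the end or as a label. The snippet below is only an illustration of that ambiguity; it uses a plain Python `range` as a stand-in for a `pd.RangeIndex` (an assumption made to keep it dependency-free) and does not call darts.

```python
# A plain Python range stands in for a pd.RangeIndex (illustrative assumption).
index = range(-5, 10)  # labels -5, -4, ..., 9 (15 entries)

# Read as a position, -2 counts from the end and selects the label 8.
label_at_position = index[-2]

# Read as a value/label, -2 is an entry of the index and sits at position 3.
position_of_value = index.index(-2)

print(label_at_position, position_of_value)
```

The same integer therefore points at two different time steps, which is the case the new `start_format` argument is meant to disambiguate.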

darts/models/forecasting/forecasting_model.py

+79-33
@@ -24,6 +24,11 @@
  from random import sample
  from typing import Any, BinaryIO, Callable, Dict, List, Optional, Sequence, Tuple, Union

+ try:
+     from typing import Literal
+ except ImportError:
+     from typing_extensions import Literal
+
  import numpy as np
  import pandas as pd

@@ -560,6 +565,7 @@ def historical_forecasts(
  num_samples: int = 1,
  train_length: Optional[int] = None,
  start: Optional[Union[pd.Timestamp, float, int]] = None,
+ start_format: Literal["position", "value"] = "value",
  forecast_horizon: int = 1,
  stride: int = 1,
  retrain: Union[bool, int, Callable[..., bool]] = True,
@@ -609,15 +615,14 @@ def historical_forecasts(
  steps available, all steps up until prediction time are used, as in default case. Needs to be at least
  `min_train_series_length`.
  start
- Optionally, the first point in time at which a prediction is computed for a future time.
- This parameter supports: ``float``, ``int`` and ``pandas.Timestamp``, and ``None``.
- If a ``float``, the parameter will be treated as the proportion of the time series
- that should lie before the first prediction point.
- If an ``int``, the parameter will be treated as an integer index to the time index of
- `series` that will be used as first prediction time.
- If a ``pandas.Timestamp``, the time stamp will be used to determine the first prediction time
- directly.
- If ``None``, the first prediction time will automatically be set to:
+ Optionally, the first point in time at which a prediction is computed. This parameter supports:
+ ``float``, ``int``, ``pandas.Timestamp``, and ``None``.
+ If a ``float``, it is the proportion of the time series that should lie before the first prediction point.
+ If an ``int``, it is either the index position of the first prediction point for `series` with a
+ `pd.DatetimeIndex`, or the index value for `series` with a `pd.RangeIndex`. The latter can be changed to
+ the index position with `start_format="position"`.
+ If a ``pandas.Timestamp``, it is the time stamp of the first prediction point.
+ If ``None``, the first prediction point will automatically be set to:

  - the first predictable point if `retrain` is ``False``, or `retrain` is a Callable and the first
  predictable point is earlier than the first trainable point.
@@ -628,6 +633,13 @@ def historical_forecasts(
  Note: Raises a ValueError if `start` yields a time outside the time index of `series`.
  Note: If `start` is outside the possible historical forecasting times, will ignore the parameter
  (default behavior with ``None``) and start at the first trainable/predictable point.
+ start_format
+ Defines the `start` format. Only effective when `start` is an integer and `series` is indexed with a
+ `pd.RangeIndex`.
+ If set to 'position', `start` corresponds to the index position of the first predicted point and can range
+ from `(-len(series), len(series) - 1)`.
+ If set to 'value', `start` corresponds to the index value/label of the first predicted point. Will raise
+ an error if the value is not in `series`' index. Default: ``'value'``
  forecast_horizon
  The forecast horizon for the predictions.
  stride
@@ -798,6 +810,7 @@ def retrain_func(
  future_covariates=future_covariates,
  num_samples=num_samples,
  start=start,
+ start_format=start_format,
  forecast_horizon=forecast_horizon,
  stride=stride,
  overlap_end=overlap_end,
@@ -876,6 +889,7 @@ def retrain_func(
  forecast_horizon=forecast_horizon,
  overlap_end=overlap_end,
  start=start,
+ start_format=start_format,
  show_warnings=show_warnings,
  )

@@ -1030,6 +1044,7 @@ def backtest(
  num_samples: int = 1,
  train_length: Optional[int] = None,
  start: Optional[Union[pd.Timestamp, float, int]] = None,
+ start_format: Literal["position", "value"] = "value",
  forecast_horizon: int = 1,
  stride: int = 1,
  retrain: Union[bool, int, Callable[..., bool]] = True,
@@ -1085,25 +1100,31 @@ def backtest(
  steps available, all steps up until prediction time are used, as in default case. Needs to be at least
  `min_train_series_length`.
  start
- Optionally, the first point in time at which a prediction is computed for a future time.
- This parameter supports: ``float``, ``int`` and ``pandas.Timestamp``, and ``None``.
- If a ``float``, the parameter will be treated as the proportion of the time series
- that should lie before the first prediction point.
- If an ``int``, the parameter will be treated as an integer index to the time index of
- `series` that will be used as first prediction time.
- If a ``pandas.Timestamp``, the time stamp will be used to determine the first prediction time
- directly.
- If ``None``, the first prediction time will automatically be set to:
- - the first predictable point if `retrain` is ``False``, or `retrain` is a Callable and the first
- predictable point is earlier than the first trainable point.
-
- - the first trainable point if `retrain` is ``True`` or ``int`` (given `train_length`),
- or `retrain` is a Callable and the first trainable point is earlier than the first predictable point.
-
- - the first trainable point (given `train_length`) otherwise
+ Optionally, the first point in time at which a prediction is computed. This parameter supports:
+ ``float``, ``int``, ``pandas.Timestamp``, and ``None``.
+ If a ``float``, it is the proportion of the time series that should lie before the first prediction point.
+ If an ``int``, it is either the index position of the first prediction point for `series` with a
+ `pd.DatetimeIndex`, or the index value for `series` with a `pd.RangeIndex`. The latter can be changed to
+ the index position with `start_format="position"`.
+ If a ``pandas.Timestamp``, it is the time stamp of the first prediction point.
+ If ``None``, the first prediction point will automatically be set to:
+
+ - the first predictable point if `retrain` is ``False``, or `retrain` is a Callable and the first
+ predictable point is earlier than the first trainable point.
+ - the first trainable point if `retrain` is ``True`` or ``int`` (given `train_length`),
+ or `retrain` is a Callable and the first trainable point is earlier than the first predictable point.
+ - the first trainable point (given `train_length`) otherwise
+
  Note: Raises a ValueError if `start` yields a time outside the time index of `series`.
  Note: If `start` is outside the possible historical forecasting times, will ignore the parameter
  (default behavior with ``None``) and start at the first trainable/predictable point.
+ start_format
+ Defines the `start` format. Only effective when `start` is an integer and `series` is indexed with a
+ `pd.RangeIndex`.
+ If set to 'position', `start` corresponds to the index position of the first predicted point and can range
+ from `(-len(series), len(series) - 1)`.
+ If set to 'value', `start` corresponds to the index value/label of the first predicted point. Will raise
+ an error if the value is not in `series`' index. Default: ``'value'``
  forecast_horizon
  The forecast horizon for the point predictions.
  stride
@@ -1160,6 +1181,7 @@ def backtest(
  num_samples=num_samples,
  train_length=train_length,
  start=start,
+ start_format=start_format,
  forecast_horizon=forecast_horizon,
  stride=stride,
  retrain=retrain,
@@ -1210,6 +1232,7 @@ def gridsearch(
  forecast_horizon: Optional[int] = None,
  stride: int = 1,
  start: Union[pd.Timestamp, float, int] = 0.5,
+ start_format: Literal["position", "value"] = "value",
  last_points_only: bool = False,
  show_warnings: bool = True,
  val_series: Optional[TimeSeries] = None,
@@ -1275,17 +1298,38 @@ def gridsearch(
  forecast_horizon
  The integer value of the forecasting horizon. Activates expanding window mode.
  stride
- The number of time steps between two consecutive predictions. Only used in expanding window mode.
+ Only used in expanding window mode. The number of time steps between two consecutive predictions.
  start
- The ``int``, ``float`` or ``pandas.Timestamp`` that represents the starting point in the time index
- of `series` from which predictions will be made to evaluate the model.
- For a detailed description of how the different data types are interpreted, please see the documentation
- for `ForecastingModel.backtest`. Only used in expanding window mode.
+ Only used in expanding window mode. Optionally, the first point in time at which a prediction is computed.
+ This parameter supports: ``float``, ``int``, ``pandas.Timestamp``, and ``None``.
+ If a ``float``, it is the proportion of the time series that should lie before the first prediction point.
+ If an ``int``, it is either the index position of the first prediction point for `series` with a
+ `pd.DatetimeIndex`, or the index value for `series` with a `pd.RangeIndex`. The latter can be changed to
+ the index position with `start_format="position"`.
+ If a ``pandas.Timestamp``, it is the time stamp of the first prediction point.
+ If ``None``, the first prediction point will automatically be set to:
+
+ - the first predictable point if `retrain` is ``False``, or `retrain` is a Callable and the first
+ predictable point is earlier than the first trainable point.
+ - the first trainable point if `retrain` is ``True`` or ``int`` (given `train_length`),
+ or `retrain` is a Callable and the first trainable point is earlier than the first predictable point.
+ - the first trainable point (given `train_length`) otherwise
+
+ Note: Raises a ValueError if `start` yields a time outside the time index of `series`.
+ Note: If `start` is outside the possible historical forecasting times, will ignore the parameter
+ (default behavior with ``None``) and start at the first trainable/predictable point.
+ start_format
+ Only used in expanding window mode. Defines the `start` format. Only effective when `start` is an integer
+ and `series` is indexed with a `pd.RangeIndex`.
+ If set to 'position', `start` corresponds to the index position of the first predicted point and can range
+ from `(-len(series), len(series) - 1)`.
+ If set to 'value', `start` corresponds to the index value/label of the first predicted point. Will raise
+ an error if the value is not in `series`' index. Default: ``'value'``
  last_points_only
- Whether to use the whole forecasts or only the last point of each forecast to compute the error. Only used
- in expanding window mode.
+ Only used in expanding window mode. Whether to use the whole forecasts or only the last point of each
+ forecast to compute the error.
  show_warnings
- Whether to show warnings related to the `start` parameter. Only used in expanding window mode.
+ Only used in expanding window mode. Whether to show warnings related to the `start` parameter.
  val_series
  The TimeSeries instance used for validation in split mode. If provided, this series must start right after
  the end of `series`; so that a proper comparison of the forecast can be made.
@@ -1386,6 +1430,7 @@ def _evaluate_combination(param_combination) -> float:
  future_covariates=future_covariates,
  num_samples=1,
  start=start,
+ start_format=start_format,
  forecast_horizon=forecast_horizon,
  stride=stride,
  metric=metric,
@@ -1893,6 +1938,7 @@ def _optimized_historical_forecasts(
  future_covariates: Optional[Sequence[TimeSeries]] = None,
  num_samples: int = 1,
  start: Optional[Union[pd.Timestamp, float, int]] = None,
+ start_format: Literal["position", "value"] = "value",
  forecast_horizon: int = 1,
  stride: int = 1,
  overlap_end: bool = False,
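
Reviewer note: the `start_format` docstrings above describe two rules for an integer `start` on a `pd.RangeIndex`: a position must lie between `-len(series)` and `len(series) - 1`, and a value must be present in the index or an error is raised. The sketch below restates those two rules in isolation. `resolve_start_position` is a hypothetical helper written for this note, not a darts function, and a plain Python `range` again stands in for the index.

```python
def resolve_start_position(index: range, start: int, start_format: str = "value") -> int:
    """Hypothetical helper: map an integer `start` to a non-negative position."""
    n = len(index)
    if start_format == "position":
        # Positions follow Python indexing: valid from -n up to n - 1.
        if not -n <= start <= n - 1:
            raise ValueError(f"position {start} is outside [{-n}, {n - 1}]")
        return start % n  # turns a negative position into its positive equivalent
    if start_format == "value":
        # Values must be labels that actually occur in the index.
        if start not in index:
            raise ValueError(f"value {start} is not in the index")
        return index.index(start)
    raise ValueError(f"unknown start_format: {start_format!r}")


index = range(-5, 10)
print(resolve_start_position(index, -2, "position"))  # counted from the end
print(resolve_start_position(index, -2, "value"))     # looked up as a label
```

How the library turns the resolved point into the first forecast (and when it falls back to the first trainable or predictable point) is not covered by this sketch.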

darts/models/forecasting/regression_model.py

+8
@@ -29,6 +29,11 @@
  from collections import OrderedDict
  from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union

+ try:
+     from typing import Literal
+ except ImportError:
+     from typing_extensions import Literal
+
  import numpy as np
  import pandas as pd
  from sklearn.linear_model import LinearRegression
@@ -897,6 +902,7 @@ def _optimized_historical_forecasts(
  future_covariates: Optional[Sequence[TimeSeries]] = None,
  num_samples: int = 1,
  start: Optional[Union[pd.Timestamp, float, int]] = None,
+ start_format: Literal["position", "value"] = "value",
  forecast_horizon: int = 1,
  stride: int = 1,
  overlap_end: bool = False,
@@ -949,6 +955,7 @@ def _optimized_historical_forecasts(
  future_covariates=future_covariates,
  num_samples=num_samples,
  start=start,
+ start_format=start_format,
  forecast_horizon=forecast_horizon,
  stride=stride,
  overlap_end=overlap_end,
@@ -963,6 +970,7 @@ def _optimized_historical_forecasts(
  future_covariates=future_covariates,
  num_samples=num_samples,
  start=start,
+ start_format=start_format,
  forecast_horizon=forecast_horizon,
  stride=stride,
  overlap_end=overlap_end,
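
Reviewer note: the `Literal["position", "value"]` annotation added in both files documents the accepted options, but Python does not check type hints at runtime. The sketch below shows one way the same `Literal` could double as a runtime check, reusing the import fallback from the diff. `StartFormat` and `check_start_format` are hypothetical names introduced for this note; whether the commit validates the argument this way is not visible in the hunks shown.

```python
try:
    from typing import Literal, get_args
except ImportError:  # older interpreters; assumes typing_extensions is installed
    from typing_extensions import Literal, get_args

# Hypothetical alias for the annotation used in the signatures above.
StartFormat = Literal["position", "value"]


def check_start_format(start_format: str) -> str:
    """Hypothetical helper: reject anything outside the Literal's options."""
    allowed = get_args(StartFormat)  # the tuple of options declared in the Literal
    if start_format not in allowed:
        raise ValueError(
            f"start_format must be one of {allowed}, got {start_format!r}"
        )
    return start_format


print(check_start_format("position"))
```

Deriving the allowed options from the alias keeps the annotation and the check from drifting apart if an option is renamed.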
