Feat/Example notebook for the regression models #2039

Merged
merged 24 commits into from
Nov 11, 2023
Commits
209b50b
feat: example notebook for the regression models and new dataset (ene…
madtoinou Oct 27, 2023
679dce6
fix: tests, some datasets width were missing
madtoinou Oct 27, 2023
378cbe2
feat: udpated changelog
madtoinou Oct 27, 2023
2363a40
Merge branch 'master' into feat/regression_model_example
madtoinou Oct 30, 2023
a9f3226
fix: to keep the API uniform, Zurich energy consumption and weather w…
madtoinou Nov 2, 2023
7725235
fix: changed the way datasets are loaded, added an illustration for m…
madtoinou Nov 2, 2023
1dd8bfc
fix: tweaked notebook
madtoinou Nov 2, 2023
a7f324c
feat: grouped dataset and their width into a single variable to impro…
madtoinou Nov 2, 2023
853b46c
Merge branch 'master' into feat/regression_model_example
madtoinou Nov 2, 2023
2d79529
Merge branch 'master' into feat/regression_model_example
madtoinou Nov 6, 2023
816df73
Apply suggestions from code review
madtoinou Nov 6, 2023
e8e5fa4
fix: simplified API to load the EnergyConsumptionZurich dataset, upda…
madtoinou Nov 6, 2023
cdb2f67
Merge branch 'master' into feat/regression_model_example
madtoinou Nov 6, 2023
db3c659
fix: remove the obsolete dataset from the tests
madtoinou Nov 6, 2023
c3c96b6
Merge branch 'feat/regression_model_example' of https://github.com/un…
madtoinou Nov 6, 2023
58827d1
Merge branch 'master' into feat/regression_model_example
dennisbader Nov 6, 2023
9370238
Merge branch 'master' into feat/regression_model_example
dennisbader Nov 8, 2023
345c523
blabla
dennisbader Nov 8, 2023
eb374c3
update dataset
dennisbader Nov 8, 2023
e2b5478
update notebook p1
dennisbader Nov 8, 2023
ac49f01
update regression model notebook
dennisbader Nov 9, 2023
0f24f10
notebook last fixes
dennisbader Nov 9, 2023
a07cf3e
fix: typo
madtoinou Nov 10, 2023
2eb1bc2
add regression model example test to merge workflow
dennisbader Nov 11, 2023
2 changes: 1 addition & 1 deletion .github/workflows/merge.yml
@@ -87,7 +87,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        example-name: [00-quickstart.ipynb, 01-multi-time-series-and-covariates.ipynb, 02-data-processing.ipynb, 03-FFT-examples.ipynb, 04-RNN-examples.ipynb, 05-TCN-examples.ipynb, 06-Transformer-examples.ipynb, 07-NBEATS-examples.ipynb, 08-DeepAR-examples.ipynb, 09-DeepTCN-examples.ipynb, 10-Kalman-filter-examples.ipynb, 11-GP-filter-examples.ipynb, 12-Dynamic-Time-Warping-example.ipynb, 13-TFT-examples.ipynb, 15-static-covariates.ipynb, 16-hierarchical-reconciliation.ipynb, 18-TiDE-examples.ipynb, 19-EnsembleModel-examples.ipynb]
+        example-name: [00-quickstart.ipynb, 01-multi-time-series-and-covariates.ipynb, 02-data-processing.ipynb, 03-FFT-examples.ipynb, 04-RNN-examples.ipynb, 05-TCN-examples.ipynb, 06-Transformer-examples.ipynb, 07-NBEATS-examples.ipynb, 08-DeepAR-examples.ipynb, 09-DeepTCN-examples.ipynb, 10-Kalman-filter-examples.ipynb, 11-GP-filter-examples.ipynb, 12-Dynamic-Time-Warping-example.ipynb, 13-TFT-examples.ipynb, 15-static-covariates.ipynb, 16-hierarchical-reconciliation.ipynb, 18-TiDE-examples.ipynb, 19-EnsembleModel-examples.ipynb, 20-RegressionModel-examples.ipynb]
     steps:
       - name: "1. Clone repository"
         uses: actions/checkout@v2
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -16,11 +16,13 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
   - Added callback `darts.utils.callbacks.TFMProgressBar` to customize at which model stages to display the progress bar. [#2020](https://github.com/unit8co/darts/pull/2020) by [Dennis Bader](https://github.com/dennisbader).
 - Improvements to documentation:
   - Adapted the example notebooks to properly apply data transformers and avoid look-ahead bias. [#2020](https://github.com/unit8co/darts/pull/2020) by [Samriddhi Singh](https://github.com/SimTheGreat).
+  - New example notebook for the `RegressionModels` explaining features such as (component-specific) lags, `output_chunk_length` in relation to `multi_models`, multivariate support, and more. [#2039](https://github.com/unit8co/darts/pull/2039) by [Antoine Madrona](https://github.com/madtoinou).
 - Improvements to Regression Models:
   - `XGBModel` now leverages XGBoost's native Quantile Regression support that was released in version 2.0.0 for improved probabilistic forecasts. [#2051](https://github.com/unit8co/darts/pull/2051) by [Dennis Bader](https://github.com/dennisbader).
 - Other improvements:
   - Added support for time index time zone conversion with parameter `tz` before generating/computing holidays and datetime attributes. Support was added to all Time Axis Encoders (standalone encoders and forecasting models' `add_encoders`), time series generation utils functions `holidays_timeseries()` and `datetime_attribute_timeseries()`, and `TimeSeries` methods `add_datetime_attribute()` and `add_holidays()`. [#2054](https://github.com/unit8co/darts/pull/2054) by [Dennis Bader](https://github.com/dennisbader).
   - Added optional keyword arguments dict `kwargs` to `ExponentialSmoothing` that will be passed to the constructor of the underlying `statsmodels.tsa.holtwinters.ExponentialSmoothing` model. [#2059](https://github.com/unit8co/darts/pull/2059) by [Antoine Madrona](https://github.com/madtoinou).
+  - Added new dataset `ElectricityConsumptionZurichDataset`: The dataset contains the electricity consumption of households in Zurich, Switzerland from 2015-2022 on different grid levels. We also added weather measurements for Zurich which can be used as covariates for modelling. [#2039](https://github.com/unit8co/darts/pull/2039) by [Antoine Madrona](https://github.com/madtoinou) and [Dennis Bader](https://github.com/dennisbader).

 **Fixed**
 - Fixed a bug when calling optimized `historical_forecasts()` for a `RegressionModel` trained with unequal component-specific lags. [#2040](https://github.com/unit8co/darts/pull/2040) by [Antoine Madrona](https://github.com/madtoinou).
111 changes: 110 additions & 1 deletion darts/datasets/__init__.py
@@ -5,8 +5,9 @@
 A few popular time series datasets
 """

+import os
 from pathlib import Path
-from typing import List
+from typing import List, Literal, Optional

 import numpy as np
 import pandas as pd
@@ -813,3 +814,111 @@ def _to_multi_series(self, series: pd.DataFrame) -> List[TimeSeries]:
         Load the WeatherDataset dataset as a list of univariate timeseries, one for weather indicator.
         """
         return [TimeSeries.from_series(series[label]) for label in series]
+
+
+class ElectricityConsumptionZurichDataset(DatasetLoaderCSV):
+    """
+    Electricity Consumption of households & SMEs (low voltage) and businesses & services (medium voltage) in the
+    city of Zurich [1]_, with values recorded every 15 minutes.
+
+    The electricity consumption is combined with weather measurements recorded by three different
+    stations in the city of Zurich with an hourly frequency [2]_. The missing time stamps are filled with NaN.
+    The original weather data is recorded every hour. Before adding the features to the electricity consumption,
+    the data is resampled to 15 minutes frequency, and missing values are interpolated.
+
+    To simplify the dataset, the measurements from the Zch_Schimmelstrasse and Zch_Rosengartenstrasse weather
+    stations are discarded to keep only the data recorded in the Zch_Stampfenbachstrasse station.
+
+    Both dataset sources are updated continuously, but this dataset only retains values between 2015 and 2022.
+    The time index was converted from CET time zone to UTC.
+
+    Components Descriptions:
+
+    * Value_NE5 : Households & SMEs electricity consumption (low voltage, grid level 7) in kWh
+    * Value_NE7 : Business and services electricity consumption (medium voltage, grid level 5) in kWh
+    * Hr [%Hr] : Relative humidity
+    * RainDur [min] : Duration of precipitation (divided by 4 for conversion from hourly to quarter-hourly records)
+    * T [°C] : Temperature
+    * WD [°] : Wind direction
+    * WVv [m/s] : Wind vector speed
+    * p [hPa] : Air pressure
+    * WVs [m/s] : Wind scalar speed
+    * StrGlo [W/m2] : Global solar irradiation
+
+    Note: before 2018, the scalar speeds were calculated from the 30 minutes vector data.
+
+    References
+    ----------
+    .. [1] https://data.stadt-zuerich.ch/dataset/ewz_stromabgabe_netzebenen_stadt_zuerich
+    .. [2] https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte
+    """
+
+    def __init__(self):
+        def pre_process_dataset(dataset_path):
+            """Restrict the time axis and add the weather data"""
+            df = pd.read_csv(dataset_path, index_col=0)
+            # convert time index
+            df.index = (
+                pd.DatetimeIndex(df.index, tz="CET").tz_convert("UTC").tz_localize(None)
+            )
+            # extract pre-determined period
+            df = df.loc[
+                (pd.Timestamp("2015-01-01") <= df.index)
+                & (df.index <= pd.Timestamp("2022-12-31"))
+            ]
+            # download and preprocess the weather information
+            df_weather = self._download_weather_data()
+            # add weather data as additional features
+            df = pd.concat([df, df_weather], axis=1)
+            # interpolate weather data
+            df = df.interpolate()
+            # raining duration is given in minutes -> we divide by 4 from hourly to quarter-hourly records
+            df["RainDur [min]"] = df["RainDur [min]"] / 4
+
+            # round Electricity cols to 4 decimals, other columns to 2 decimals
+            cols_precise = ["Value_NE5", "Value_NE7"]
+            df = df.round(
+                decimals={col: (4 if col in cols_precise else 2) for col in df.columns}
+            )
+
+            # export the dataset
+            df.index.name = "Timestamp"
+            df.to_csv(self._get_path_dataset())
+
+        # hash value for dataset with weather data
+        super().__init__(
+            metadata=DatasetLoaderMetadata(
+                "zurich_electricity_consumption.csv",
+                uri=(
+                    "https://data.stadt-zuerich.ch/dataset/"
+                    "ewz_stromabgabe_netzebenen_stadt_zuerich/"
+                    "download/ewz_stromabgabe_netzebenen_stadt_zuerich.csv"
+                ),
+                hash="c2fea1a0974611ff1c276abcc1d34619",
+                header_time="Timestamp",
+                freq="15min",
+                pre_process_csv_fn=pre_process_dataset,
+            )
+        )
+
+    @staticmethod
+    def _download_weather_data():
+        """Concatenate the yearly csv files into a single dataframe and reshape it"""
+        # download the csv from the url
+        base_url = "https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte/download/"
+        filenames = [f"ugz_ogd_meteo_h1_{year}.csv" for year in range(2015, 2023)]
+        df = pd.concat([pd.read_csv(base_url + fname) for fname in filenames])
+        # retain only one weather station
+        df = df.loc[df["Standort"] == "Zch_Stampfenbachstrasse"]
+        # pivot the df to get all measurements as columns
+        df["param_name"] = df["Parameter"] + " [" + df["Einheit"] + "]"
+        df = df.pivot(index="Datum", columns="param_name", values="Wert")
+        # convert time index from CET to UTC and extract the required time range
+        df.index = (
+            pd.DatetimeIndex(df.index, tz="CET").tz_convert("UTC").tz_localize(None)
+        )
+        df = df.loc[
+            (pd.Timestamp("2015-01-01") <= df.index)
+            & (df.index <= pd.Timestamp("2022-12-31"))
+        ]
+        return df
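
The reshaping done by `_download_weather_data` and `pre_process_dataset` can be tried on a toy frame without downloading anything. The sketch below is only an illustration under assumptions: the four input rows are made up, `reshape_weather` is a hypothetical helper name, and an explicit `reindex` onto a 15-minute grid stands in for the `pd.concat` with the consumption data, which is what introduces the quarter-hourly timestamps in the loader itself.

```python
import pandas as pd


def reshape_weather(long_df: pd.DataFrame) -> pd.DataFrame:
    """Turn long-format hourly records into wide 15-minute records (illustrative)."""
    df = long_df.copy()
    # one column per "<parameter> [<unit>]" pair, as in the loader
    df["param_name"] = df["Parameter"] + " [" + df["Einheit"] + "]"
    wide = df.pivot(index="Datum", columns="param_name", values="Wert")
    # local CET timestamps -> naive UTC timestamps
    wide.index = (
        pd.DatetimeIndex(wide.index, tz="CET").tz_convert("UTC").tz_localize(None)
    )
    # assumption: an explicit 15-minute grid replaces the concat with consumption data
    grid = pd.date_range(wide.index[0], wide.index[-1], freq="15min")
    wide = wide.reindex(grid).interpolate()
    # an hourly duration in minutes is spread evenly over four quarter-hours
    wide["RainDur [min]"] = wide["RainDur [min]"] / 4
    return wide


# made-up sample: two hourly readings of temperature and rain duration
toy = pd.DataFrame(
    {
        "Datum": ["2022-01-01T01:00", "2022-01-01T01:00",
                  "2022-01-01T02:00", "2022-01-01T02:00"],
        "Parameter": ["T", "RainDur", "T", "RainDur"],
        "Einheit": ["°C", "min", "°C", "min"],
        "Wert": [10.0, 60.0, 14.0, 20.0],
    }
)
wide = reshape_weather(toy)
```

Two hourly readings become five quarter-hourly rows, and the index moves back by one hour because CET is UTC+1 in January.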
18 changes: 15 additions & 3 deletions darts/datasets/dataset_loaders.py
@@ -31,8 +31,10 @@ class DatasetLoaderMetadata:
     format_time: Optional[str] = None
     # used to indicate the freq when we already know it
     freq: Optional[str] = None
-    # a custom function to handling non-csv based datasets
+    # a custom function handling non-csv based datasets
     pre_process_zipped_csv_fn: Optional[Callable] = None
+    # a custom function handling csv based datasets
+    pre_process_csv_fn: Optional[Callable] = None
     # multivariate
     multivariate: Optional[bool] = None

@@ -49,7 +51,9 @@ class DatasetLoader(ABC):

     _DEFAULT_DIRECTORY = Path(os.path.join(Path.home(), Path(".darts/datasets/")))

-    def __init__(self, metadata: DatasetLoaderMetadata, root_path: Path = None):
+    def __init__(
+        self, metadata: DatasetLoaderMetadata, root_path: Optional[Path] = None
+    ):
         self._metadata: DatasetLoaderMetadata = metadata
         if root_path is None:
             self._root_path: Path = DatasetLoader._DEFAULT_DIRECTORY
@@ -131,7 +135,13 @@ def _download_dataset(self):
                 "Could not download the dataset. Reason:" + e.__repr__()
             ) from None

+        if self._metadata.pre_process_csv_fn is not None:
+            self._metadata.pre_process_csv_fn(self._get_path_dataset())
+
     def _download_zip_dataset(self):
+        if self._metadata.pre_process_csv_fn:
+            logger.warning("Loading a ZIP file does not use the pre_process_csv_fn")
+
         os.makedirs(self._root_path, exist_ok=True)
         try:
             request = requests.get(self._metadata.uri)
@@ -186,7 +196,9 @@ def _format_time_column(self, df):


 class DatasetLoaderCSV(DatasetLoader):
-    def __init__(self, metadata: DatasetLoaderMetadata, root_path: Path = None):
+    def __init__(
+        self, metadata: DatasetLoaderMetadata, root_path: Optional[Path] = None
+    ):
         super().__init__(metadata, root_path)

     def _load_from_disk(
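
The `pre_process_csv_fn` hook in this file follows a simple pattern: the loader writes the downloaded CSV to disk, then passes the file path to an optional callable that may rewrite the file in place. A minimal stand-alone sketch of that pattern follows. `ToyMetadata`, `ToyLoader` and `keep_header_and_first_row` are hypothetical names, and a plain string write replaces the HTTP download, so this is not the Darts implementation.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class ToyMetadata:
    name: str
    # optional hook, called with the path of the freshly written CSV
    pre_process_csv_fn: Optional[Callable[[Path], None]] = None


class ToyLoader:
    def __init__(self, metadata: ToyMetadata, root_path: Path):
        self._metadata = metadata
        self._root_path = root_path

    def _get_path_dataset(self) -> Path:
        return self._root_path / self._metadata.name

    def download(self, raw_text: str) -> Path:
        # stand-in for the HTTP download: write the raw payload to disk
        path = self._get_path_dataset()
        path.write_text(raw_text)
        # run the hook once, right after the file exists
        if self._metadata.pre_process_csv_fn is not None:
            self._metadata.pre_process_csv_fn(path)
        return path


def keep_header_and_first_row(path: Path) -> None:
    """Example hook: rewrite the CSV in place, keeping two lines."""
    lines = path.read_text().splitlines()
    path.write_text("\n".join(lines[:2]) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    loader = ToyLoader(ToyMetadata("toy.csv", keep_header_and_first_row), Path(tmp))
    result = loader.download("t,v\n0,1\n1,2\n2,3\n").read_text()
```

Passing the path rather than a dataframe leaves the hook free to read the raw file with whatever options it needs and to overwrite it, which is how `pre_process_dataset` uses it.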
62 changes: 31 additions & 31 deletions darts/tests/datasets/test_dataset_loaders.py
@@ -10,6 +10,7 @@
     AirPassengersDataset,
     AusBeerDataset,
     AustralianTourismDataset,
+    ElectricityConsumptionZurichDataset,
     ElectricityDataset,
     EnergyDataset,
     ETTh1Dataset,
@@ -40,37 +41,36 @@
     DatasetLoadingException,
 )

-datasets = [
-    AirPassengersDataset,
-    AusBeerDataset,
-    AustralianTourismDataset,
-    EnergyDataset,
-    HeartRateDataset,
-    IceCreamHeaterDataset,
-    MonthlyMilkDataset,
-    SunspotsDataset,
-    TaylorDataset,
-    TemperatureDataset,
-    USGasolineDataset,
-    WineDataset,
-    WoolyDataset,
-    GasRateCO2Dataset,
-    MonthlyMilkIncompleteDataset,
-    ETTh1Dataset,
-    ETTh2Dataset,
-    ETTm1Dataset,
-    ETTm2Dataset,
-    ElectricityDataset,
-    UberTLCDataset,
-    ILINetDataset,
-    ExchangeRateDataset,
-    TrafficDataset,
-    WeatherDataset,
-]
-
 _DEFAULT_PATH_TEST = _DEFAULT_PATH + "/tests"

-width_datasets = [1, 1, 96, 28, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 7, 7, 7, 7, 370, 262]
+datasets_with_width = [
+    (AirPassengersDataset, 1),
+    (AusBeerDataset, 1),
+    (AustralianTourismDataset, 96),
+    (EnergyDataset, 28),
+    (HeartRateDataset, 1),
+    (IceCreamHeaterDataset, 2),
+    (MonthlyMilkDataset, 1),
+    (SunspotsDataset, 1),
+    (TaylorDataset, 1),
+    (TemperatureDataset, 1),
+    (USGasolineDataset, 1),
+    (WineDataset, 1),
+    (WoolyDataset, 1),
+    (GasRateCO2Dataset, 2),
+    (MonthlyMilkIncompleteDataset, 1),
+    (ETTh1Dataset, 7),
+    (ETTh2Dataset, 7),
+    (ETTm1Dataset, 7),
+    (ETTm2Dataset, 7),
+    (ElectricityDataset, 370),
+    (UberTLCDataset, 262),
+    (ILINetDataset, 11),
+    (ExchangeRateDataset, 8),
+    (TrafficDataset, 862),
+    (WeatherDataset, 21),
+    (ElectricityConsumptionZurichDataset, 10),
+]

 wrong_hash_dataset = DatasetLoaderCSV(
     metadata=DatasetLoaderMetadata(
@@ -135,9 +135,9 @@ def tmp_dir_dataset():

 class TestDatasetLoader:
     @pytest.mark.slow
-    @pytest.mark.parametrize("dataset_config", zip(width_datasets, datasets))
+    @pytest.mark.parametrize("dataset_config", datasets_with_width)
     def test_ok_dataset(self, dataset_config, tmp_dir_dataset):
-        width, dataset_cls = dataset_config
+        dataset_cls, width = dataset_config
         dataset = dataset_cls()
         assert dataset._DEFAULT_DIRECTORY == tmp_dir_dataset
         ts: TimeSeries = dataset.load()
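
The switch from two parallel lists to a single list of tuples matters because `zip` stops at the shorter input: `width_datasets` held 21 widths for the 25 classes in `datasets`, so the last four datasets never reached `test_ok_dataset`. The sketch below reproduces the effect with made-up stand-in names. `pair_checked` is a hypothetical helper, not part of the test suite.

```python
from typing import List, Tuple


def pair_checked(names: List[str], widths: List[int]) -> List[Tuple[str, int]]:
    """Pair names with widths, refusing lists of different lengths."""
    if len(names) != len(widths):
        raise ValueError(
            f"{len(names)} names but {len(widths)} widths; some cases would be skipped"
        )
    return list(zip(names, widths))


# made-up stand-ins for dataset classes and their expected widths
names = ["AirPassengers", "AusBeer", "Traffic", "Weather"]
widths = [1, 1]  # the last two widths were never added

# plain zip stops at the shorter list, so only two cases would be generated
silently_paired = list(zip(widths, names))

try:
    pair_checked(names, widths)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# keeping each case in one tuple makes a missing width a visible gap in the source
cases = [("AirPassengers", 1), ("AusBeer", 1), ("Traffic", 862), ("Weather", 21)]
```

With the tuple list, a missing width shows up as a malformed entry in the source instead of a silently shorter parametrization.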
10 changes: 10 additions & 0 deletions docs/source/examples.rst
@@ -76,6 +76,16 @@ with Darts using the Optuna library for hyperparameter optimization.

     examples/17-hyperparameter-optimization.ipynb

+Regression Models
+=================
+
+Regression models example notebook:
+
+.. toctree::
+    :maxdepth: 1
+
+    examples/20-RegressionModel-examples.ipynb
+

 Fast Fourier Transform
 ======================
1,100 changes: 1,100 additions & 0 deletions examples/20-RegressionModel-examples.ipynb

Large diffs are not rendered by default.

Binary file added examples/static/images/multi_model_ocl2.png
Binary file added examples/static/images/single_model_ocl2.png
Binary file added examples/static/images/single_model_ocl3.png