From 4450ce69f0e83cd18b5824c81a6c28230f0b54e9 Mon Sep 17 00:00:00 2001
From: Yassine Alouini <yassinealouini@outlook.com>
Date: Mon, 27 Feb 2023 10:42:00 +0100
Subject: [PATCH] Make the SQLQueryDataSet compatible with mssql. (#101)

* [kedro-docker] Layers size optimization (#92)

* [kedro-docker] Layers size optimization

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Adjust test requirements

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Skip coverage check on tests dir (some do not execute on Windows)

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Update .coveragerc with the setup

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Fix bandit so it does not scan kedro-datasets

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Fixed existence test

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Check why dir is not created

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Kedro starters are fixed now

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Increased no-output-timeout for long spark image build

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>

* Spark image optimized

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Linting

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Switch to slim image always

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Trigger build

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Use textwrap.dedent for nicer indentation

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Revert "Use textwrap.dedent for nicer indentation"

This reverts commit 3a1e3f855a29c6a1b118db3e844e5f9b67ade363.

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Revert "Revert "Use textwrap.dedent for nicer indentation""

This reverts commit d322d353b25d414cdfdef8ee12185e5a1d9baa2c.

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

* Make tests read more lines (to skip all deprecation warnings)

Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>
Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Release Kedro-Docker 0.3.1 (#94)

* Add release notes for kedro-docker 0.3.1

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update version in kedro_docker module

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Bump version and update release notes (#96)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Make the SQLQueryDataSet compatible with mssql.
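
The mssql docstring example passes an `mssql+pyodbc` SQLAlchemy URL as `credentials["con"]`. Below is a minimal, stdlib-only sketch of building such a URL string. It assumes the raw ODBC string is URL-encoded and passed through the `odbc_connect` query parameter, which mirrors the pass-through pattern in the SQLAlchemy pyodbc dialect documentation. `make_mssql_url` and its arguments are hypothetical and are not part of kedro-datasets:

```python
from urllib.parse import quote_plus


def make_mssql_url(
    driver: str, server: str, port: str, database: str, user: str, password: str
) -> str:
    """Return an ``mssql+pyodbc`` URL string wrapping a raw ODBC connection string."""
    # Braces around the driver name let ODBC parse names that contain spaces.
    odbc_str = (
        f"DRIVER={{{driver}}};SERVER={server},{port};DATABASE={database};"
        f"UID={user};PWD={password};Encrypt=yes;TrustServerCertificate=yes;"
    )
    # quote_plus escapes '=', ';', ',' and spaces, so the whole ODBC string
    # survives as a single query-string value.
    return "mssql+pyodbc://?odbc_connect=" + quote_plus(odbc_str)
```

The sketch returns a plain string rather than a SQLAlchemy `URL` object. This keeps the substring check `"mssql" in self._connection_str` in `SQLQueryDataSet.__init__` meaningful.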

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add one test + update RELEASE.md.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add missing pyodbc for tests.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Mock connection as well.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add more date parsing for mssql backend (thanks to fgaudindelrieu@idmog.com)
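
The following is a standalone sketch of the date handling that `adapt_mssql_date_params` applies to `load_args["params"]`. Plain ISO dates (`YYYY-MM-DD`) become midnight timestamps formatted as `%Y-%m-%dT%H:%M:%S`, and every other value passes through unchanged. `normalize_mssql_params` is a hypothetical name. It also raises `TypeError` in places where the dataset raises `DataSetError`:

```python
import datetime as dt
from typing import Any, List


def normalize_mssql_params(params: Any) -> List[Any]:
    """Convert plain ISO dates in a positional params list to MSSQL datetimes."""
    if not isinstance(params, list):
        raise TypeError(f"params must be a list, got {type(params)!r}")
    normalized = []
    for value in params:
        try:
            day = dt.date.fromisoformat(value)
        except (TypeError, ValueError):
            # Values date.fromisoformat rejects (non-strings, timestamps, free text) are kept.
            normalized.append(value)
            continue
        midnight = dt.datetime.combine(day, dt.time.min)
        normalized.append(midnight.strftime("%Y-%m-%dT%H:%M:%S"))
    return normalized
```

Given the input from `test_adapt_mssql_date_params`, `["2023-01-01", "2023-01-01T20:26", "2023", "test", 1.0, 100]`, it returns `["2023-01-01T00:00:00", "2023-01-01T20:26", "2023", "test", 1.0, 100]`.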

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Fix an error in docstring of MetricsDataSet (#98)

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Bump relax pyarrow version to work the same way as Pandas (#100)

* Bump relax pyarrow version to work the same way as Pandas

We only use PyArrow for `pandas.ParquetDataSet`, so I suggest we pin it to the same range as [Pandas does](https://github.com/pandas-dev/pandas/blob/96fc51f5ec678394373e2c779ccff37ddb966e75/pyproject.toml#L100), for the same reason.

I also suggest we remove the upper bound, since users in [support channels](https://kedro-org.slack.com/archives/C03RKP2LW64/p1674040509133529) are requesting later versions.

* Updated release notes

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add missing type in catalog example.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add one more unit test for adapt_mssql.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [FIX] Add missing mocker from date test.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [TEST] Add a wrong input test.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Add pyodbc dependency.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [FIX] Remove dict() in tests.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Change check to check on plugin name (#103)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Set coverage in pyproject.toml (#105)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Move coverage settings to pyproject.toml (#106)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Replace kedro.pipeline with modular_pipeline.pipeline factory (#99)

* Add non-spark related test changes
Replace kedro.pipeline.Pipeline with
kedro.pipeline.modular_pipeline.pipeline factory.
This is for symmetry with changes made to the main kedro library.

Signed-off-by: Adam Farley <adamfrly@gmail.com>

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Fix outdated links in Kedro Datasets (#111)

* fix links

* fix dill links

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Fix docs formatting and phrasing for some datasets (#107)

* Fix docs formatting and phrasing for some datasets

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Manually fix files not resolved with patch command

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Apply fix from #98

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Release `kedro-datasets` `version 1.0.2` (#112)

* bump version and update release notes

* fix pylint errors

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Bump pytest to 7.2 (#113)

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Prefix Docker plugin name with "Kedro-" in usage message (#57)

* Prefix Docker plugin name with "Kedro-" in usage message

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Keep Kedro-Docker plugin docstring from appearing in `kedro -h` (#56)

* Keep Kedro-Docker plugin docstring from appearing in `kedro -h`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [kedro-datasets ] Add `Polars.CSVDataSet` (#95)

Signed-off-by: wmoreiraa <walber3@gmail.com>

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* Remove deprecated `test_requires` from `setup.py` in Kedro-Docker (#54)

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>

* [FIX] Fix ds to data_set.

Signed-off-by: Yassine Alouini <yalouini@idmog.com>

---------

Signed-off-by: Mariusz Strzelecki <mariusz.strzelecki@getindata.com>
Signed-off-by: Mariusz Strzelecki <szczeles@gmail.com>
Signed-off-by: Yassine Alouini <yalouini@idmog.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Mariusz Strzelecki <szczeles@gmail.com>
Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: OKA Naoya <pn11@users.noreply.github.com>
Co-authored-by: Joel <35801847+datajoely@users.noreply.github.com>
Co-authored-by: adamfrly <45516720+adamfrly@users.noreply.github.com>
Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Walber Moreira <58264877+wmoreiraa@users.noreply.github.com>
---
 kedro-datasets/RELEASE.md                     |  2 +-
 .../kedro_datasets/pandas/sql_dataset.py      | 69 +++++++++++++++++++
 kedro-datasets/setup.py                       |  2 +-
 kedro-datasets/test_requirements.txt          |  1 +
 .../tests/pandas/test_sql_dataset.py          | 54 +++++++++++++++
 5 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/kedro-datasets/RELEASE.md b/kedro-datasets/RELEASE.md
index 3b51df818..412fe9f9c 100644
--- a/kedro-datasets/RELEASE.md
+++ b/kedro-datasets/RELEASE.md
@@ -11,7 +11,7 @@
 | `polars.CSVDataSet` | A `CSVDataSet` backed by [polars](https://www.pola.rs/), a lighting fast dataframe package built entirely using Rust. | `kedro_datasets.polars` |
 
 ## Bug fixes and other changes
-
+* Add an `mssql` backend to `SQLQueryDataSet` using the `pyodbc` library.
 
 # Release 1.0.2:
 
diff --git a/kedro-datasets/kedro_datasets/pandas/sql_dataset.py b/kedro-datasets/kedro_datasets/pandas/sql_dataset.py
index 1400e4981..dd5d636a1 100644
--- a/kedro-datasets/kedro_datasets/pandas/sql_dataset.py
+++ b/kedro-datasets/kedro_datasets/pandas/sql_dataset.py
@@ -1,6 +1,7 @@
 """``SQLDataSet`` to load and save data to a SQL backend."""
 
 import copy
+import datetime as dt
 import re
 from pathlib import PurePosixPath
 from typing import Any, Dict, NoReturn, Optional
@@ -22,6 +23,7 @@
     "psycopg2": "psycopg2",
     "mysqldb": "mysqlclient",
     "cx_Oracle": "cx_Oracle",
+    "mssql": "pyodbc",
 }
 
 DRIVER_ERROR_MESSAGE = """
@@ -321,7 +323,49 @@ class SQLQueryDataSet(AbstractDataSet[None, pd.DataFrame]):
         >>>                            credentials=credentials)
         >>>
         >>> sql_data = data_set.load()
+        >>>
+    Example of usage for mssql:
+    ::
+
+
+        >>> credentials = {"server": "localhost", "port": "1433",
+        >>>                "database": "TestDB", "user": "SA",
+        >>>                "password": "StrongPassword"}
+        >>> def _make_mssql_connection_str(
+        >>>    server: str, port: str, database: str, user: str, password: str
+        >>> ) -> str:
+        >>>    import pyodbc  # noqa
+        >>>    from sqlalchemy.engine import URL  # noqa
+        >>>
+        >>>    driver = pyodbc.drivers()[-1]
+        >>>    connection_str = (f"DRIVER={driver};SERVER={server},{port};DATABASE={database};"
+        >>>                      f"ENCRYPT=yes;UID={user};PWD={password};"
+        >>>                       "TrustServerCertificate=yes;")
+        >>>    return str(URL.create("mssql+pyodbc", query={"odbc_connect": connection_str}))
+        >>> connection_str = _make_mssql_connection_str(**credentials)
+        >>> data_set = SQLQueryDataSet(credentials={"con": connection_str},
+        >>>                            sql="SELECT TOP 5 * FROM TestTable;")
+        >>> df = data_set.load()
+
+    Example of a catalog entry for mssql with date parsing:
+    ::
+
 
+        >>> mssql_dataset:
+        >>>    type: kedro_datasets.pandas.SQLQueryDataSet
+        >>>    credentials: mssql_credentials
+        >>>    sql: >
+        >>>       SELECT *
+        >>>       FROM  DateTable
+        >>>       WHERE date >= ? AND date <= ?
+        >>>       ORDER BY date
+        >>>    load_args:
+        >>>       params:
+        >>>        - ${begin}
+        >>>        - ${end}
+        >>>       index_col: date
+        >>>       parse_dates:
+        >>>         date: "%Y-%m-%d %H:%M:%S.%f0 %z"
     """
 
     # using Any because of Sphinx but it should be
@@ -413,6 +457,8 @@ def __init__(  # pylint: disable=too-many-arguments
         self._connection_str = credentials["con"]
         self._execution_options = execution_options or {}
         self.create_connection(self._connection_str)
+        if "mssql" in self._connection_str:
+            self.adapt_mssql_date_params()
 
     @classmethod
     def create_connection(cls, connection_str: str) -> None:
@@ -456,3 +502,26 @@ def _load(self) -> pd.DataFrame:
 
     def _save(self, data: None) -> NoReturn:
         raise DataSetError("'save' is not supported on SQLQueryDataSet")
+
+    # For mssql only
+    def adapt_mssql_date_params(self) -> None:
+        """We need to change the format of datetime parameters.
+        MSSQL expects datetimes in the exact format %Y-%m-%dT%H:%M:%S, so plain
+        ISO dates (YYYY-MM-DD) are converted to midnight in that format.
+        `pyodbc` does not accept named parameters; they must be provided as a list."""
+        params = self._load_args.get("params", [])
+        if not isinstance(params, list):
+            raise DataSetError(
+                "Unrecognized `params` format. It can be only a `list`, "
+                f"got {type(params)!r}"
+            )
+        new_load_args = []
+        for value in params:
+            try:
+                as_date = dt.date.fromisoformat(value)
+                new_val = dt.datetime.combine(as_date, dt.time.min)
+                new_load_args.append(new_val.strftime("%Y-%m-%dT%H:%M:%S"))
+            except (TypeError, ValueError):
+                new_load_args.append(value)
+        if new_load_args:
+            self._load_args["params"] = new_load_args
diff --git a/kedro-datasets/setup.py b/kedro-datasets/setup.py
index 9effe1fca..e054e17e8 100644
--- a/kedro-datasets/setup.py
+++ b/kedro-datasets/setup.py
@@ -58,7 +58,7 @@ def _collect_requirements(requires):
     "pandas.JSONDataSet": [PANDAS],
     "pandas.ParquetDataSet": [PANDAS, "pyarrow>=6.0"],
     "pandas.SQLTableDataSet": [PANDAS, "SQLAlchemy~=1.2"],
-    "pandas.SQLQueryDataSet": [PANDAS, "SQLAlchemy~=1.2"],
+    "pandas.SQLQueryDataSet": [PANDAS, "SQLAlchemy~=1.2", "pyodbc~=4.0"],
     "pandas.XMLDataSet": [PANDAS, "lxml~=4.6"],
     "pandas.GenericDataSet": [PANDAS],
 }
diff --git a/kedro-datasets/test_requirements.txt b/kedro-datasets/test_requirements.txt
index 8dec3619b..2b742b751 100644
--- a/kedro-datasets/test_requirements.txt
+++ b/kedro-datasets/test_requirements.txt
@@ -38,6 +38,7 @@ pre-commit>=2.9.2, <3.0  # The hook `mypy` requires pre-commit version 2.9.2.
 psutil==5.8.0
 pyarrow>=1.0, <7.0
 pylint>=2.5.2, <3.0
+pyodbc~=4.0.35
 pyproj~=3.0
 pyspark>=2.2, <4.0
 pytest-cov~=3.0
diff --git a/kedro-datasets/tests/pandas/test_sql_dataset.py b/kedro-datasets/tests/pandas/test_sql_dataset.py
index a1c6839d6..aa9fe8d17 100644
--- a/kedro-datasets/tests/pandas/test_sql_dataset.py
+++ b/kedro-datasets/tests/pandas/test_sql_dataset.py
@@ -11,6 +11,7 @@
 
 TABLE_NAME = "table_a"
 CONNECTION = "sqlite:///kedro.db"
+MSSQL_CONNECTION = "mssql+pyodbc://?odbc_connect=DRIVER%3DODBC+Driver+for+SQL"
 SQL_QUERY = "SELECT * FROM table_a"
 EXECUTION_OPTIONS = {"stream_results": True}
 FAKE_CONN_STR = "some_sql://scott:tiger@localhost/foo"
@@ -417,3 +418,56 @@ def test_create_connection_only_once(self, mocker):
         assert mock_engine.call_count == 2
         assert fourth.engines == first.engines
         assert len(first.engines) == 2
+
+    def test_adapt_mssql_date_params_called(self, mocker):
+        """Test that the adapt_mssql_date_params
+        function is called when the mssql backend is used.
+        """
+        mock_adapt_mssql_date_params = mocker.patch(
+            "kedro_datasets.pandas.sql_dataset.SQLQueryDataSet.adapt_mssql_date_params"
+        )
+        mock_engine = mocker.patch("kedro_datasets.pandas.sql_dataset.create_engine")
+        ds = SQLQueryDataSet(sql=SQL_QUERY, credentials={"con": MSSQL_CONNECTION})
+        mock_engine.assert_called_once_with(MSSQL_CONNECTION)
+        assert mock_adapt_mssql_date_params.call_count == 1
+        assert len(ds.engines) == 1
+
+    def test_adapt_mssql_date_params(self, mocker):
+        """Test that the adapt_mssql_date_params
+        function transforms the params as expected, i.e.
+        converting plain dates into the format %Y-%m-%dT%H:%M:%S
+        and leaving the other values unchanged.
+        """
+        mocker.patch("kedro_datasets.pandas.sql_dataset.create_engine")
+        load_args = {
+            "params": ["2023-01-01", "2023-01-01T20:26", "2023", "test", 1.0, 100]
+        }
+        ds = SQLQueryDataSet(
+            sql=SQL_QUERY, credentials={"con": MSSQL_CONNECTION}, load_args=load_args
+        )
+        assert ds._load_args["params"] == [
+            "2023-01-01T00:00:00",
+            "2023-01-01T20:26",
+            "2023",
+            "test",
+            1.0,
+            100,
+        ]
+
+    def test_adapt_mssql_date_params_wrong_input(self, mocker):
+        """Test that the adapt_mssql_date_params
+        function fails with the correct error message
+        when given a wrong input
+        """
+        mocker.patch("kedro_datasets.pandas.sql_dataset.create_engine")
+        load_args = {"params": {"value": 1000}}
+        pattern = (
+            "Unrecognized `params` format. It can be only a `list`, "
+            "got <class 'dict'>"
+        )
+        with pytest.raises(DataSetError, match=pattern):
+            SQLQueryDataSet(
+                sql=SQL_QUERY,
+                credentials={"con": MSSQL_CONNECTION},
+                load_args=load_args,
+            )