Commit 5b8d92a

Kasper de Harder (kaspersgit) authored
Development (#39)
* Fixed some Windows-related tests but broken commit due to dt multiclass test
* All tests passing
* Improved structure, fixed tests on Windows, updated some files
* Fixed some requirements and versions for successful tests
* Split normal and dev requirements and put them in the docs/ folder
* Slightly simplified setup, updated readmes and ran ruff
* User-inputted model name formatting and a better auto config creator
* Slight rewrite of the install package command to use `python -m pip install`
* Removed double file listing
* Updated readme with output of the tool
* Updated readme
* Added test and ruff formatted
* Updated readme and some tiny edits
* Tiny adjustments (non-code)
* Expanded testing and some minor adjustments
* Added logo and some minor changes
* Improved logger setup, removed it as a function parameter
* Added pre-commit ruff and pytest hook
* Updated readme and moved some tests
* Improved readme and increased default rows in the automatic config maker
* Added demo in the form of a gif to readme
* Forgot gif file
* Removed file
* First attempt to add local explanations
* Added xi correlation
* Added best practices to some scripts
* Updated packages
* Added test and started with GitHub Actions
* Adjusted GitHub workflow file
* Updated requirements
* Updated GitHub workflow
* Spelling mistake
* Add venv in GitHub Actions
* Updated GitHub workflow 2
* Add OS-specific runners
* try 1x through try 9x
* github workflow fix v1 through v11
* Wrong checksum error possible fix
* Removed change that might already have been fixed
* Slightly changed a test which gets stuck
* Attempt fix github workflow, v2 through v8
* Updated requirements files
* Removed double package in requirements

Co-authored-by: Kasper de Harder <[email protected]>
1 parent b21507c commit 5b8d92a

12 files changed: +165 -31 lines

.github/workflows/run_test.yml (+47, new file)

@@ -0,0 +1,47 @@
+name: Run Tests via Pytest on Linux, Unix and Windows
+
+on: [push]
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest, windows-latest, macos-latest]
+        python-version: ["3.10"]
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }} on ${{ matrix.os }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies ${{ matrix.os }}
+        run: |
+          python -m pip install --upgrade pip
+          python -m venv .ml2sql
+          if [ "$RUNNER_OS" == "Windows" ]; then
+            ".ml2sql\Scripts\python" -m pip install --index-url https://pypi.org/simple -r "docs\requirements-dev.txt"
+          else
+            .ml2sql/bin/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
+          fi
+        shell: bash
+      - name: Lint with Ruff
+        run: |
+          if [ "$RUNNER_OS" == "Windows" ]; then
+            ".ml2sql\Scripts\ruff" check --output-format=github .
+          else
+            .ml2sql/bin/ruff check --output-format=github .
+          fi
+        shell: bash
+        continue-on-error: true
+      - name: Test with pytest
+        run: |
+          if [ "$RUNNER_OS" == "Windows" ]; then
+            ".ml2sql\Scripts\pytest" -v -k "not _script and not test_pre_process_kfold"
+          else
+            source .ml2sql/bin/activate
+            coverage run -m pytest -v
+          fi
+        shell: bash
+
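Outside CI, the workflow's per-OS interpreter selection can be mirrored locally. A minimal bash sketch assuming the repo's `.ml2sql` venv layout; the `RUNNER_OS` default is our assumption here, on GitHub Actions the variable is set automatically:

```shell
# Pick the venv interpreter path the same way the workflow steps do.
# RUNNER_OS is provided by GitHub Actions; default to Linux locally.
RUNNER_OS="${RUNNER_OS:-Linux}"
if [ "$RUNNER_OS" == "Windows" ]; then
    PY=".ml2sql\\Scripts\\python"
else
    PY=".ml2sql/bin/python"
fi
echo "Using interpreter: $PY"
```

Keeping `shell: bash` on every step is what makes this single branching style work on all three runners, since bash is available on the Windows images too.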

docs/devReadMe.md (+2 -2)

@@ -18,6 +18,6 @@ python -m venv .ml2sql
 
 ### Package management (pinning)
 With the virtual env activated
-- Compile user requirements.txt file: `python -m piptools compile -o docs/requirements.txt pyproject.toml`
-- Compile dev requirements-dev.txt file: `python -m piptools compile --extra dev -o docs/requirements-dev.txt -c docs/requirements.txt pyproject.toml`
+- Compile user requirements.txt file: `python -m piptools compile --index-url=https://pypi.org/simple -o docs/requirements.txt pyproject.toml`
+- Compile dev requirements-dev.txt file: `python -m piptools compile --index-url=https://pypi.org/simple --extra dev -o docs/requirements-dev.txt -c docs/requirements.txt pyproject.toml`
 (Making sure packages in both files have the same version, [stackoverflow source](https://stackoverflow.com/questions/76055688/generate-aligned-requirements-txt-and-dev-requirements-txt-with-pip-compile))

docs/requirements-dev.txt (+3 -4)

@@ -2,11 +2,8 @@
 # This file is autogenerated by pip-compile with Python 3.11
 # by the following command:
 #
-#    pip-compile --constraint=docs/requirements.txt --extra=dev --output-file=docs/requirements-dev.txt pyproject.toml
+#    pip-compile --constraint=docs/requirements.txt --extra=dev --index-url=https://pypi.org/simple --output-file=docs/requirements-dev.txt pyproject.toml
 #
---index-url https://artifactory-edge.ess.midasplayer.com/artifactory/api/pypi/pypi-all/pypi
---extra-index-url https://pypi.org/simple
---trusted-host artifactory.ess.midasplayer.com
 
 appnope==0.1.4
     # via
@@ -49,6 +46,8 @@ contourpy==1.2.1
     # via
     #   -c docs/requirements.txt
     #   matplotlib
+coverage==7.5.2
+    # via ml_2_sql (pyproject.toml)
 cycler==0.12.1
     # via
     #   -c docs/requirements.txt

docs/requirements.txt (+5 -3)

@@ -4,9 +4,6 @@
 #
 #    pip-compile --output-file=docs/requirements.txt pyproject.toml
 #
---index-url https://artifactory-edge.ess.midasplayer.com/artifactory/api/pypi/pypi-all/pypi
---extra-index-url https://pypi.org/simple
---trusted-host artifactory.ess.midasplayer.com
 
 appnope==0.1.4
     # via ipykernel
@@ -148,6 +145,9 @@ pandas==2.2.2
     #   shap
 parso==0.8.4
     # via jedi
+pexpect==4.9.0
+    # via ipython
+pillow==10.3.0
 pexpect==4.9.0
     # via ipython
 pillow==10.3.0
@@ -250,6 +250,8 @@ werkzeug==3.0.3
     #   flask
 zipp==3.18.1
     # via importlib-metadata
+zipp==3.18.1
+    # via importlib-metadata
 zope-event==5.0
     # via gevent
 zope-interface==6.3

pyproject.toml (+2 -1)

@@ -39,5 +39,6 @@ dev = [
     "pytest",
     "pip-tools",
     "ruff",
-    "pre-commit"
+    "pre-commit",
+    "coverage"
 ]

scripts/utils/feature_selection/correlations.py (-3)

@@ -6,9 +6,6 @@
 import plotly.figure_factory as ff
 import plotly.graph_objects as go
 import scipy.stats as ss
-import logging
-
-logger = logging.getLogger(__name__)
 
 logger = logging.getLogger(__name__)
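The lines removed above were a duplicated import/logger pair; what survives is the standard one-logger-per-module pattern. A minimal sketch of that pattern, with the module name written out for illustration:

```python
import logging

# One logger per module, named after the module path. Library code attaches
# a NullHandler so that importing applications decide where output goes.
logger = logging.getLogger("utils.feature_selection.correlations")
logger.addHandler(logging.NullHandler())

logger.debug("correlation matrix computed")  # silent until a handler is configured
```

Because `getLogger` returns the same object for the same name, defining the logger twice is harmless but redundant, which is why the diff simply drops one copy.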

tests/integration/test_check_version.py (+5 -5)

@@ -69,10 +69,10 @@ def test_interpret_model_version():
 
     print(f"Looking for files ending with '_v{version_suffix}' in '{folder_path}'")
 
-    files_with_version = []
-    for filename in os.listdir(folder_path):
-        filename = filename.split(".")[0]
-        files_with_version.append(filename.endswith(f"_v{version_suffix}"))
-
+    # Check if the version in filenames matches the version_suffix
+    files_with_version = [
+        os.path.splitext(filename)[0].split("_v")[-1] == str(version_suffix)
+        for filename in os.listdir(folder_path)
+    ]
     # Checking if all models in testing direcotry are of installed version
     assert all(files_with_version)
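The rewritten comprehension compares the text after the last `_v` in each file stem against the expected suffix, rather than relying on `endswith`. A standalone sketch of that predicate, using hypothetical model filenames:

```python
import os

def matches_version(filename: str, version_suffix) -> bool:
    # Strip the extension, then compare everything after the last "_v"
    # in the stem against the expected version suffix.
    stem = os.path.splitext(filename)[0]
    return stem.split("_v")[-1] == str(version_suffix)

# Hypothetical filenames for illustration:
print(matches_version("ebm_classification_v0.4.2.sav", "0.4.2"))  # True
print(matches_version("ebm_classification_v0.4.1.sav", "0.4.2"))  # False
```

Using `os.path.splitext` also fixes a subtle bug in the old code: `filename.split(".")[0]` would truncate a version like `0.4.2` at the first dot.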

tests/integration/test_cli_commands.py (+42 -13)

@@ -10,6 +10,7 @@
 
 @pytest.fixture(scope="module")
 def setup_file_structure():
+    print("Start file structure setup function")
     OUTPUT_PATH = "trained_models/test_tool"
     DATA_PATH = "input/data/example_binary_titanic.csv"
 
@@ -33,23 +34,35 @@ def setup_file_structure():
     except FileExistsError:
         sys.exit("Error: Model directory already exists")
 
+    print("finish file structure setup function")
     yield OUTPUT_PATH, DATA_PATH
 
     # Remove the folder and its contents
     shutil.rmtree(OUTPUT_PATH)
 
 
 def test_main_script(setup_file_structure):
+    print("Start main calling function")
     OUTPUT_PATH, DATA_PATH = setup_file_structure
 
     # Check platform, windows is different from linux/mac
     if sys.platform == "win32":
         executable = ".ml2sql\\Scripts\\python.exe"
+        command = [
+            executable,
+            "scripts\\main.py",
+            "--name",
+            OUTPUT_PATH,
+            "--data_path",
+            DATA_PATH,
+            "--configuration",
+            "input\\configuration\\example_binary_titanic.json",
+            "--model",
+            "ebm",
+        ]
     else:
         executable = ".ml2sql/bin/python"
-
-    result = subprocess.run(
-        [
+        command = [
             executable,
             "scripts/main.py",
             "--name",
@@ -60,7 +73,10 @@ def test_main_script(setup_file_structure):
             "input/configuration/example_binary_titanic.json",
             "--model",
             "ebm",
-        ],
+        ]
+
+    result = subprocess.run(
+        command,
         # stdout=subprocess.PIPE,
         capture_output=True,
         text=True,
@@ -75,32 +91,45 @@ def test_main_script(setup_file_structure):
     assert "Target column has 2 unique values" in result.stderr
     assert "This problem will be treated as a classification problem" in result.stderr
 
+    print("Finish calling main function")
+
 
 
 def test_modeltester_script(setup_file_structure):
+    print("Start calling modeltester function")
     OUTPUT_PATH, DATA_PATH = setup_file_structure
-    MODEL_PATH = f"{OUTPUT_PATH}/model/ebm_classification.sav"
-    DATASET_NAME = DATA_PATH.split("/")[-1].split(".")[0]
-    DESTINATION_PATH = f"{OUTPUT_PATH}/tested_datasets/{DATASET_NAME}"
+    DATASET_NAME = os.path.split(DATA_PATH)[-1].split(".")[0]
 
     # Check platform, windows is different from linux/mac
     if sys.platform == "win32":
         executable = ".ml2sql\\Scripts\\python.exe"
+        command = [
+            executable,
+            "scripts\\modeltester.py",
+            "--model_path",
+            f"{OUTPUT_PATH}\\model\\ebm_classification.sav",
+            "--data_path",
+            DATA_PATH,
+            "--destination_path",
+            f"{OUTPUT_PATH}\\tested_datasets\\{DATASET_NAME}",
+        ]
     else:
         executable = ".ml2sql/bin/python"
-
-    result = subprocess.run(
-        [
+        command = [
             executable,
             "scripts/modeltester.py",
             "--model_path",
-            MODEL_PATH,
+            f"{OUTPUT_PATH}/model/ebm_classification.sav",
             "--data_path",
             DATA_PATH,
             "--destination_path",
-            DESTINATION_PATH,
-        ],
+            f"{OUTPUT_PATH}/tested_datasets/{DATASET_NAME}",
+        ]
+
+    result = subprocess.run(
+        command,
         stdout=subprocess.PIPE,
         text=True,
         check=False,
     )
     assert result.returncode == 0
+    print("Finish calling modeltester function")
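The refactor builds the platform-specific `command` list first, then makes a single `subprocess.run` call instead of duplicating the call per branch. A self-contained sketch of the same pattern; it substitutes the current interpreter for the project's venv so it runs anywhere:

```python
import subprocess
import sys

# Same shape as the tests: pick the executable per platform, assemble the
# argument list, then run the command exactly once.
if sys.platform == "win32":
    venv_python = ".ml2sql\\Scripts\\python.exe"
else:
    venv_python = ".ml2sql/bin/python"

# For a runnable demo, use the current interpreter instead of the venv one.
command = [sys.executable, "-c", "print('ok')"]
result = subprocess.run(command, capture_output=True, text=True, check=False)
print(result.returncode, result.stdout.strip())  # 0 ok
```

Keeping the `subprocess.run` call outside the `if`/`else` means options like `capture_output` and `check` only have to be maintained in one place.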

tests/integration/test_dt_sql.py (+1)

@@ -78,6 +78,7 @@ def post_params(request):
 
 
 def test_model_processing(load_model_data, post_params):
+    print("Start dt SQL creation and run test")
     # unpack data and model
     data, model, model_type = load_model_data

tests/integration/test_ebm_sql.py (+1)

@@ -78,6 +78,7 @@ def post_params(request):
 
 
 def test_model_processing(load_model_data, post_params):
+    print("Start EBM SQL creation and run test")
     # unpack data and model
     data, model, model_type = load_model_data

tests/integration/test_lr_sql.py (+1)

@@ -79,6 +79,7 @@ def post_params(request):
 
 
 def test_model_processing(load_model_data, post_params):
+    print("Start lr SQL creation and run test")
     # unpack data and model
     data, model, model_type = load_model_data

+56

@@ -0,0 +1,56 @@
+import pandas as pd
+import numpy as np
+import sys
+import pytest
+
+sys.path.append("scripts")
+
+from utils.feature_selection.correlations import (
+    cramers_corrected_stat,
+    xicor,
+    create_correlation_matrix,
+)
+
+
+def test_cramers_corrected_stat():
+    # Mock chi2_contingency function from scipy.stats
+    confusion_matrix = pd.DataFrame([[10, 5, 5], [2, 10, 10]])
+    result = cramers_corrected_stat(confusion_matrix)
+
+    assert np.isclose(result, 0.4, atol=1e-1)  # Test correlation with tolerance
+
+
+def test_xicor():
+    # Sample data arrays
+    X = np.arange(3, 100, 2)
+    Y = X + 10 * np.sin(X)
+
+    result = xicor(X, Y)
+    assert np.isclose(result, 0.7, atol=1e-1)  # Test correlation with tolerance
+
+
+def test_create_correlation_matrix_pearson():
+    # Mock corr function to return a sample correlation matrix
+    data = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
+    corr_matrix = create_correlation_matrix(data, "pearson")
+
+    assert isinstance(corr_matrix, pd.DataFrame)
+    assert np.all((corr_matrix >= -1) & (corr_matrix <= 1))
+    assert corr_matrix.shape == (2, 2)  # Check dimensions
+
+
+def test_create_correlation_matrix_cramerv():
+    # Mock crosstab function to return a sample contingency table
+    data = pd.DataFrame({"col1": ["A", "A", "B", "B"], "col2": ["C", "D", "C", "D"]})
+    corr_matrix = create_correlation_matrix(data, "cramerv")
+
+    assert isinstance(corr_matrix, pd.DataFrame)
+    assert np.all((corr_matrix >= -1) & (corr_matrix <= 1))
+    assert corr_matrix.shape == (2, 2)  # Check dimensions
+
+
+def test_create_correlation_matrix_invalid_type():
+    data = pd.DataFrame({"col1": [1, 2, 3], "col2": ["A", "B", "C"]})
+
+    with pytest.raises(ValueError):
+        create_correlation_matrix(data, "invalid_type")
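The `xicor` function exercised above is presumably Chatterjee's xi coefficient (the "Xi correlation" named in the commit message). A minimal sketch of that statistic, assuming no ties in `y`; the published estimator adds a tie correction:

```python
import numpy as np

def xi_correlation(x, y):
    # Chatterjee's xi: sort y by x, rank the reordered y values, and measure
    # how much consecutive ranks jump. Values near 1 indicate y is close to
    # a function of x; values near 0 indicate independence.
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    y_by_x = y[np.argsort(x, kind="stable")]
    ranks = np.argsort(np.argsort(y_by_x)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

x = np.arange(100)
print(round(xi_correlation(x, 2 * x + 1), 4))  # monotone data: 0.9703, i.e. 1 - 3/(n+1)
```

Unlike Pearson's r, xi is asymmetric in its arguments and detects non-monotone functional relationships, which is why the test's `Y = X + 10 * sin(X)` still scores around 0.7.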
