Commit 5b8d92a

Kasper de Harder (kaspersgit) authored
Development (#39)
* Fixed some Windows-related tests but broken commit due to dt multiclass test
* All tests passing
* Improved structure, fixed tests on Windows, updated some files
* Fixed some requirements and versions for successful tests
* Split normal and dev requirements and put them in the docs/ folder
* Slightly simplified setup, updated readmes and ran ruff
* User-inputted model name formatting and a better auto config creator
* Slight rewrite of the install package command to use `python -m pip install`
* Removed double file listing
* Updated readme with output of the tool
* Updated readme
* Added test and ruff formatted
* Updated readme and some tiny edits
* Tiny adjustments (non-code)
* Expanded testing and some minor adjustments
* Added logo and some minor changes
* Improved logger setup, removed it as a function parameter
* Added pre-commit ruff and pytest hook
* Updated readme and moved some tests
* Improved readme and increased default rows in the automatic config maker
* Added demo in the form of a gif to readme
* Forgot gif file
* Removed file
* First attempt to add local explanations
* Added xi correlation
* Added best practices to some scripts
* Updated packages
* Added test and started with GitHub Actions
* Adjusted GitHub workflow file
* Updated requirements
* Updated GitHub workflow
* Spelling mistake
* Add venv in GitHub Actions
* Updated GitHub workflow 2
* Add OS-specific runners
* try 1x through try 9x
* github workflow fix v1 through v11
* Wrong checksum error possible fix
* Removed change that might already have been fixed
* Slightly changed a test which gets stuck
* Attempt fix github workflow, v2 through v8
* Updated requirements files
* Removed double package in requirements

Co-authored-by: Kasper de Harder <[email protected]>
1 parent b21507c commit 5b8d92a

12 files changed: +165 -31 lines

.github/workflows/run_test.yml (+47, new file)

@@ -0,0 +1,47 @@
+name: Run Tests via Pytest on Linux, Unix and Windows
+
+on: [push]
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest, windows-latest, macos-latest]
+        python-version: ["3.10"]
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }} on ${{ matrix.os }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies ${{ matrix.os }}
+        run: |
+          python -m pip install --upgrade pip
+          python -m venv .ml2sql
+          if [ "$RUNNER_OS" == "Windows" ]; then
+            ".ml2sql\Scripts\python" -m pip install --index-url https://pypi.org/simple -r "docs\requirements-dev.txt"
+          else
+            .ml2sql/bin/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
+          fi
+        shell: bash
+      - name: Lint with Ruff
+        run: |
+          if [ "$RUNNER_OS" == "Windows" ]; then
+            ".ml2sql\Scripts\ruff" check --output-format=github .
+          else
+            .ml2sql/bin/ruff check --output-format=github .
+          fi
+        shell: bash
+        continue-on-error: true
+      - name: Test with pytest
+        run: |
+          if [ "$RUNNER_OS" == "Windows" ]; then
+            ".ml2sql\Scripts\pytest" -v -k "not _script and not test_pre_process_kfold"
+          else
+            source .ml2sql/bin/activate
+            coverage run -m pytest -v
+          fi
+        shell: bash
+
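Outside CI, the workflow's per-OS interpreter selection can be mirrored locally. A minimal bash sketch assuming the repo's `.ml2sql` venv layout; the `RUNNER_OS` default is our assumption here, on GitHub Actions the variable is set automatically:

```shell
# Pick the venv interpreter path the same way the workflow steps do.
# RUNNER_OS is provided by GitHub Actions; default to Linux locally.
RUNNER_OS="${RUNNER_OS:-Linux}"
if [ "$RUNNER_OS" == "Windows" ]; then
    PY=".ml2sql\\Scripts\\python"
else
    PY=".ml2sql/bin/python"
fi
echo "Using interpreter: $PY"
```

Keeping `shell: bash` on every step is what makes this single branching style work on all three runners, since bash is available on the Windows images too.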

docs/devReadMe.md (+2 -2)

@@ -18,6 +18,6 @@ python -m venv .ml2sql
 
 ### Package management (pinning)
 With the virtual env activated
-- Compile user requirements.txt file: `python -m piptools compile -o docs/requirements.txt pyproject.toml`
-- Compile dev requirements-dev.txt file: `python -m piptools compile --extra dev -o docs/requirements-dev.txt -c docs/requirements.txt pyproject.toml`
+- Compile user requirements.txt file: `python -m piptools compile --index-url=https://pypi.org/simple -o docs/requirements.txt pyproject.toml`
+- Compile dev requirements-dev.txt file: `python -m piptools compile --index-url=https://pypi.org/simple --extra dev -o docs/requirements-dev.txt -c docs/requirements.txt pyproject.toml`
 (Making sure packages in both files have the same version, [stackoverflow source](https://stackoverflow.com/questions/76055688/generate-aligned-requirements-txt-and-dev-requirements-txt-with-pip-compile))

docs/requirements-dev.txt (+3 -4)

@@ -2,11 +2,8 @@
 # This file is autogenerated by pip-compile with Python 3.11
 # by the following command:
 #
-#    pip-compile --constraint=docs/requirements.txt --extra=dev --output-file=docs/requirements-dev.txt pyproject.toml
+#    pip-compile --constraint=docs/requirements.txt --extra=dev --index-url=https://pypi.org/simple --output-file=docs/requirements-dev.txt pyproject.toml
 #
---index-url https://artifactory-edge.ess.midasplayer.com/artifactory/api/pypi/pypi-all/pypi
---extra-index-url https://pypi.org/simple
---trusted-host artifactory.ess.midasplayer.com
 
 appnope==0.1.4
     # via
@@ -49,6 +46,8 @@ contourpy==1.2.1
     # via
     #   -c docs/requirements.txt
     #   matplotlib
+coverage==7.5.2
+    # via ml_2_sql (pyproject.toml)
 cycler==0.12.1
     # via
     #   -c docs/requirements.txt

docs/requirements.txt (+5 -3)

@@ -4,9 +4,6 @@
 #
 #    pip-compile --output-file=docs/requirements.txt pyproject.toml
 #
---index-url https://artifactory-edge.ess.midasplayer.com/artifactory/api/pypi/pypi-all/pypi
---extra-index-url https://pypi.org/simple
---trusted-host artifactory.ess.midasplayer.com
 
 appnope==0.1.4
     # via ipykernel
@@ -148,6 +145,9 @@ pandas==2.2.2
     #   shap
 parso==0.8.4
     # via jedi
+pexpect==4.9.0
+    # via ipython
+pillow==10.3.0
 pexpect==4.9.0
     # via ipython
 pillow==10.3.0
@@ -250,6 +250,8 @@ werkzeug==3.0.3
     #   flask
 zipp==3.18.1
     # via importlib-metadata
+zipp==3.18.1
+    # via importlib-metadata
 zope-event==5.0
     # via gevent
 zope-interface==6.3

pyproject.toml (+2 -1)

@@ -39,5 +39,6 @@ dev = [
     "pytest",
     "pip-tools",
     "ruff",
-    "pre-commit"
+    "pre-commit",
+    "coverage"
 ]

scripts/utils/feature_selection/correlations.py (-3)

@@ -6,9 +6,6 @@
 import plotly.figure_factory as ff
 import plotly.graph_objects as go
 import scipy.stats as ss
-import logging
-
-logger = logging.getLogger(__name__)
 
 logger = logging.getLogger(__name__)
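The lines removed above were a duplicated import/logger pair; what survives is the standard one-logger-per-module pattern. A minimal sketch of that pattern, with the module name written out for illustration:

```python
import logging

# One logger per module, named after the module path. Library code attaches
# a NullHandler so that importing applications decide where output goes.
logger = logging.getLogger("utils.feature_selection.correlations")
logger.addHandler(logging.NullHandler())

logger.debug("correlation matrix computed")  # silent until a handler is configured
```

Because `getLogger` returns the same object for the same name, defining the logger twice is harmless but redundant, which is why the diff simply drops one copy.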

tests/integration/test_check_version.py (+5 -5)

@@ -69,10 +69,10 @@ def test_interpret_model_version():
 
     print(f"Looking for files ending with '_v{version_suffix}' in '{folder_path}'")
 
-    files_with_version = []
-    for filename in os.listdir(folder_path):
-        filename = filename.split(".")[0]
-        files_with_version.append(filename.endswith(f"_v{version_suffix}"))
-
+    # Check if the version in filenames matches the version_suffix
+    files_with_version = [
+        os.path.splitext(filename)[0].split("_v")[-1] == str(version_suffix)
+        for filename in os.listdir(folder_path)
+    ]
     # Checking if all models in testing direcotry are of installed version
     assert all(files_with_version)
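The rewritten comprehension compares the text after the last `_v` in each file stem against the expected suffix, rather than relying on `endswith`. A standalone sketch of that predicate, using hypothetical model filenames:

```python
import os

def matches_version(filename: str, version_suffix) -> bool:
    # Strip the extension, then compare everything after the last "_v"
    # in the stem against the expected version suffix.
    stem = os.path.splitext(filename)[0]
    return stem.split("_v")[-1] == str(version_suffix)

# Hypothetical filenames for illustration:
print(matches_version("ebm_classification_v0.4.2.sav", "0.4.2"))  # True
print(matches_version("ebm_classification_v0.4.1.sav", "0.4.2"))  # False
```

Using `os.path.splitext` also fixes a subtle bug in the old code: `filename.split(".")[0]` would truncate a version like `0.4.2` at the first dot.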

tests/integration/test_cli_commands.py (+42 -13)

@@ -10,6 +10,7 @@
 
 @pytest.fixture(scope="module")
 def setup_file_structure():
+    print("Start file structure setup function")
     OUTPUT_PATH = "trained_models/test_tool"
     DATA_PATH = "input/data/example_binary_titanic.csv"
 
@@ -33,23 +34,35 @@ def setup_file_structure():
     except FileExistsError:
         sys.exit("Error: Model directory already exists")
 
+    print("finish file structure setup function")
     yield OUTPUT_PATH, DATA_PATH
 
     # Remove the folder and its contents
     shutil.rmtree(OUTPUT_PATH)
 
 
 def test_main_script(setup_file_structure):
+    print("Start main calling function")
     OUTPUT_PATH, DATA_PATH = setup_file_structure
 
     # Check platform, windows is different from linux/mac
     if sys.platform == "win32":
         executable = ".ml2sql\\Scripts\\python.exe"
+        command = [
+            executable,
+            "scripts\\main.py",
+            "--name",
+            OUTPUT_PATH,
+            "--data_path",
+            DATA_PATH,
+            "--configuration",
+            "input\\configuration\\example_binary_titanic.json",
+            "--model",
+            "ebm",
+        ]
     else:
         executable = ".ml2sql/bin/python"
-
-    result = subprocess.run(
-        [
+        command = [
             executable,
             "scripts/main.py",
             "--name",
@@ -60,7 +73,10 @@ def test_main_script(setup_file_structure):
             "input/configuration/example_binary_titanic.json",
             "--model",
             "ebm",
-        ],
+        ]
+
+    result = subprocess.run(
+        command,
         # stdout=subprocess.PIPE,
         capture_output=True,
         text=True,
@@ -75,32 +91,45 @@ def test_main_script(setup_file_structure):
     assert "Target column has 2 unique values" in result.stderr
     assert "This problem will be treated as a classification problem" in result.stderr
 
+    print("Finish calling main function")
+
 
 
 def test_modeltester_script(setup_file_structure):
+    print("Start calling modeltester function")
     OUTPUT_PATH, DATA_PATH = setup_file_structure
-    MODEL_PATH = f"{OUTPUT_PATH}/model/ebm_classification.sav"
-    DATASET_NAME = DATA_PATH.split("/")[-1].split(".")[0]
-    DESTINATION_PATH = f"{OUTPUT_PATH}/tested_datasets/{DATASET_NAME}"
+    DATASET_NAME = os.path.split(DATA_PATH)[-1].split(".")[0]
 
     # Check platform, windows is different from linux/mac
     if sys.platform == "win32":
         executable = ".ml2sql\\Scripts\\python.exe"
+        command = [
+            executable,
+            "scripts\\modeltester.py",
+            "--model_path",
+            f"{OUTPUT_PATH}\\model\\ebm_classification.sav",
+            "--data_path",
+            DATA_PATH,
+            "--destination_path",
+            f"{OUTPUT_PATH}\\tested_datasets\\{DATASET_NAME}",
+        ]
     else:
         executable = ".ml2sql/bin/python"
-
-    result = subprocess.run(
-        [
+        command = [
             executable,
             "scripts/modeltester.py",
             "--model_path",
-            MODEL_PATH,
+            f"{OUTPUT_PATH}/model/ebm_classification.sav",
             "--data_path",
             DATA_PATH,
             "--destination_path",
-            DESTINATION_PATH,
-        ],
+            f"{OUTPUT_PATH}/tested_datasets/{DATASET_NAME}",
+        ]
+
+    result = subprocess.run(
+        command,
         stdout=subprocess.PIPE,
         text=True,
         check=False,
     )
     assert result.returncode == 0
+    print("Finish calling modeltester function")
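The refactor builds the platform-specific `command` list first, then makes a single `subprocess.run` call instead of duplicating the call per branch. A self-contained sketch of the same pattern; it substitutes the current interpreter for the project's venv so it runs anywhere:

```python
import subprocess
import sys

# Same shape as the tests: pick the executable per platform, assemble the
# argument list, then run the command exactly once.
if sys.platform == "win32":
    venv_python = ".ml2sql\\Scripts\\python.exe"
else:
    venv_python = ".ml2sql/bin/python"

# For a runnable demo, use the current interpreter instead of the venv one.
command = [sys.executable, "-c", "print('ok')"]
result = subprocess.run(command, capture_output=True, text=True, check=False)
print(result.returncode, result.stdout.strip())  # 0 ok
```

Keeping the `subprocess.run` call outside the `if`/`else` means options like `capture_output` and `check` only have to be maintained in one place.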

tests/integration/test_dt_sql.py (+1)

@@ -78,6 +78,7 @@ def post_params(request):
 
 
 def test_model_processing(load_model_data, post_params):
+    print("Start dt SQL creation and run test")
     # unpack data and model
     data, model, model_type = load_model_data

tests/integration/test_ebm_sql.py (+1)

@@ -78,6 +78,7 @@ def post_params(request):
 
 
 def test_model_processing(load_model_data, post_params):
+    print("Start EBM SQL creation and run test")
     # unpack data and model
     data, model, model_type = load_model_data

tests/integration/test_lr_sql.py (+1)

@@ -79,6 +79,7 @@ def post_params(request):
 
 
 def test_model_processing(load_model_data, post_params):
+    print("Start lr SQL creation and run test")
     # unpack data and model
     data, model, model_type = load_model_data

+56

@@ -0,0 +1,56 @@
+import pandas as pd
+import numpy as np
+import sys
+import pytest
+
+sys.path.append("scripts")
+
+from utils.feature_selection.correlations import (
+    cramers_corrected_stat,
+    xicor,
+    create_correlation_matrix,
+)
+
+
+def test_cramers_corrected_stat():
+    # Mock chi2_contingency function from scipy.stats
+    confusion_matrix = pd.DataFrame([[10, 5, 5], [2, 10, 10]])
+    result = cramers_corrected_stat(confusion_matrix)
+
+    assert np.isclose(result, 0.4, atol=1e-1)  # Test correlation with tolerance
+
+
+def test_xicor():
+    # Sample data arrays
+    X = np.arange(3, 100, 2)
+    Y = X + 10 * np.sin(X)
+
+    result = xicor(X, Y)
+    assert np.isclose(result, 0.7, atol=1e-1)  # Test correlation with tolerance
+
+
+def test_create_correlation_matrix_pearson():
+    # Mock corr function to return a sample correlation matrix
+    data = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
+    corr_matrix = create_correlation_matrix(data, "pearson")
+
+    assert isinstance(corr_matrix, pd.DataFrame)
+    assert np.all((corr_matrix >= -1) & (corr_matrix <= 1))
+    assert corr_matrix.shape == (2, 2)  # Check dimensions
+
+
+def test_create_correlation_matrix_cramerv():
+    # Mock crosstab function to return a sample contingency table
+    data = pd.DataFrame({"col1": ["A", "A", "B", "B"], "col2": ["C", "D", "C", "D"]})
+    corr_matrix = create_correlation_matrix(data, "cramerv")
+
+    assert isinstance(corr_matrix, pd.DataFrame)
+    assert np.all((corr_matrix >= -1) & (corr_matrix <= 1))
+    assert corr_matrix.shape == (2, 2)  # Check dimensions
+
+
+def test_create_correlation_matrix_invalid_type():
+    data = pd.DataFrame({"col1": [1, 2, 3], "col2": ["A", "B", "C"]})
+
+    with pytest.raises(ValueError):
+        create_correlation_matrix(data, "invalid_type")
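The `xicor` function exercised above is presumably Chatterjee's xi coefficient (the "Xi correlation" named in the commit message). A minimal sketch of that statistic, assuming no ties in `y`; the published estimator adds a tie correction:

```python
import numpy as np

def xi_correlation(x, y):
    # Chatterjee's xi: sort y by x, rank the reordered y values, and measure
    # how much consecutive ranks jump. Values near 1 indicate y is close to
    # a function of x; values near 0 indicate independence.
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    y_by_x = y[np.argsort(x, kind="stable")]
    ranks = np.argsort(np.argsort(y_by_x)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

x = np.arange(100)
print(round(xi_correlation(x, 2 * x + 1), 4))  # monotone data: 0.9703, i.e. 1 - 3/(n+1)
```

Unlike Pearson's r, xi is asymmetric in its arguments and detects non-monotone functional relationships, which is why the test's `Y = X + 10 * sin(X)` still scores around 0.7.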
