Initial package creation #44

Merged 116 commits on Jun 25, 2024
75a3762
Fixed some window related tests but broken commit due to dt multiclas…
kaspersgit Mar 7, 2024
e60c907
All tests passing
Mar 8, 2024
648743a
Improved structure, fixed tests on windows, updated some files
kaspersgit Mar 12, 2024
83dc498
Fixed some requirements and versions for succesful tests
Mar 12, 2024
588e9ca
Splitted normal and dev requirements and put them in docs/ folder
kaspersgit Mar 13, 2024
de9ed1f
Slightly simplified setup, updated readmes and ran ruff
Mar 13, 2024
295bc15
Users inputted model name formatting and better auto config creater
Mar 13, 2024
7f91d61
Slight rewrite of install package command to do a python -m pip install
kaspersgit Mar 13, 2024
065b1a5
Removed double file listing
Mar 14, 2024
78494ae
Updated readme with output of the tool
Mar 14, 2024
7b03644
Updated readme
Mar 15, 2024
b5ef80a
Added test and ruff formatted
Mar 15, 2024
419dbdb
Updated Readme and some tiny edits
Mar 16, 2024
c616923
Merge branch 'main' into development
kaspersgit Mar 16, 2024
db6e88c
Merge branch 'main' into development
Mar 16, 2024
075a666
Tiny adjustments - non code
Mar 19, 2024
b39d760
Expanded testing and some minor adjustments
Mar 21, 2024
cb3ee3e
Added logo and some minor changes
Mar 21, 2024
13e30e1
Improved logger setup, removed as function parameter
Mar 24, 2024
86dd0de
Added pre commit ruff and pytest hook
Mar 24, 2024
3c47048
Updated ReadMe and moved some tests
Mar 25, 2024
9490a25
Improved Readme and increased default rows in automatic config maker
Mar 25, 2024
87ab112
Added demo in form of a gif to readme
Mar 25, 2024
14bdaf8
Forgot gif file
Mar 25, 2024
d809776
Merge branch 'main' into development
Mar 25, 2024
1852ae4
Removed file
Mar 25, 2024
1b02404
First attempt to add local explanations
kaspersgit Mar 27, 2024
a498186
Added Xi correlation
Apr 10, 2024
01743f1
Added best pracrices on some scripts
Apr 16, 2024
3067df4
Merge branch 'main' into development
May 14, 2024
d217275
Updated packages
May 14, 2024
09812c6
Added test and started w github actions
May 27, 2024
21a17dc
Adjusted github workflow file
May 27, 2024
ad676ec
Updated requirements
May 27, 2024
8aaf124
Updated github workflow
May 27, 2024
f43b405
spelling mistake
May 27, 2024
0a5cb82
Add venv in github actions
May 28, 2024
77477df
Updated github workflow2
May 28, 2024
3407826
Add specific os runnable
Jun 4, 2024
85b6257
try 1x
Jun 5, 2024
39bb1f1
try 2x
Jun 5, 2024
d9b414a
try 3x
Jun 5, 2024
8520f0e
try 4x
Jun 5, 2024
067bc76
try 4x
Jun 5, 2024
e41ad6e
try 5x
Jun 5, 2024
ea9b13d
try 5x
Jun 5, 2024
c48b6b9
try 6x
Jun 5, 2024
6d5f0d6
try 7x
Jun 5, 2024
808a92b
try 8x
Jun 5, 2024
4905398
try 9x
Jun 5, 2024
758c63b
github workflow fix v1
Jun 7, 2024
4a8b5a8
github workflow fix v1
Jun 7, 2024
382213f
github workflow fix v2
Jun 7, 2024
6a9fa14
github workflow fix v2
Jun 7, 2024
58453ea
github workflow fix v2
Jun 7, 2024
5ecea4a
github workflow fix v3
Jun 7, 2024
8297967
github workflow fix v3
Jun 7, 2024
ed997bb
github workflow fix v4
Jun 7, 2024
95320bf
github workflow fix v5
Jun 7, 2024
f760d10
github workflow fix v6
Jun 7, 2024
3b26840
github workflow fix v6
Jun 7, 2024
dd0c484
github workflow fix v7
Jun 7, 2024
ad2170b
github workflow fix v8
Jun 7, 2024
8305cdc
github workflow fix v9
Jun 7, 2024
e5af627
github workflow fix v9
Jun 7, 2024
7dea7be
github workflow fix v9
Jun 7, 2024
d70a2db
github workflow fix v9
Jun 7, 2024
4bbd92c
github workflow fix v9
Jun 7, 2024
a654b90
github workflow fix v9
Jun 7, 2024
093f0a6
github workflow fix v10
Jun 8, 2024
af0786b
github workflow fix v10
Jun 8, 2024
9a7d5ba
github workflow fix v10
Jun 8, 2024
db5bfa5
github workflow fix v11
Jun 8, 2024
8517d94
Wrong checksum error possible fix
Jun 13, 2024
7626284
Removed change might already have been fixed
Jun 13, 2024
33ed393
Slightly changed test which stucks
Jun 13, 2024
4e400b3
Attempt fix github workflow
Jun 14, 2024
f5fe66c
Attempt fix github workflow v2
Jun 14, 2024
8a1ddbb
Attempt fix github workflow v3
Jun 14, 2024
ca1bb80
Attempt fix github workflow v4
Jun 14, 2024
2072203
Attempt fix github workflow v5
Jun 14, 2024
2c2d9c4
Attempt fix github workflow v5
Jun 14, 2024
ff9661f
Attempt fix github workflow v6
Jun 14, 2024
666bcfb
Attempt fix github workflow v7
Jun 14, 2024
6e41ba8
Attempt fix github workflow v8
Jun 14, 2024
b32f2b0
fixed conflicts
Jun 14, 2024
ad00938
Updated requirements files
Jun 14, 2024
3eede9e
removed double package in requirements
Jun 14, 2024
3b8556d
Making it a true cli tool
Jun 15, 2024
e72ce5c
Added option for clean-data command
Jun 15, 2024
80e3dd4
Updated packages
Jun 15, 2024
866930b
Moved typer to normal requirements
Jun 15, 2024
343ac8a
Slight restructure and included demo files for init
Jun 16, 2024
c355002
Added adjusted github workflow for CI
Jun 16, 2024
552cb96
Added adjusted github workflow for CI v2
Jun 16, 2024
92fe5a7
Fixed CLI command tests
Jun 19, 2024
c10e252
Slightly changed install package command
Jun 20, 2024
6a0e7e2
Slight adjustement github workflow file
Jun 20, 2024
ba4be67
Slight adjustement github workflow file
Jun 20, 2024
2ceb3d9
Slight adjustement github workflow file
Jun 20, 2024
fe6ed63
Updated requirements files
Jun 20, 2024
4715bf2
Updated pyproject to not allow numpy 2 as it breaks interpretml
Jun 20, 2024
ed6efb1
Some precautionary np call updates to work with np2 in the future
Jun 20, 2024
4190782
Restructured again
Jun 23, 2024
52cb52b
Restructured again
Jun 23, 2024
df10dc6
Testing out different configs
Jun 24, 2024
cb62f90
Testing out different configs
Jun 24, 2024
73a36c2
Testing out different configs
Jun 24, 2024
faf8b7e
Testing out different configs
Jun 24, 2024
fa5a27c
Another attempt on github action window runner success
Jun 25, 2024
83c81d2
Another attempt on github action window runner success
Jun 25, 2024
03f9816
Ready for packaging
Jun 25, 2024
0173d59
ready for release
Jun 25, 2024
6c02d5b
Bumped version to v0.1.3
Jun 25, 2024
2a9ce3f
Bumped version to v0.1.3
Jun 25, 2024
3613d74
added simple build script and removed old github workflow
Jun 25, 2024
22 changes: 10 additions & 12 deletions .github/workflows/run_test.yml → .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: Run Tests via Pytest on Linux, Unix and Windows
name: CI

on: [push]

@@ -19,29 +19,27 @@ jobs:
- name: Install dependencies ${{ matrix.os }}
run: |
python -m pip install --upgrade pip
python -m venv .ml2sql
if [ "$RUNNER_OS" == "Windows" ]; then
".ml2sql\Scripts\python" -m pip install --index-url https://pypi.org/simple -r "docs\requirements-dev.txt"
else
.ml2sql/bin/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
fi
python -m pip install -e .
shell: bash
- name: Install development dependencies
run: |
python -m pip install ".[dev]"
shell: bash
- name: Lint with Ruff
run: |
if [ "$RUNNER_OS" == "Windows" ]; then
".ml2sql\Scripts\ruff" check --output-format=github .
"ruff" check --output-format=github .
else
.ml2sql/bin/ruff check --output-format=github .
ruff check --output-format=github .
fi
shell: bash
continue-on-error: true
- name: Test with pytest
run: |
if [ "$RUNNER_OS" == "Windows" ]; then
".ml2sql\Scripts\pytest" -v -k "not _script and not test_pre_process_kfold"
python -m "pytest" -k "not test_run and not test_check_model and not test_pre_process_kfold"
else
source .ml2sql/bin/activate
coverage run -m pytest -v
python -m "pytest"
fi
shell: bash
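
The Windows branch of the pytest step deselects tests with a `-k` expression. As a rough illustration of how such an expression filters test names, here is a minimal Python sketch. It is not pytest's implementation: it assumes plain case-insensitive substring matching on the test name only (pytest also considers module and class names), it handles just the `not X and not Y` form used in the workflow, and the test names in the example are hypothetical.

```python
from typing import List


def deselected_by(expression: str, test_name: str) -> bool:
    """Return True if `test_name` is excluded by a 'not A and not B ...' expression.

    Simplified model: every term must start with 'not ', and a keyword excludes
    any test whose name contains it (case-insensitive).
    """
    keywords: List[str] = []
    for term in expression.split(" and "):
        term = term.strip()
        if not term.startswith("not "):
            raise ValueError(f"only 'not <keyword>' terms are supported: {term!r}")
        keywords.append(term[len("not "):].strip().lower())
    name = test_name.lower()
    return any(keyword in name for keyword in keywords)


# Expression copied from the workflow; the test names below are made up.
EXPR = "not test_run and not test_check_model and not test_pre_process_kfold"
names = [
    "test_run_cli",
    "test_check_model_output",
    "test_pre_process_simple",
    "test_pre_process_kfold",
]
kept = [n for n in names if not deselected_by(EXPR, n)]
print(kept)
```

Under these assumptions only the names that contain none of the three keywords stay in the run, which is why the Windows job executes a smaller test set than the Linux and macOS jobs.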

2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -11,7 +11,7 @@ repos:
hooks:
- id: pytest-check
name: pytest-check
entry: pytest
entry: python -m "pytest"
language: system
pass_filenames: false
always_run: true
37 changes: 37 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,37 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Added Kmeans (unsupervised) as model to choose
- Auto test created SQL model vs pickled model

### Changed
- For changes in existing functionality.

### Deprecated
- For soon-to-be removed features.

### Removed
- For now removed features.

### Fixed
- For any bug fixes.

### Security
- In case of vulnerabilities.

## [0.1.2] - 2024-06-25

### Added
- Initial release of the package.
- Use as command line tool (commands: init, run, check-model and clean-data)
- Automatic ML model training
- Outputting several performance graphs
- Saves model in .sav format and in .sql format

1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include ml2sql/input/data/*
47 changes: 13 additions & 34 deletions Readme.md
@@ -7,7 +7,7 @@

# Table of Contents

<img src="docs/media/ml2sql_logo.png" align="right"
<img src="https://github.com/kaspersgit/ml_2_sql/blob/main/docs/media/ml2sql_logo.png?raw=true" align="right"
alt="ML2SQL">

1. [What is it?](#what-is-it)
@@ -20,10 +20,10 @@
<br>

# What is it?
An automated machine learning tool which trains, graphs performance and saves the model in SQL. Using interpretable ML models (from [interpretml](https://github.com/interpretml/interpret/)) to train models which are explainable and interpretable, so called 'glassbox' models. With the outputted model in SQL format which can be used to put a model in 'production' in an SQL environment.
An automated machine learning CLI tool which trains models, graphs their performance and saves the model in SQL. It uses interpretable ML models (from [interpretml](https://github.com/interpretml/interpret/)) to train models which are explainable and interpretable, so-called 'glassbox' models. The outputted model in SQL format can be used to put a model in 'production' in an SQL environment.
This tool can be used by anybody, but is aimed for people who want to do a quick analysis and/or deploy a model in an SQL system.

<center><img src="docs/media/ml2sql_demo.gif"
<center><img src="https://github.com/kaspersgit/ml_2_sql/blob/main/docs/media/ml2sql_demo.gif?raw=true"
alt="ML2SQL_demo" height=400 width=600></center>

## Philosophy:
@@ -46,30 +46,13 @@ This tool can be used by anybody, but is aimed for people who want to do a quick

# Getting started
<details>
<summary><strong>Installation</strong></summary>
<summary><strong>Set up</strong></summary>
<br>

1. Make sure you have python >= 3.8 and git installed
2. Clone Github repo to your local machine and cd into folder, run:
```
git clone git@github.com:kaspersgit/ml_2_sql.git
cd ml_2_sql
```
3. Create virtual environment and install packages, run:

Windows:
```
python -m venv .ml2sql
.ml2sql/Scripts/python -m pip install -r docs/requirements.txt
```

Mac/Linux:
```
python3 -m venv .ml2sql
.ml2sql/bin/python -m pip install -r docs/requirements.txt
```
4. Wait until all packages are installed (could take a few minutes)
5. You are ready to go (the virtual env does not need to be activated to use this tool)
1. Make sure you have Python >= 3.8
2. Run: `pip install ml2sql`
3. Run: `ml2sql init`
This will create the folders `input/data/`, `input/configuration/` and `trained_models/`

<br>
</details>
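
To confirm that `ml2sql init` left the expected layout behind, a small check such as the one below could be run from the project folder. This is a minimal sketch and not part of the package: the folder names come from the README text above, while the helper name `missing_folders` is hypothetical.

```python
from pathlib import Path
from typing import List

# Folder names as listed in the README; the helper itself is illustrative only.
EXPECTED_FOLDERS = ("input/data", "input/configuration", "trained_models")


def missing_folders(project_root: Path) -> List[str]:
    """Return the expected folders that do not exist under `project_root`."""
    return [
        folder
        for folder in EXPECTED_FOLDERS
        if not (project_root / folder).is_dir()
    ]


if __name__ == "__main__":
    missing = missing_folders(Path.cwd())
    if missing:
        print(f"missing folders: {missing}")
    else:
        print("all expected folders are present")
```

An empty result means the three folders exist; anything else suggests `ml2sql init` was run from a different directory.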
@@ -78,9 +61,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
<br>

1. In the terminal in the root of this folder run:
- `python3 run.py` (Mac/Linux)
- `python run.py` (Windows)
2. Follow the instructions on screen by selecting the example data and similarly named config file
`ml2sql run`, follow the instructions on screen and select the demo data and config
3. Check the output in the newly created folder

<br>
@@ -90,9 +71,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
<br>

1. Save csv file containing target and all features in the `input/data/` folder (more info on [input data](#data))
2. In the terminal in the root of this folder run:
- `python3 run.py` (Mac/Linux)
- `python run.py` (Windows)
2. Run: `ml2sql run`
3. Select your CSV file
4. Select `Create a new config` and choose `Automatic` option (a config file will be made and can be edited later) (more info on [config json](#configuration-json))
5. Select newly created config
@@ -110,8 +89,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
1. Make sure the new dataset has the same variables as the dataset the model was trained on (same features and target)
2. Save dataset in the `input/data/` folder (more info on [input data](#data))
3. In the terminal in the root of this folder run:
- `python3 check_model.py` (Mac/Linux)
- `python check_model.py` (Windows)
`ml2sql check-model`
4. Follow the instructions on screen
5. The output will be saved in the folder `trained_models/<selected_model>/tested_datasets/<selected_dataset>/`

@@ -256,12 +234,13 @@ Can be found in the created model's folder under `/model`

## Notes
- Limited to 3 models (EBM, linear/logistic regression, and Decision Tree).
- Data imbalance treatments (e.g., oversampling + model calibration) are not fully implemented.
- Data imbalance treatments (e.g., oversampling + model calibration) are not implemented.
- Only accepts CSV files.
- Interactions with more than 2 variables are not supported.

## TODO list
Check docs/TODO.md for an extensive list of planned features and improvements.
Feel free to open an issue in case a feature is missing or not working properly.

# Troubleshooting
If you encounter an unclear error message after following the instructions above, feel free to create an Issue on the GitHub repository.
19 changes: 19 additions & 0 deletions build.sh
@@ -0,0 +1,19 @@
#!/bin/bash

# Ensure we exit immediately if any command fails
set -e

# Clean up old distribution files
echo "Cleaning up old distribution files..."
rm -rf dist/*

# Create source distribution and wheel
echo "Building source distribution and wheel..."
python -m build

# Optional: Run checks
echo "Running twine check..."
twine check dist/*

echo "Build complete. Distribution files are in the 'dist/' directory."
echo "To upload to PyPI, run: twine upload dist/*"
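
After `python -m build`, the `dist/` directory normally holds one wheel (`.whl`) and one source distribution (`.tar.gz`). The sketch below shows one way to sanity-check that before uploading. It is an illustration only, not part of `build.sh`: the function name `classify_dist_files` is hypothetical and the grouping relies purely on file extensions.

```python
from pathlib import Path
from typing import Dict, List


def classify_dist_files(dist_dir: Path) -> Dict[str, List[str]]:
    """Group the files in `dist_dir` into wheels, source distributions and others."""
    groups: Dict[str, List[str]] = {"wheel": [], "sdist": [], "other": []}
    for path in sorted(dist_dir.iterdir()):
        if path.name.endswith(".whl"):
            groups["wheel"].append(path.name)
        elif path.name.endswith(".tar.gz"):
            groups["sdist"].append(path.name)
        else:
            groups["other"].append(path.name)
    return groups


if __name__ == "__main__":
    dist = Path("dist")
    if dist.is_dir():
        print(classify_dist_files(dist))
    else:
        print("no dist/ directory found; run the build first")
```

Leftover files in the `other` group, or more than one wheel, would hint that the clean-up step did not run before the build.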
92 changes: 0 additions & 92 deletions check_model.py

This file was deleted.

1 change: 1 addition & 0 deletions docs/TODO.md
@@ -19,6 +19,7 @@ Checks and config

- Other
- Allow for other data file types (apart from csv)
- Test generated SQL vs trained model and report on difference
- Switch decision tree from sklearn to interpret for coherence (wait on [issue 552](https://github.com/interpretml/interpret/issues/522))
- Add calibration (platt scaling/isotonic regression)
- Add changelog and versioning
15 changes: 12 additions & 3 deletions docs/devReadMe.md
@@ -3,13 +3,19 @@
Mac/Linux:
```
python3 -m venv .ml2sql
.ml2sql/bin/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
source .ml2sql/bin/activate
python -m pip install --index-url https://pypi.org/simple \
-r docs/requirements-dev.txt \
-e . # <- the app/pkg itself
```

Windows
```
python -m venv .ml2sql
.ml2sql/Scripts/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
.ml2sql/Scripts/activate
python -m pip install --index-url https://pypi.org/simple \
-r docs/requirements-dev.txt \
-e . # <- the app/pkg itself
```

### Testing
@@ -20,4 +26,7 @@ python -m venv .ml2sql
With the virtual env activated
- Compile user requirements.txt file: `python -m piptools compile --index-url=https://pypi.org/simple -o docs/requirements.txt pyproject.toml`
- Compile dev requirements-dev.txt file: `python -m piptools compile --index-url=https://pypi.org/simple --extra dev -o docs/requirements-dev.txt -c docs/requirements.txt pyproject.toml`
(Making sure packages in both files have the same version, [stackoverflow source](https://stackoverflow.com/questions/76055688/generate-aligned-requirements-txt-and-dev-requirements-txt-with-pip-compile))
(Making sure packages in both files have the same version, [stackoverflow source](https://stackoverflow.com/questions/76055688/generate-aligned-requirements-txt-and-dev-requirements-txt-with-pip-compile))

### Building package
https://packaging.python.org/en/latest/tutorials/packaging-projects/