Commit 32c211f

kaspersgit and Kasper de Harder authored
Initial package creation (#44)
* Fixed some window related tests but broken commit due to dt multiclass test
* All tests passing
* Improved structure, fixed tests on windows, updated some files
* Fixed some requirements and versions for succesful tests
* Splitted normal and dev requirements and put them in docs/ folder
* Slightly simplified setup, updated readmes and ran ruff
* Users inputted model name formatting and better auto config creater
* Slight rewrite of install package command to do a python -m pip install
* Removed double file listing
* Updated readme with output of the tool
* Updated readme
* Added test and ruff formatted
* Updated Readme and some tiny edits
* Tiny adjustments - non code
* Expanded testing and some minor adjustments
* Added logo and some minor changes
* Improved logger setup, removed as function parameter
* Added pre commit ruff and pytest hook
* Updated ReadMe and moved some tests
* Improved Readme and increased default rows in automatic config maker
* Added demo in form of a gif to readme
* Forgot gif file
* Removed file
* First attempt to add local explanations
* Added Xi correlation
* Added best pracrices on some scripts
* Updated packages
* Added test and started w github actions
* Adjusted github workflow file
* Updated requirements
* Updated github workflow
* spelling mistake
* Add venv in github actions
* Updated github workflow2
* Add specific os runnable
* try 1x
* try 2x
* try 3x
* try 4x
* try 4x
* try 5x
* try 5x
* try 6x
* try 7x
* try 8x
* try 9x
* github workflow fix v1
* github workflow fix v1
* github workflow fix v2
* github workflow fix v2
* github workflow fix v2
* github workflow fix v3
* github workflow fix v3
* github workflow fix v4
* github workflow fix v5
* github workflow fix v6
* github workflow fix v6
* github workflow fix v7
* github workflow fix v8
* github workflow fix v9
* github workflow fix v9
* github workflow fix v9
* github workflow fix v9
* github workflow fix v9
* github workflow fix v9
* github workflow fix v10
* github workflow fix v10
* github workflow fix v10
* github workflow fix v11
* Wrong checksum error possible fix
* Removed change might already have been fixed
* Slightly changed test which stucks
* Attempt fix github workflow
* Attempt fix github workflow v2
* Attempt fix github workflow v3
* Attempt fix github workflow v4
* Attempt fix github workflow v5
* Attempt fix github workflow v5
* Attempt fix github workflow v6
* Attempt fix github workflow v7
* Attempt fix github workflow v8
* Updated requirements files
* removed double package in requirements
* Making it a true cli tool
* Added option for clean-data command
* Updated packages
* Moved typer to normal requirements
* Slight restructure and included demo files for init
* Added adjusted github workflow for CI
* Added adjusted github workflow for CI v2
* Fixed CLI command tests
* Slightly changed install package command
* Slight adjustement github workflow file
* Slight adjustement github workflow file
* Slight adjustement github workflow file
* Updated requirements files
* Updated pyproject to not allow numpy 2 as it breaks interpretml
* Some precautionary np call updates to work with np2 in the future
* Restructured again
* Restructured again
* Testing out different configs
* Testing out different configs
* Testing out different configs
* Testing out different configs
* Another attempt on github action window runner success
* Another attempt on github action window runner success
* Ready for packaging
* ready for release
* Bumped version to v0.1.3
* added simple build script and removed old github workflow

---------

Co-authored-by: Kasper de Harder <[email protected]>
1 parent 5b8d92a commit 32c211f

88 files changed, +5555 -2953 lines changed

.github/workflows/run_test.yml → .github/workflows/ci.yml

+10 -12
@@ -1,4 +1,4 @@
-name: Run Tests via Pytest on Linux, Unix and Windows
+name: CI
 
 on: [push]
 
@@ -19,29 +19,27 @@ jobs:
     - name: Install dependencies ${{ matrix.os }}
       run: |
         python -m pip install --upgrade pip
-        python -m venv .ml2sql
-        if [ "$RUNNER_OS" == "Windows" ]; then
-          ".ml2sql\Scripts\python" -m pip install --index-url https://pypi.org/simple -r "docs\requirements-dev.txt"
-        else
-          .ml2sql/bin/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
-        fi
+        python -m pip install -e .
+      shell: bash
+    - name: Install development dependencies
+      run: |
+        python -m pip install ".[dev]"
       shell: bash
     - name: Lint with Ruff
       run: |
         if [ "$RUNNER_OS" == "Windows" ]; then
-          ".ml2sql\Scripts\ruff" check --output-format=github .
+          "ruff" check --output-format=github .
         else
-          .ml2sql/bin/ruff check --output-format=github .
+          ruff check --output-format=github .
         fi
       shell: bash
       continue-on-error: true
     - name: Test with pytest
       run: |
         if [ "$RUNNER_OS" == "Windows" ]; then
-          ".ml2sql\Scripts\pytest" -v -k "not _script and not test_pre_process_kfold"
+          python -m "pytest" -k "not test_run and not test_check_model and not test_pre_process_kfold"
         else
-          source .ml2sql/bin/activate
-          coverage run -m pytest -v
+          python -m "pytest"
         fi
       shell: bash
 

.pre-commit-config.yaml

+1 -1
@@ -11,7 +11,7 @@ repos:
     hooks:
       - id: pytest-check
         name: pytest-check
-        entry: pytest
+        entry: python -m "pytest"
         language: system
         pass_filenames: false
         always_run: true

CHANGELOG.md

+37
@@ -0,0 +1,37 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+- Added Kmeans (unsupervised) as model to choose
+- Auto test created SQL model vs pickled model
+
+### Changed
+- For changes in existing functionality.
+
+### Deprecated
+- For soon-to-be removed features.
+
+### Removed
+- For now removed features.
+
+### Fixed
+- For any bug fixes.
+
+### Security
+- In case of vulnerabilities.
+
+## [0.1.2] - 2024-06-25
+
+### Added
+- Initial release of the package.
+- Use as command line tool (commands: init, run, check-model and clean-data)
+- Automatic ML model training
+- Outputting several performance graphs
+- Saves model in .sav format and in .sql format
+

MANIFEST.in

+1
@@ -0,0 +1 @@
+include ml2sql/input/data/*

Readme.md

+13 -34
@@ -7,7 +7,7 @@
 
 # Table of Contents
 
-<img src="docs/media/ml2sql_logo.png" align="right"
+<img src="https://github.com/kaspersgit/ml_2_sql/blob/main/docs/media/ml2sql_logo.png?raw=true" align="right"
 alt="ML2SQL">
 
 1. [What is it?](#what-is-it)
@@ -20,10 +20,10 @@
 <br>
 
 # What is it?
-An automated machine learning tool which trains, graphs performance and saves the model in SQL. Using interpretable ML models (from [interpretml](https://github.com/interpretml/interpret/)) to train models which are explainable and interpretable, so called 'glassbox' models. With the outputted model in SQL format which can be used to put a model in 'production' in an SQL environment.
+An automated machine learning cli tool which trains, graphs performance and saves the model in SQL. Using interpretable ML models (from [interpretml](https://github.com/interpretml/interpret/)) to train models which are explainable and interpretable, so called 'glassbox' models. With the outputted model in SQL format which can be used to put a model in 'production' in an SQL environment.
 This tool can be used by anybody, but is aimed for people who want to do a quick analysis and/or deploy a model in an SQL system.
 
-<center><img src="docs/media/ml2sql_demo.gif"
+<center><img src="https://github.com/kaspersgit/ml_2_sql/blob/main/docs/media/ml2sql_demo.gif?raw=true"
 alt="ML2SQL_demo" height=400 width=600></center>
 
 ## Philosophy:
@@ -46,30 +46,13 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
 
 # Getting started
 <details>
-<summary><strong>Installation</strong></summary>
+<summary><strong>Set up</strong></summary>
 <br>
 
-1. Make sure you have python >= 3.8 and git installed
-2. Clone Github repo to your local machine and cd into folder, run:
-```
-git clone git@github.com:kaspersgit/ml_2_sql.git
-cd ml_2_sql
-```
-3. Create virtual environment and install packages, run:
-
-Windows:
-```
-python -m venv .ml2sql
-.ml2sql/Scripts/python -m pip install -r docs/requirements.txt
-```
-
-Mac/Linux:
-```
-python3 -m venv .ml2sql
-.ml2sql/bin/python -m pip install -r docs/requirements.txt
-```
-4. Wait until all packages are installed (could take a few minutes)
-5. You are ready to go (the virtual env does not need to be activated to use this tool)
+1. Make sure you have python >= 3.8
+2. `pip install ml2sql`
+3. Run: `ml2sql init`
+This will create the folders, `input/data/`, `input/configuration/` and `trained_models/`
 
 <br>
 </details>
@@ -78,9 +61,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
 <br>
 
 1. In the terminal in the root of this folder run:
-- `python3 run.py` (Mac/Linux)
-- `python run.py` (Windows)
-2. Follow the instructions on screen by selecting the example data and similarly named config file
+`ml2sql run`, follow the instructions on screen and select the demo data and config
 3. Check the output in the newly created folder
 
 <br>
@@ -90,9 +71,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
 <br>
 
 1. Save csv file containing target and all features in the `input/data/` folder (more info on [input data](#data))
-2. In the terminal in the root of this folder run:
-- `python3 run.py` (Mac/Linux)
-- `python run.py` (Windows)
+2. Run: `ml2sql run`
 3. Select your CSV file
 4. Select `Create a new config` and choose `Automatic` option (a config file will be made and can be edited later) (more info on [config json](#configuration-json))
 5. Select newly created config
@@ -110,8 +89,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
 1. Make sure the new dataset has the same variables as the dataset the model was trained on (same features and target)
 2. Save dataset in the `input/data/` folder (more info on [input data](#data))
 3. In the terminal in the root of this folder run:
-- `python3 check_model.py` (Mac/Linux)
-- `python check_model.py` (Windows)
+`ml2sql check-model`
 4. Follow the instructions on screen
 5. The output will be saved in the folder `trained_models/<selected_model>/tested_datasets/<selected_dataset>/`
 
@@ -256,12 +234,13 @@ Can be found in the created model's folder under `/model`
 
 ## Notes
 - Limited to 3 models (EBM, linear/logistic regression, and Decision Tree).
-- Data imbalance treatments (e.g., oversampling + model calibration) are not fully implemented.
+- Data imbalance treatments (e.g., oversampling + model calibration) are not implemented.
 - Only accepts CSV files.
 - Interactions with more than 2 variables are not supported.
 
 ## TODO list
 Check docs/TODO.md for an extensive list of planned features and improvements.
+Feel free to open an issue in case a feature is missing or not working properly.
 
 # Troubleshooting
 If you encounter an unclear error message after following the instructions above, feel free to create an Issue on the GitHub repository.
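
The new set-up steps in the README say that `ml2sql init` creates the folders `input/data/`, `input/configuration/` and `trained_models/`. As a rough illustration of that behaviour only (this is not the package's implementation, and the name `init_project` is hypothetical), the folder creation could be sketched as:

```python
import tempfile
from pathlib import Path

# Folder names are taken from the README text; everything else is illustrative.
PROJECT_FOLDERS = ("input/data", "input/configuration", "trained_models")


def init_project(root="."):
    """Create the project folders under `root` and return their paths.

    Hypothetical helper: the real `ml2sql init` may do more, for example
    copying the demo files mentioned in the commit message.
    """
    created = []
    for rel in PROJECT_FOLDERS:
        folder = Path(root) / rel
        # exist_ok=True makes the call safe to repeat in an existing project
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in init_project(tmp):
            print("created", folder.relative_to(tmp))
```

Creating the folders with `exist_ok=True` keeps the sketch idempotent, which seems a reasonable property for an `init` style command that users may run more than once.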

build.sh

+19
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+# Ensure we exit immediately if any command fails
+set -e
+
+# Clean up old distribution files
+echo "Cleaning up old distribution files..."
+rm -rf dist/*
+
+# Create source distribution and wheel
+echo "Building source distribution and wheel..."
+python -m build
+
+# Optional: Run checks
+echo "Running twine check..."
+twine check dist/*
+
+echo "Build complete. Distribution files are in the 'dist/' directory."
+echo "To upload to PyPI, run: twine upload dist/*"
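
build.sh depends on bash, while the CI workflow in this commit also mentions a Windows runner. Purely as an illustration (not part of the commit), the same three steps could be written in Python. The function names `clean_dist` and `build_and_check` are hypothetical; `python -m build` and `twine check` are the commands already used in the script, and the sketch assumes both packages are installed in the active environment.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clean_dist(dist_dir="dist"):
    """Remove everything inside dist_dir but keep the directory itself.

    Roughly mirrors `rm -rf dist/*` (unlike the shell glob, hidden files
    are removed too). Returns the number of entries removed.
    """
    dist = Path(dist_dir)
    if not dist.is_dir():
        return 0
    removed = 0
    for entry in dist.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


def build_and_check(dist_dir="dist"):
    """Clean, build sdist and wheel, then run twine's metadata check."""
    clean_dist(dist_dir)
    # check=True raises CalledProcessError on failure, similar to `set -e`
    subprocess.run(
        [sys.executable, "-m", "build", "--outdir", str(dist_dir)], check=True
    )
    artifacts = sorted(str(p) for p in Path(dist_dir).iterdir())
    subprocess.run([sys.executable, "-m", "twine", "check", *artifacts], check=True)
    return artifacts
```

Calling `build_and_check()` from the repository root would then stand in for running `./build.sh`; uploading is left as a manual step, as in the original script.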

check_model.py

-92
This file was deleted.

docs/TODO.md

+1
@@ -19,6 +19,7 @@ Checks and config
 
 - Other
 - Allow for other data file types (apart from csv)
+- Test generated SQL vs trained model and report on difference
 - Switch decision tree from sklearn to interpret for coherence (wait on [issue 552](https://github.com/interpretml/interpret/issues/522))
 - Add calibration (platt scaling/isotonic regression)
 - Add changelog and versioning
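
The TODO item "Test generated SQL vs trained model and report on difference" describes a check that is not yet implemented in this commit. A minimal sketch of the idea, under stated assumptions: the model is a hand-written linear model with made-up coefficients (not an ml2sql model), the SQL is run in an in-memory SQLite database from the standard library, and all names (`predict_python`, `model_to_sql`, `max_abs_difference`) are hypothetical.

```python
import sqlite3

# Hypothetical linear model: an intercept plus one coefficient per feature.
INTERCEPT = 0.5
COEFFICIENTS = {"age": 0.02, "income": -0.001}

# Made-up feature rows used only to exercise both code paths.
ROWS = [
    {"age": 30.0, "income": 100.0},
    {"age": 45.0, "income": 250.0},
    {"age": 61.0, "income": 80.0},
]


def predict_python(row):
    """Reference prediction computed directly in Python."""
    return INTERCEPT + sum(coef * row[name] for name, coef in COEFFICIENTS.items())


def model_to_sql(table):
    """Render the same linear model as a SQL SELECT over `table`."""
    terms = " + ".join(f"({coef!r} * {name})" for name, coef in COEFFICIENTS.items())
    return f"SELECT {INTERCEPT!r} + {terms} AS prediction FROM {table} ORDER BY rowid"


def max_abs_difference(rows):
    """Score `rows` in SQLite and in Python, return the largest gap."""
    columns = list(COEFFICIENTS)
    conn = sqlite3.connect(":memory:")
    try:
        column_defs = ", ".join(f"{c} REAL" for c in columns)
        conn.execute(f"CREATE TABLE features ({column_defs})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO features VALUES ({placeholders})",
            [tuple(row[c] for c in columns) for row in rows],
        )
        sql_preds = [r[0] for r in conn.execute(model_to_sql("features"))]
    finally:
        conn.close()
    py_preds = [predict_python(row) for row in rows]
    return max(abs(s - p) for s, p in zip(sql_preds, py_preds))


if __name__ == "__main__":
    print("max abs difference:", max_abs_difference(ROWS))
```

Comparing with a small tolerance rather than exact equality is deliberate: the SQL engine and Python may add the terms in a different order, so tiny floating-point differences are expected even when the generated SQL is correct.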

docs/devReadMe.md

+12 -3
@@ -3,13 +3,19 @@
 Mac/Linux:
 ```
 python3 -m venv .ml2sql
-.ml2sql/bin/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
+source .ml2sql/bin/activate
+python -m pip install --index-url https://pypi.org/simple \
+-r docs/requirements-dev.txt \
+-e . # <- the app/pkg itself
 ```
 
 Windows
 ```
 python -m venv .ml2sql
-.ml2sql/Scripts/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
+.ml2sql/Scripts/activate
+python -m pip install --index-url https://pypi.org/simple \
+-r docs/requirements-dev.txt \
+-e . # <- the app/pkg itself
 ```
 
 ### Testing
@@ -20,4 +26,7 @@ python -m venv .ml2sql
 With the virtual env activated
 - Compile user requirements.txt file: `python -m piptools compile --index-url=https://pypi.org/simple -o docs/requirements.txt pyproject.toml`
 - Compile dev requirements-dev.txt file: `python -m piptools compile --index-url=https://pypi.org/simple --extra dev -o docs/requirements-dev.txt -c docs/requirements.txt pyproject.toml`
-(Making sure packages in both files have the same version, [stackoverflow source](https://stackoverflow.com/questions/76055688/generate-aligned-requirements-txt-and-dev-requirements-txt-with-pip-compile))
+(Making sure packages in both files have the same version, [stackoverflow source](https://stackoverflow.com/questions/76055688/generate-aligned-requirements-txt-and-dev-requirements-txt-with-pip-compile))
+
+### Building package
+https://packaging.python.org/en/latest/tutorials/packaging-projects/
