kaspersgit
diff --git a/.github/workflows/run_test.yml b/.github/workflows/ci.yml (+10 -12)
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml (+1 -1)
diff --git a/CHANGELOG.md b/CHANGELOG.md (+37)
diff --git a/MANIFEST.in b/MANIFEST.in (+1)
diff --git a/Readme.md b/Readme.md (+13 -34)
diff --git a/build.sh b/build.sh (+19)
diff --git a/check_model.py b/check_model.py (-92)
diff --git a/docs/TODO.md b/docs/TODO.md (+1)
diff --git a/docs/devReadMe.md b/docs/devReadMe.md (+12 -3)
@@ -1,4 +1,4 @@
-name: Run Tests via Pytest on Linux, Unix and Windows
+name: CI
 
 on: [push]  
 
@@ -19,29 +19,27 @@ jobs:
       - name:  Install dependencies ${{ matrix.os }}
         run: |
               python -m pip install --upgrade pip  
-              python -m venv .ml2sql
-              if [ "$RUNNER_OS" == "Windows" ]; then
-                    ".ml2sql\Scripts\python" -m pip install --index-url https://pypi.org/simple -r "docs\requirements-dev.txt"
-              else
-                    .ml2sql/bin/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt  
-              fi
+              python -m pip install -e .
+        shell: bash
+      - name: Install development dependencies
+        run: |
+              python -m pip install ".[dev]"
         shell: bash
       - name: Lint with Ruff  
         run: |
               if [ "$RUNNER_OS" == "Windows" ]; then
-                    ".ml2sql\Scripts\ruff" check --output-format=github .
+                    "ruff" check --output-format=github .
               else
-                    .ml2sql/bin/ruff check --output-format=github .
+                    ruff check --output-format=github .
               fi  
         shell: bash 
         continue-on-error: true  
       - name: Test with pytest
         run: |  
               if [ "$RUNNER_OS" == "Windows" ]; then
-                    ".ml2sql\Scripts\pytest" -v -k "not _script and not test_pre_process_kfold"
+                    python -m "pytest" -k "not test_run and not test_check_model and not test_pre_process_kfold"
               else
-                    source .ml2sql/bin/activate
-                    coverage run -m pytest -v
+                    python -m "pytest"
               fi  
         shell: bash 
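
Review note: the per-OS switch in the pytest step above can be sketched in Python as follows. This is only an illustration of the selection logic; `pytest_command` and `WINDOWS_DESELECTED` are hypothetical names and are not part of the workflow or the repository. The OS strings are the values GitHub Actions exposes through `RUNNER_OS`.

```python
from typing import List

# Test-name fragments that the workflow deselects on Windows runners
# (copied from the -k expression in the step above).
WINDOWS_DESELECTED = ["test_run", "test_check_model", "test_pre_process_kfold"]


def pytest_command(runner_os: str) -> List[str]:
    """Return the pytest invocation for a given runner OS (hypothetical helper)."""
    cmd = ["python", "-m", "pytest"]
    if runner_os == "Windows":
        # pytest's -k option takes a boolean expression over test names.
        expression = " and ".join(f"not {name}" for name in WINDOWS_DESELECTED)
        cmd += ["-k", expression]
    return cmd


if __name__ == "__main__":
    for os_name in ("Linux", "macOS", "Windows"):
        print(os_name, pytest_command(os_name))
```

Keeping the deselected names in one list makes it easier to see which tests are skipped on Windows and to shrink the list once they pass there.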
 
@@ -11,7 +11,7 @@ repos:
   hooks:
     - id: pytest-check
       name: pytest-check
-      entry: pytest
+      entry: python -m "pytest"
       language: system
       pass_filenames: false
       always_run: true
@@ -0,0 +1,37 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+- K-means (unsupervised) as a selectable model
+- Automatic test comparing the generated SQL model against the pickled model
+
+### Changed
+- For changes in existing functionality.
+
+### Deprecated
+- For soon-to-be removed features.
+
+### Removed
+- For now removed features.
+
+### Fixed
+- For any bug fixes.
+
+### Security
+- In case of vulnerabilities.
+
+## [0.1.2] - 2024-06-25
+
+### Added
+- Initial release of the package.
+- Usable as a command-line tool (commands: `init`, `run`, `check-model` and `clean-data`)
+- Automatic ML model training
+- Outputs several performance graphs
+- Saves the model in `.sav` and `.sql` formats
+
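
Review note: the changelog entry about testing the generated SQL against the pickled model could work roughly like the sketch below. It is a minimal illustration under stated assumptions, not ml2sql's actual implementation: the linear model, its coefficients, the `features` table and all function names are hypothetical, and SQLite from the standard library stands in for the target SQL engine.

```python
import sqlite3
from typing import Dict, List

# Hypothetical linear model: an intercept plus one coefficient per feature.
INTERCEPT = 0.5
COEFFICIENTS: Dict[str, float] = {"age": 0.02, "income": -0.001}


def predict_python(row: Dict[str, float]) -> float:
    """Score one row with the in-memory model."""
    return INTERCEPT + sum(coef * row[name] for name, coef in COEFFICIENTS.items())


def model_to_sql(table: str) -> str:
    """Render the same model as a SQL SELECT over `table`."""
    terms = " + ".join(f"({coef!r} * {name})" for name, coef in COEFFICIENTS.items())
    return f"SELECT {INTERCEPT!r} + {terms} AS prediction FROM {table} ORDER BY rowid"


def max_abs_difference(rows: List[Dict[str, float]]) -> float:
    """Score the rows in SQL and in Python and return the largest gap."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE features (age REAL, income REAL)")
    conn.executemany(
        "INSERT INTO features (age, income) VALUES (:age, :income)", rows
    )
    sql_preds = [record[0] for record in conn.execute(model_to_sql("features"))]
    conn.close()
    py_preds = [predict_python(row) for row in rows]
    return max(abs(s - p) for s, p in zip(sql_preds, py_preds))


if __name__ == "__main__":
    sample = [{"age": 30.0, "income": 1000.0}, {"age": 55.0, "income": 250.0}]
    print("max |sql - python| =", max_abs_difference(sample))
```

The two paths add the terms in a different order, so the comparison should use a small tolerance rather than exact equality.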
@@ -0,0 +1 @@
+include ml2sql/input/data/*
@@ -7,7 +7,7 @@
 
 # Table of Contents
 
-<img src="docs/media/ml2sql_logo.png" align="right"
+<img src="https://github.com/kaspersgit/ml_2_sql/blob/main/docs/media/ml2sql_logo.png?raw=true" align="right"
      alt="ML2SQL">
 
 1. [What is it?](#what-is-it)
@@ -20,10 +20,10 @@
 <br>
 
 # What is it?
-An automated machine learning tool which trains, graphs performance and saves the model in SQL. Using interpretable ML models (from [interpretml](https://github.com/interpretml/interpret/)) to train models which are explainable and interpretable, so called 'glassbox' models. With the outputted model in SQL format which can be used to put a model in 'production' in an SQL environment.
+An automated machine learning CLI tool that trains a model, graphs its performance, and saves the model in SQL. It uses interpretable ML models (from [interpretml](https://github.com/interpretml/interpret/)) to train models that are explainable and interpretable, so-called 'glassbox' models. The model is output in SQL format, which can be used to put it in 'production' in an SQL environment.
 This tool can be used by anybody, but is aimed for people who want to do a quick analysis and/or deploy a model in an SQL system. 
 
-<center><img src="docs/media/ml2sql_demo.gif"
+<center><img src="https://github.com/kaspersgit/ml_2_sql/blob/main/docs/media/ml2sql_demo.gif?raw=true"
      alt="ML2SQL_demo" height=400 width=600></center>
 
 ## Philosophy:
@@ -46,30 +46,13 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
 
 # Getting started
 <details> 
-<summary><strong>Installation</strong></summary>
+<summary><strong>Set up</strong></summary>
 <br>
 
-  1. Make sure you have python >= 3.8 and git installed
-  2. Clone Github repo to your local machine and cd into folder, run:
-      ```
-      git clone git@github.com:kaspersgit/ml_2_sql.git
-      cd ml_2_sql
-      ```
-  3. Create virtual environment and install packages, run: 
-        
-        Windows:
-        ```
-        python -m venv .ml2sql
-        .ml2sql/Scripts/python -m pip install -r docs/requirements.txt
-        ```
-        
-        Mac/Linux:
-        ```
-        python3 -m venv .ml2sql
-        .ml2sql/bin/python -m pip install -r docs/requirements.txt
-        ```
-  4. Wait until all packages are installed (could take a few minutes)
-  5. You are ready to go (the virtual env does not need to be activated to use this tool)
+  1. Make sure you have Python >= 3.8
+  2. `pip install ml2sql`
+  3. Run: `ml2sql init` 
+    This will create the folders `input/data/`, `input/configuration/` and `trained_models/`
 
 <br>
 </details> 
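
Review note: based only on the README text above, the effect of `ml2sql init` on the working directory can be approximated by the sketch below. The folder names come from the README; `init_project` is a hypothetical helper, and the real command may do more (for example, copy demo data).

```python
from pathlib import Path
from typing import List

# Folder names taken from the README step above.
PROJECT_FOLDERS = ["input/data", "input/configuration", "trained_models"]


def init_project(root: Path) -> List[Path]:
    """Create the expected project folders under `root` (hypothetical helper)."""
    created = []
    for relative in PROJECT_FOLDERS:
        folder = root / relative
        # exist_ok=True makes repeated calls harmless.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in init_project(Path.cwd() / "ml2sql_demo"):
        print("created", folder)
```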
@@ -78,9 +61,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
 <br>
 
   1. In the terminal in the root of this folder run: 
-      - `python3 run.py` (Mac/Linux)
-      - `python run.py` (Windows)
-  2. Follow the instructions on screen by selecting the example data and similarly named config file
+    `ml2sql run`, follow the instructions on screen and select the demo data and config
   3. Check the output in the newly created folder
 
 <br>
@@ -90,9 +71,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
 <br>
 
   1. Save csv file containing target and all features in the `input/data/` folder (more info on [input data](#data))
-  2. In the terminal in the root of this folder run: 
-      - `python3 run.py` (Mac/Linux)
-      - `python run.py` (Windows)
+  2. Run: `ml2sql run`
   3. Select your CSV file
   4. Select `Create a new config` and choose `Automatic` option (a config file will be made and can be edited later) (more info on [config json](#configuration-json))
   5. Select newly created config
@@ -110,8 +89,7 @@ This tool can be used by anybody, but is aimed for people who want to do a quick
   1. Make sure the new dataset has the same variables as the dataset the model was trained on (same features and target)
   2. Save dataset in the `input/data/` folder (more info on [input data](#data))
   3. In the terminal in the root of this folder run: 
-      - `python3 check_model.py` (Mac/Linux)
-      - `python check_model.py` (Windows)
+     `ml2sql check-model`
   4. Follow the instructions on screen
   5. The output will be saved in the folder `trained_models/<selected_model>/tested_datasets/<selected_dataset>/`
 
@@ -256,12 +234,13 @@ Can be found in the created model's folder under `/model`
 
 ## Notes
 - Limited to 3 models (EBM, linear/logistic regression, and Decision Tree).
-- Data imbalance treatments (e.g., oversampling + model calibration) are not fully implemented.
+- Data imbalance treatments (e.g., oversampling + model calibration) are not implemented.
 - Only accepts CSV files.
 - Interactions with more than 2 variables are not supported.
 
 ## TODO list
 Check docs/TODO.md for an extensive list of planned features and improvements.
+Feel free to open an issue in case a feature is missing or not working properly.
 
 # Troubleshooting
 If you encounter an unclear error message after following the instructions above, feel free to create an Issue on the GitHub repository.
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+# Ensure we exit immediately if any command fails
+set -e
+
+# Clean up old distribution files
+echo "Cleaning up old distribution files..."
+rm -rf dist/*
+
+# Create source distribution and wheel
+echo "Building source distribution and wheel..."
+python -m build
+
+# Optional: Run checks
+echo "Running twine check..."
+twine check dist/*
+
+echo "Build complete. Distribution files are in the 'dist/' directory."
+echo "To upload to PyPI, run: twine upload dist/*"
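
Review note: `python -m build` normally writes a source distribution (`.tar.gz`) and a wheel (`.whl`) to `dist/`. A small pre-upload check along the lines below could confirm both are present before running `twine upload`. This is a sketch only; `classify_artifacts` and `ready_to_upload` are hypothetical helpers and not part of `build.sh`.

```python
from pathlib import Path
from typing import Dict, List


def classify_artifacts(dist_dir: Path) -> Dict[str, List[str]]:
    """Group the files in `dist_dir` into source distributions and wheels."""
    found: Dict[str, List[str]] = {"sdist": [], "wheel": []}
    for path in sorted(dist_dir.iterdir()):
        if path.name.endswith(".tar.gz"):
            found["sdist"].append(path.name)
        elif path.suffix == ".whl":
            found["wheel"].append(path.name)
    return found


def ready_to_upload(dist_dir: Path) -> bool:
    """Return True only when at least one sdist and one wheel exist."""
    found = classify_artifacts(dist_dir)
    return bool(found["sdist"]) and bool(found["wheel"])


if __name__ == "__main__":
    dist = Path("dist")
    if dist.is_dir():
        print(classify_artifacts(dist), "ready:", ready_to_upload(dist))
    else:
        print("no dist/ directory found; run the build first")
```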
@@ -19,6 +19,7 @@ Checks and config
 
 - Other 
   - Allow for other data file types (apart from csv)
+  - Test generated SQL vs trained model and report on difference
   - Switch decision tree from sklearn to interpret for coherence (wait on [issue 552](https://github.com/interpretml/interpret/issues/522))
   - Add calibration (platt scaling/isotonic regression)
   - Add changelog and versioning
 
@@ -3,13 +3,19 @@
 Mac/Linux:
 ```
 python3 -m venv .ml2sql
-.ml2sql/bin/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
+source .ml2sql/bin/activate
+python -m pip install --index-url https://pypi.org/simple \
+    -r docs/requirements-dev.txt \
+    -e .  # <- the app/pkg itself
 ```
 
 Windows 
 ```
 python -m venv .ml2sql
-.ml2sql/Scripts/python -m pip install --index-url https://pypi.org/simple -r docs/requirements-dev.txt
+.ml2sql/Scripts/activate
+python -m pip install --index-url https://pypi.org/simple \
+    -r docs/requirements-dev.txt \
+    -e .  # <- the app/pkg itself
 ```
 
 ### Testing
@@ -20,4 +26,7 @@ python -m venv .ml2sql
 With the virtual env activated
 - Compile user requirements.txt file: `python -m piptools compile --index-url=https://pypi.org/simple -o docs/requirements.txt pyproject.toml`
 - Compile dev requirements-dev.txt file: `python -m piptools compile --index-url=https://pypi.org/simple --extra dev -o docs/requirements-dev.txt -c docs/requirements.txt pyproject.toml`
-  (Making sure packages in both files have the same version, [stackoverflow source](https://stackoverflow.com/questions/76055688/generate-aligned-requirements-txt-and-dev-requirements-txt-with-pip-compile))
+  (Making sure packages in both files have the same version, [stackoverflow source](https://stackoverflow.com/questions/76055688/generate-aligned-requirements-txt-and-dev-requirements-txt-with-pip-compile))
+
+### Building package
+See the Python Packaging User Guide tutorial: https://packaging.python.org/en/latest/tutorials/packaging-projects/