Sake is your go-to, modular machine learning framework for Budget Execution data analysis built in Python with Scikit, XGBoost, PyTorch, and TensorFlow.

is-leeroy-jenkins/Sake

Sake

  • Sake is a modular machine learning framework for Budget Execution data analysis, built in Python with scikit-learn, XGBoost, PyTorch, and TensorFlow. Designed for rapid experimentation, visualization, and benchmarking of both classification and regression models, it provides a structured yet extensible workflow suited to teaching, prototyping, and real-world application development.

🔬 Data Source

🚀 Features

🔄 Unified Evaluation Pipeline

Run multiple models through a single function, train_and_evaluate(), which handles:

  • Model training
  • Accuracy computation
  • Confusion matrix generation (for classifiers)
  • Performance reporting (classification or regression metrics)
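As a rough illustration of what a single entry point like this could do, the sketch below fits one model and returns classification or regression metrics depending on the estimator type. The signature, the returned dictionary keys, and the use of the Iris dataset are assumptions made for this example; they are not taken from Sake's source.

```python
from sklearn.base import is_classifier
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split


def train_and_evaluate(model, X_train, X_test, y_train, y_test):
    """Fit one model and return its metrics (hypothetical signature)."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    if is_classifier(model):
        # Classification: accuracy, confusion matrix, and a text report
        return {
            "accuracy": accuracy_score(y_test, y_pred),
            "confusion_matrix": confusion_matrix(y_test, y_pred),
            "report": classification_report(y_test, y_pred),
        }

    # Regression: error metrics and coefficient of determination
    return {
        "mae": mean_absolute_error(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }


# Example run on a bundled scikit-learn dataset (assumed for illustration)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
results = train_and_evaluate(
    LogisticRegression(max_iter=1000), X_train, X_test, y_train, y_test
)
print(results["report"])
```

Dispatching on is_classifier() keeps one call site for both model families, which is one plausible way to get the unified behaviour described above.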

🧠 Dual Model Support

Out-of-the-box support for both:

  • Classification models such as Logistic Regression, SVM, Random Forest, XGBoost
  • Regression models such as Linear Regression, Ridge, SVR, Gradient Boosting

📊 Visual Performance Reports

  • Heatmaps of confusion matrices
  • Auto-generated classification_report with precision, recall, F1-score
  • Regression summary with metrics like MAE, MSE, R²
  • Tabular performance summary across all models
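A confusion-matrix heatmap of the kind listed above might be drawn as follows. This is a minimal sketch that uses plain matplotlib to keep dependencies small; seaborn.heatmap(cm, annot=True) would produce a similar figure. The dataset and model choice are assumptions for the example.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))

# Draw the matrix as a heatmap and annotate every cell with its count
fig, ax = plt.subplots(figsize=(4, 4))
image = ax.imshow(cm, cmap="Blues")
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
fig.colorbar(image, ax=ax)
fig.tight_layout()
```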

📁 Custom Dataset Integration

  • Use default scikit-learn datasets or plug in your own CSV
  • Built-in support for label encoding and numeric feature conversion
  • Easy integration with Pandas for pre-processing pipelines
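Label encoding and numeric conversion of the sort mentioned above can be done with pandas and scikit-learn directly. The sketch below uses a small in-memory DataFrame; the column names and values are invented for illustration and do not come from the project's data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical budget-style records; "obligations" arrives as text
df = pd.DataFrame(
    {
        "obligations": ["1200.50", "980", "n/a", "455.25"],
        "outlays": [1100.0, 900.0, 300.0, 410.0],
        "status": ["open", "closed", "closed", "closed"],
    }
)

# Numeric feature conversion: unparseable entries become NaN, then are dropped
df["obligations"] = pd.to_numeric(df["obligations"], errors="coerce")
df = df.dropna(subset=["obligations"])

# Label encoding: map string classes to integer codes
encoder = LabelEncoder()
y = encoder.fit_transform(df["status"])
X = df.drop(columns="status")

print(X.dtypes)
print(dict(zip(encoder.classes_, encoder.transform(encoder.classes_))))
```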

🧠 Deep Learning Ready

  • Expandable with PyTorch and TensorFlow architectures
  • Importable modules for CNNs, RNNs, and Transformers

🧪 Educational & Research Utility

  • Ideal for teaching ML fundamentals in a comparative format
  • Benchmarking for internal ML pipelines and research reproducibility

🧠 Classification Models

| Model | Module |
| --- | --- |
| Logistic Regression | sklearn.linear_model |
| Support Vector Machine | sklearn.svm |
| Decision Tree | sklearn.tree |
| Random Forest | sklearn.ensemble |
| k-Nearest Neighbors | sklearn.neighbors |
| Gaussian Naive Bayes | sklearn.naive_bayes |
| XGBoost Classifier | xgboost.XGBClassifier |
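One way to benchmark these classifiers side by side is to keep them in a dictionary and loop over it. The sketch below is an assumption about how such a comparison could be organised, not the project's code: the estimator settings, the Iris dataset, and the summary layout are chosen for the example, and XGBoost is added only if it is installed.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "k-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
}

# XGBoost is listed as optional, so include it only when available
try:
    from xgboost import XGBClassifier

    classifiers["XGBoost Classifier"] = XGBClassifier()
except ImportError:
    pass

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit each model and collect its test accuracy in one table
rows = []
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    rows.append(
        {"model": name, "accuracy": accuracy_score(y_test, model.predict(X_test))}
    )

summary = (
    pd.DataFrame(rows).sort_values("accuracy", ascending=False).reset_index(drop=True)
)
print(summary)
```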

📉 Regression Models

| Model | Module |
| --- | --- |
| Linear Regression | sklearn.linear_model.LinearRegression |
| Ridge Regression | sklearn.linear_model.Ridge |
| Support Vector Regressor | sklearn.svm.SVR |
| Decision Tree Regressor | sklearn.tree.DecisionTreeRegressor |
| Random Forest Regressor | sklearn.ensemble.RandomForestRegressor |
| Gradient Boosting Regressor | sklearn.ensemble.GradientBoostingRegressor |
| k-NN Regressor | sklearn.neighbors.KNeighborsRegressor |
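The regressors in this table can be compared on MAE, MSE, and R² with a short loop. The following is a sketch under stated assumptions: the bundled diabetes dataset, default hyperparameters, and the column names of the summary table are all choices made for this example.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

regressors = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "Support Vector Regressor": SVR(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(random_state=0),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=0),
    "k-NN Regressor": KNeighborsRegressor(),
}

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit each regressor and record the three metrics on the held-out split
rows = []
for name, model in regressors.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    rows.append(
        {
            "model": name,
            "MAE": mean_absolute_error(y_test, y_pred),
            "MSE": mean_squared_error(y_test, y_pred),
            "R2": r2_score(y_test, y_pred),
        }
    )

summary = pd.DataFrame(rows).sort_values("R2", ascending=False).reset_index(drop=True)
print(summary.round(3))
```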

📦 Dependencies

| Package | Description | Link |
| --- | --- | --- |
| numpy | Numerical computing library | numpy.org |
| pandas | Data manipulation and DataFrames | pandas.pydata.org |
| matplotlib | Plotting and visualization | matplotlib.org |
| seaborn | Statistical data visualization | seaborn.pydata.org |
| scikit-learn | ML modeling and metrics | scikit-learn.org |
| xgboost | Gradient boosting framework (optional) | xgboost.readthedocs.io |
| torch | PyTorch deep learning library | pytorch.org |
| tensorflow | End-to-end ML platform | tensorflow.org |
| openai | OpenAI's Python API client | openai-python |
| requests | HTTP requests for API and web access | requests.readthedocs.io |
| PySimpleGUI | GUI framework for desktop apps | pysimplegui.readthedocs.io |
| typing | Type hinting (standard library) | typing Docs |
| pyodbc | ODBC database connector | pyodbc GitHub |
| fitz | PDF document parser via PyMuPDF | pymupdf |
| pillow | Image processing library | python-pillow.org |
| openpyxl | Excel file processing | openpyxl Docs |
| soundfile | Read/write sound file formats | pysoundfile |
| sounddevice | Audio I/O interface | sounddevice Docs |
| loguru | Structured, elegant logging | loguru GitHub |
| statsmodels | Statistical tests and regression diagnostics | statsmodels.org |
| dotenv | Load environment variables from .env | python-dotenv GitHub |
| python-dotenv | Same as above (modern usage) | python-dotenv |

🧪 How to Run

git clone https://github.com/is-leeroy-jenkins/Sake.git
cd Sake
pip install -r requirements.txt
jupyter notebook balances.ipynb

📁 Customize Dataset

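Replace the dataset ingestion cell with:

import pandas as pd

# "your_dataset.csv" and "target_column" are placeholders for your own data
df = pd.read_csv("your_dataset.csv")
X = df.drop(columns="target_column")  # feature matrix
y = df["target_column"]               # prediction target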

📊 Outputs

  • R², MAE, MSE for each model
  • Bar plots of performance scores
  • Visual predicted vs. actual scatter charts
  • Residual error analysis
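A predicted vs. actual scatter chart and a residual plot like those listed above could be produced along these lines. This is a minimal sketch, assuming a linear model on scikit-learn's bundled diabetes dataset; the notebook's actual plotting code may differ.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)
residuals = y_test - y_pred  # positive values mean the model under-predicted

fig, (ax_scatter, ax_resid) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: predicted vs. actual, with a dashed line marking perfect prediction
ax_scatter.scatter(y_test, y_pred, alpha=0.6)
limits = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax_scatter.plot(limits, limits, linestyle="--", color="gray")
ax_scatter.set_xlabel("Actual")
ax_scatter.set_ylabel("Predicted")

# Right panel: distribution of residual errors
ax_resid.hist(residuals, bins=20)
ax_resid.set_xlabel("Residual (actual - predicted)")
ax_resid.set_ylabel("Count")

fig.tight_layout()
print(f"Residual mean: {residuals.mean():.2f}, std: {residuals.std():.2f}")
```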

🔮 Roadmap

  • Add time series models (Prophet, ARIMA)
  • Integrate GridSearchCV for model tuning
  • SHAP-based interpretability
  • Flask/FastAPI API for deploying forecasts
  • LLM summarization of forecast outcomes
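For the GridSearchCV item, a tuning step might look like the sketch below. It is illustrative only: the Ridge estimator, the alpha grid, the R² scoring, and the diabetes dataset are assumptions, and nothing here reflects how the project will integrate tuning.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

# Candidate regularisation strengths to try (values chosen for illustration)
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validated search, scored by R²
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated R²:", round(search.best_score_, 3))
```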

🤝 Contributing

  1. 🍴 Fork the project
  2. 🔧 Create a branch: git checkout -b feat/new-feature
  3. ✅ Commit and push changes
  4. 📬 Submit a pull request

📜 License

This project is licensed under the MIT License.

