GitHub - is-leeroy-jenkins/Sake: Sake is your go-to, modular machine learning framework for Budget Execution data analysis built in Python with Scikit, XGBoost, PyTorch, and TensorFlow.

Sake

Sake is a modular machine learning framework for Budget Execution data analysis, built in Python with scikit-learn, XGBoost, PyTorch, and TensorFlow. Designed for rapid experimentation, visualization, and benchmarking of both classification and regression models, it provides a structured yet extensible workflow suited to teaching, prototyping, and real-world application development.

🔬 Data Source

File A (Account Balances), published monthly by agencies on USAspending.gov.
Required by the DATA Act.
Pulled automatically from the Governmentwide Treasury Account Symbol Adjusted Trial Balance System (GTAS).
Contains budgetary resources, obligation, and outlay data for all relevant Treasury Account Symbols (TAS) in a reporting agency.
Includes both award and non-award spending (grouped together) and crosswalks with the SF 133 report.

🚀 Features

🔄 Unified Evaluation Pipeline

Run multiple models through a single function, `train_and_evaluate()`, which handles:

Model training
Accuracy computation
Confusion matrix generation (for classifiers)
Performance reporting (classification or regression metrics)
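The signature of `train_and_evaluate()` is not shown here. A minimal sketch of what such a helper could look like for classifiers, assuming scikit-learn estimators; the parameter names and the returned dictionary are illustrative assumptions, not Sake's actual API:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def train_and_evaluate(model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit a classifier and collect its evaluation artifacts."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred),
    }


# Iris stands in for a real Budget Execution dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

result = train_and_evaluate(
    LogisticRegression(max_iter=1000), X_train, X_test, y_train, y_test
)
print(f"accuracy: {result['accuracy']:.3f}")
print(result["confusion_matrix"])
print(result["report"])
```

Returning a plain dictionary keeps the helper easy to call in a loop over several models and to collect into a summary table afterwards.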

🧠 Dual Model Support

Out-of-the-box support for both:

Classification models such as Logistic Regression, SVM, Random Forest, XGBoost
Regression models such as Linear Regression, Ridge, SVR, Gradient Boosting

📊 Visual Performance Reports

Heatmaps of confusion matrices
Auto-generated classification_report with precision, recall, F1-score
Regression summary with metrics like MAE, MSE, R²
Tabular performance summary across all models
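For reference, the three regression metrics named above can be computed by hand. The sketch below uses only the standard library and a hypothetical `regression_summary` helper (not part of Sake); `sklearn.metrics` provides equivalent functions (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
def regression_summary(y_true, y_pred):
    """Return MAE, MSE, and R^2 for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread around the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "R2": r2}


# Toy values chosen so the arithmetic is easy to check by hand.
summary = regression_summary([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 10.0])
print(summary)
```

With these toy values the absolute errors are 0.5, 0, 0.5, and 1, so MAE is 0.5, MSE is 0.375, and R² is 1 - 1.5/20 = 0.925.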

📁 Custom Dataset Integration

Use the default scikit-learn datasets or plug in your own CSV
Built-in support for label encoding and numeric feature conversion
Easy integration with Pandas for pre-processing pipelines
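As an illustration of label encoding and numeric conversion, here is a small sketch using pandas and scikit-learn's `LabelEncoder`. The column names and values are made up for the example and do not come from File A; how Sake performs these steps internally is not shown here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for a user CSV; columns and values are invented.
df = pd.DataFrame({
    "agency": ["EPA", "DOE", "EPA", "DOT"],
    "obligations": ["1,200.50", "300", "n/a", "75.25"],
    "status": ["open", "closed", "open", "open"],
})

# Numeric conversion: strip thousands separators, then coerce.
# Values that cannot be parsed (here "n/a") become NaN instead of raising.
df["obligations"] = pd.to_numeric(
    df["obligations"].str.replace(",", "", regex=False), errors="coerce"
)

# Encode a categorical feature as integer codes (categories sorted alphabetically).
df["agency_code"] = df["agency"].astype("category").cat.codes

# Label-encode the target column.
encoder = LabelEncoder()
y = encoder.fit_transform(df["status"])
X = df[["agency_code", "obligations"]]

print(X)
print(dict(zip(encoder.classes_, range(len(encoder.classes_)))))
```

Keeping the fitted `encoder` around allows predictions to be mapped back to the original labels with `encoder.inverse_transform`.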

🧠 Deep Learning Ready

Expandable with PyTorch and TensorFlow architectures
Importable modules for CNNs, RNNs, and Transformers

🧪 Educational & Research Utility

Ideal for teaching ML fundamentals in a comparative format
Benchmarking for internal ML pipelines and research reproducibility

🧠 Classification Models

Model	Module
Logistic Regression	`sklearn.linear_model`
Support Vector Machine	`sklearn.svm`
Decision Tree	`sklearn.tree`
Random Forest	`sklearn.ensemble`
k-Nearest Neighbors	`sklearn.neighbors`
Gaussian Naive Bayes	`sklearn.naive_bayes`
XGBoost Classifier	`xgboost.XGBClassifier`
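One way to compare the scikit-learn classifiers in this table is to keep them in a dictionary and score each with cross-validation. This is a sketch under assumptions: the hyperparameters are arbitrary, Iris stands in for real data, and XGBoost is left out so the example runs without the optional dependency.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hyperparameters below are illustrative defaults, not tuned values.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Gaussian Naive Bayes": GaussianNB(),
    # xgboost.XGBClassifier follows the same estimator interface and can be
    # added here when the optional xgboost package is installed.
}

X, y = load_iris(return_X_y=True)

# Mean accuracy over 5 stratified folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in classifiers.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} {score:.3f}")
```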

📉 Regression Models

Model	Module
Linear Regression	`sklearn.linear_model.LinearRegression`
Ridge Regression	`sklearn.linear_model.Ridge`
Support Vector Regressor	`sklearn.svm.SVR`
Decision Tree Regressor	`sklearn.tree.DecisionTreeRegressor`
Random Forest Regressor	`sklearn.ensemble.RandomForestRegressor`
Gradient Boosting Regressor	`sklearn.ensemble.GradientBoostingRegressor`
k-NN Regressor	`sklearn.neighbors.KNeighborsRegressor`
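The regressors listed above can be benchmarked side by side and collected into a single table. A minimal sketch, assuming scikit-learn's bundled diabetes dataset as a stand-in for real data and untuned, illustrative hyperparameters:

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

regressors = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Support Vector Regressor": SVR(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=0),
    "k-NN Regressor": KNeighborsRegressor(n_neighbors=5),
}

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit every model on the same split and record its held-out metrics.
rows = []
for name, model in regressors.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rows.append({
        "Model": name,
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mean_squared_error(y_test, y_pred),
        "R2": r2_score(y_test, y_pred),
    })

summary = pd.DataFrame(rows).set_index("Model").sort_values("R2", ascending=False)
print(summary.round(3))
```

Evaluating all models on one fixed split keeps the rows directly comparable; cross-validation would give steadier estimates at the cost of extra fitting time.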

📦 Dependencies

Package	Description	Link
numpy	Numerical computing library	numpy.org
pandas	Data manipulation and DataFrames	pandas.pydata.org
matplotlib	Plotting and visualization	matplotlib.org
seaborn	Statistical data visualization	seaborn.pydata.org
scikit-learn	ML modeling and metrics	scikit-learn.org
xgboost	Gradient boosting framework (optional)	xgboost.readthedocs.io
torch	PyTorch deep learning library	pytorch.org
tensorflow	End-to-end ML platform	tensorflow.org
openai	OpenAI’s Python API client	openai-python
requests	HTTP requests for API and web access	requests.readthedocs.io
PySimpleGUI	GUI framework for desktop apps	pysimplegui.readthedocs.io
typing	Type hinting (part of the Python standard library, no install needed)	typing Docs
pyodbc	ODBC database connector	pyodbc GitHub
fitz	PDF document parser (import name of the PyMuPDF package)	pymupdf
pillow	Image processing library	python-pillow.org
openpyxl	Excel file processing	openpyxl Docs
soundfile	Read/write sound file formats	pysoundfile
sounddevice	Audio I/O interface	sounddevice Docs
loguru	Structured, elegant logging	loguru GitHub
statsmodels	Statistical tests and regression diagnostics	statsmodels.org
python-dotenv	Load environment variables from `.env` (imported as `dotenv`)	python-dotenv GitHub

🧪 How to Run

git clone https://github.com/is-leeroy-jenkins/Sake.git
cd Sake
pip install -r requirements.txt
jupyter notebook models.ipynb

📁 Customize Dataset

Replace the dataset ingestion cell with:

import pandas as pd
df = pd.read_csv("your_dataset.csv")
X = df.drop("target_column", axis=1)
y = df["target_column"]
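After loading a custom CSV, a typical next step is to check that the target column exists and to hold out a test set. The sketch below uses an in-memory CSV in place of `your_dataset.csv`; the feature names, the split ratio, and the `random_state` are assumptions for illustration.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# In-memory stand-in for "your_dataset.csv"; the columns are invented.
csv_text = """feature_a,feature_b,target_column
1.0,10,0
2.0,20,1
3.0,30,0
4.0,40,1
5.0,50,0
6.0,60,1
7.0,70,0
8.0,80,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Fail early with a readable message if the target column is missing.
target = "target_column"
if target not in df.columns:
    raise KeyError(f"Expected a '{target}' column, found: {list(df.columns)}")

X = df.drop(columns=[target])
y = df[target]

# Stratify so both classes appear in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

For a regression target, drop the `stratify=y` argument, since stratification only applies to class labels.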

📊 Outputs

R², MAE, MSE for each model
Bar plots of performance scores
Visual predicted vs. actual scatter charts
Residual error analysis
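A predicted vs. actual chart and a residual plot can be produced with a few lines of matplotlib. This is a sketch on synthetic data, not the notebook's actual plotting code; the output file name is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic "actual" values and noisy predictions, for illustration only.
rng = np.random.default_rng(0)
y_true = rng.uniform(0, 100, size=50)
y_pred = y_true + rng.normal(0, 5, size=50)
residuals = y_true - y_pred

fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: points on the dashed diagonal are perfect predictions.
ax_fit.scatter(y_true, y_pred, alpha=0.7)
limits = [y_true.min(), y_true.max()]
ax_fit.plot(limits, limits, linestyle="--", color="gray")
ax_fit.set_xlabel("Actual")
ax_fit.set_ylabel("Predicted")
ax_fit.set_title("Predicted vs. actual")

# Right panel: residuals should scatter around zero without a visible pattern.
ax_res.scatter(y_pred, residuals, alpha=0.7)
ax_res.axhline(0, linestyle="--", color="gray")
ax_res.set_xlabel("Predicted")
ax_res.set_ylabel("Residual (actual - predicted)")
ax_res.set_title("Residuals")

fig.tight_layout()
fig.savefig("regression_diagnostics.png")
```

A funnel shape or a curve in the residual panel suggests the model is missing structure in the data, even when R² looks acceptable.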

🔮 Roadmap

Add time series models (Prophet, ARIMA)
Integrate GridSearchCV for model tuning
SHAP-based interpretability
Flask/FastAPI API for deploying forecasts
LLM summarization of forecast outcomes

🤝 Contributing

🍴 Fork the project
🔧 Create a branch: `git checkout -b feat/new-feature`
✅ Commit and push changes
📬 Submit a pull request

📜 License

This project is licensed under the MIT License.

