- Sake is a modular machine learning framework for Budget Execution data analysis, built in Python with Scikit-Learn, XGBoost, PyTorch, and TensorFlow. Designed for rapid experimentation, visualization, and benchmarking of both classification and regression models, it provides a structured yet extensible workflow suited to teaching, prototyping, and real-world application development.
- File A (Account Balances) is published monthly by agencies on USASpending.
- It is required by the DATA Act.
- It is pulled automatically from data in the Governmentwide Treasury Account Symbol Adjusted Trial Balance System (GTAS).
- It contains budgetary resources, obligation, and outlay data for all relevant Treasury Account Symbols (TAS) in a reporting agency.
- It includes both award and non-award spending (grouped together), and crosswalks with the SF 133 report.
Easily run multiple models through a single function, `train_and_evaluate()`, which handles:
- Model training
- Accuracy computation
- Confusion matrix generation (for classifiers)
- Performance reporting (classification or regression metrics)
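The signature of `train_and_evaluate()` is not shown in this README, so the following is only a minimal sketch of how such a function could cover the steps listed above. The parameter names, the returned dictionary layout, and the use of the Iris dataset are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_evaluate(models, X_train, X_test, y_train, y_test):
    """Hypothetical signature: fit each model and collect basic metrics."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "confusion_matrix": confusion_matrix(y_test, y_pred),
        }
    return results


# Iris is used only as a stand-in dataset for this sketch.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

results = train_and_evaluate(models, X_train, X_test, y_train, y_test)
for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}")
```

Passing the models as a dictionary keeps the comparison loop independent of any particular estimator, so adding another model is a one-line change.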
Out-of-the-box support for both:
- Classification models such as Logistic Regression, SVM, Random Forest, XGBoost
- Regression models such as Linear Regression, Ridge, SVR, Gradient Boosting
- Heatmaps of confusion matrices
- Auto-generated `classification_report` with precision, recall, F1-score
- Regression summary with metrics like MAE, MSE, R²
- Tabular performance summary across all models
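As a rough illustration of the reporting outputs listed above, the sketch below builds a per-model summary table from scikit-learn's `classification_report` and keeps each confusion matrix for plotting. The variable names (`rows`, `matrices`, `summary`) and the choice of the Wine dataset are assumptions, not the project's actual code.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Wine is used only as a stand-in dataset for this sketch.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

rows = []
matrices = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    # output_dict=True returns nested dicts instead of a formatted string.
    report = classification_report(
        y_test, y_pred, output_dict=True, zero_division=0
    )
    matrices[name] = confusion_matrix(y_test, y_pred)
    rows.append(
        {
            "model": name,
            "accuracy": report["accuracy"],
            "macro_precision": report["macro avg"]["precision"],
            "macro_recall": report["macro avg"]["recall"],
            "macro_f1": report["macro avg"]["f1-score"],
        }
    )

# One row per model, one column per metric.
summary = pd.DataFrame(rows).set_index("model")
print(summary.round(3))
```

Each entry in `matrices` can then be drawn as a heatmap, for example with `seaborn.heatmap(matrices[name], annot=True, fmt="d")`.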
- Use default Scikit-Learn datasets or plug in your own CSV
- Built-in support for label encoding and numeric feature conversion
- Easy integration with Pandas for pre-processing pipelines
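The label encoding and numeric conversion mentioned above could look roughly like the sketch below. The column names and values are invented for illustration and are not the File A schema; the cleaning rules (stripping thousands separators, coercing unparseable values to NaN) are likewise assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw frame; columns and values are illustrative only.
df = pd.DataFrame(
    {
        "amount": ["1,200.50", "300", "n/a", "4500.75"],
        "fund_type": ["annual", "multi-year", "annual", "no-year"],
        "status": ["expired", "unexpired", "unexpired", "expired"],
    }
)

# Strip thousands separators, then coerce unparseable values to NaN.
df["amount"] = pd.to_numeric(
    df["amount"].str.replace(",", "", regex=False), errors="coerce"
)

# Encode a categorical feature as integer codes (categories sort alphabetically).
df["fund_type_code"] = df["fund_type"].astype("category").cat.codes

# Encode the target with LabelEncoder so the mapping can be inverted later.
encoder = LabelEncoder()
df["status_code"] = encoder.fit_transform(df["status"])

print(df)
print(dict(zip(encoder.classes_, encoder.transform(encoder.classes_))))
```

Keeping the fitted `encoder` around lets predictions be mapped back to the original labels with `encoder.inverse_transform(...)`.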
- Expandable with PyTorch and TensorFlow architectures
- Importable modules for CNNs, RNNs, and Transformers
- Ideal for teaching ML fundamentals in a comparative format
- Benchmarking for internal ML pipelines and research reproducibility
Model | Module |
---|---|
Logistic Regression | sklearn.linear_model |
Support Vector Machine | sklearn.svm |
Decision Tree | sklearn.tree |
Random Forest | sklearn.ensemble |
k-Nearest Neighbors | sklearn.neighbors |
Gaussian Naive Bayes | sklearn.naive_bayes |
XGBoost Classifier | xgboost.XGBClassifier |
Model | Module |
---|---|
Linear Regression | sklearn.linear_model.LinearRegression |
Ridge Regression | sklearn.linear_model.Ridge |
Support Vector Regressor | sklearn.svm.SVR |
Decision Tree Regressor | sklearn.tree.DecisionTreeRegressor |
Random Forest Regressor | sklearn.ensemble.RandomForestRegressor |
Gradient Boosting Regressor | sklearn.ensemble.GradientBoostingRegressor |
k-NN Regressor | sklearn.neighbors.KNeighborsRegressor |
Package | Description | Link |
---|---|---|
numpy | Numerical computing library | numpy.org |
pandas | Data manipulation and DataFrames | pandas.pydata.org |
matplotlib | Plotting and visualization | matplotlib.org |
seaborn | Statistical data visualization | seaborn.pydata.org |
scikit-learn | ML modeling and metrics | scikit-learn.org |
xgboost | Gradient boosting framework (optional) | xgboost.readthedocs.io |
torch | PyTorch deep learning library | pytorch.org |
tensorflow | End-to-end ML platform | tensorflow.org |
openai | OpenAI’s Python API client | openai-python |
requests | HTTP requests for API and web access | requests.readthedocs.io |
PySimpleGUI | GUI framework for desktop apps | pysimplegui.readthedocs.io |
typing | Type hinting standard library | typing Docs |
pyodbc | ODBC database connector | pyodbc GitHub |
fitz | PDF document parser via PyMuPDF | pymupdf |
pillow | Image processing library | python-pillow.org |
openpyxl | Excel file processing | openpyxl Docs |
soundfile | Read/write sound file formats | pysoundfile |
sounddevice | Audio I/O interface | sounddevice Docs |
loguru | Structured, elegant logging | loguru GitHub |
statsmodels | Statistical tests and regression diagnostics | statsmodels.org |
dotenv | Load environment variables from .env | python-dotenv GitHub |
python-dotenv | Same as above (modern usage) | python-dotenv |
git clone https://github.com/your-username/balance-projector.git
cd balance-projector
pip install -r requirements.txt
jupyter notebook balances.ipynb
Replace the dataset ingestion cell with:
import pandas as pd
df = pd.read_csv("your_dataset.csv")
X = df.drop("target_column", axis=1)
y = df["target_column"]
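Before handing a custom dataset to the models, it may help to validate it and hold out a test set. The sketch below assumes the same `target_column` placeholder as the snippet above and uses an in-memory CSV with made-up column names and values in place of `your_dataset.csv`, so that it runs on its own.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for your_dataset.csv; columns and values are hypothetical.
csv_text = """feature_a,feature_b,target_column
1.0,10.0,12.0
2.0,9.0,13.5
3.0,8.5,15.0
4.0,7.0,16.2
5.0,6.5,18.1
6.0,5.0,19.4
7.0,4.5,21.0
8.0,3.0,22.3
"""

df = pd.read_csv(io.StringIO(csv_text))
X = df.drop("target_column", axis=1)
y = df["target_column"]

# Fail early if any feature still needs encoding or the target has gaps.
non_numeric = X.select_dtypes(exclude="number").columns.tolist()
if non_numeric:
    raise ValueError(f"Encode these columns first: {non_numeric}")
if y.isna().any():
    raise ValueError("target_column contains missing values")

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```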
- R², MAE, MSE for each model
- Bar plots of performance scores
- Visual predicted vs. actual scatter charts
- Residual error analysis
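To make the outputs listed above concrete, here is a hedged sketch that computes R², MAE, and MSE for two regressors and keeps the residuals for later plotting. The synthetic data from `make_regression` and the `scores`/`residuals` dictionaries are assumptions for illustration; the notebook's own data and structure may differ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for real balance figures in this sketch.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(
        n_estimators=50, random_state=0
    ),
}

scores = {}
residuals = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    # Residual = actual minus predicted; positive means under-prediction.
    residuals[name] = y_test - y_pred
    scores[name] = {
        "r2": r2_score(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
    }
    print(
        f"{name}: R2={scores[name]['r2']:.3f} "
        f"MAE={scores[name]['mae']:.2f} "
        f"mean residual={np.mean(residuals[name]):.2f}"
    )
```

A mean residual far from zero suggests systematic over- or under-prediction, which is often easier to spot in a residual plot than in a single score.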
- Add time series models (Prophet, ARIMA)
- Integrate GridSearchCV for model tuning
- SHAP-based interpretability
- Flask/FastAPI API for deploying forecasts
- LLM summarization of forecast outcomes
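As one possible starting point for the GridSearchCV roadmap item, the sketch below tunes a single Ridge hyperparameter with five-fold cross-validation. The parameter grid, the scoring choice, and the synthetic data are assumptions; nothing here reflects an existing implementation in this project.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic data stands in for a real dataset in this sketch.
X, y = make_regression(n_samples=150, n_features=8, noise=5.0, random_state=1)

# Hypothetical grid: candidate regularization strengths for Ridge.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    Ridge(),
    param_grid,
    cv=5,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes scores, hence "neg_"
)
search.fit(X, y)

print("best params:", search.best_params_)
print("cross-validated MAE:", -search.best_score_)
```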
- 🍴 Fork the project
- 🔧 Create a branch: `git checkout -b feat/new-feature`
- ✅ Commit and push changes
- 📬 Submit a pull request
This project is licensed under the MIT License.