ML-in-C is a project that implements machine learning models in the C programming language. This repository contains implementations of models such as Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) in both C and Python. The project focuses on performance optimization, using OpenMP for CPU parallelism and CUDA for GPU acceleration, and provides benchmarking scripts for performance comparison.
- Machine Learning Models in C: Implementations of MLP and CNN models optimized for performance.
- CUDA Support: GPU acceleration using CUDA for intensive computations.
- Python Implementations: Equivalent models in Python for ease of use and comparison.
- Data Loading and Preprocessing: Scripts to download and preprocess popular datasets.
- Benchmarking Scripts: Tools to compare performance between CPU and GPU implementations.
- Unit Testing: Testing framework for ensuring code correctness and reliability.
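To make the MLP and OpenMP features above more concrete, here is a minimal sketch of a single dense layer with ReLU activation, parallelised over output neurons. The function name `dense_relu_forward` and its signature are hypothetical and are not taken from the repository's `mlp.h`; the actual implementation may be organised quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the repository's API): computes
 * out = relu(W * in + b) for one dense layer.
 * w is row-major with n_out rows and n_in columns. */
void dense_relu_forward(const float *w, const float *b,
                        const float *in, float *out,
                        int n_in, int n_out)
{
    assert(n_in > 0 && n_out > 0);

    /* Each output neuron is independent of the others, so the outer loop
     * can be split across threads. When compiled without OpenMP support
     * the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (int j = 0; j < n_out; ++j) {
        float acc = b[j];
        for (int i = 0; i < n_in; ++i) {
            acc += w[(size_t)j * n_in + i] * in[i];
        }
        out[j] = acc > 0.0f ? acc : 0.0f; /* ReLU */
    }
}
```

With GCC, OpenMP is enabled by passing `-fopenmp`; the same source still compiles and gives the same result without it.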
├── data
│   ├── iris.data
│   ├── iris_processed.txt
│   ├── wdbc.data
│   ├── winequality-red.csv
│   └── winequality-white.csv
├── docs
├── examples
│   ├── c
│   └── python
├── figs
├── LICENSE
├── README.md
└── src
    ├── c
    │   ├── CMakeLists.txt
    │   ├── data_loader.c
    │   ├── data_loader.h
    │   ├── main.c
    │   └── models
    │       ├── cnn
    │       └── mlp
    │           ├── CMakeLists.txt
    │           ├── mlp_cpu.c
    │           ├── mlp.cu
    │           └── mlp.h
    ├── python
    │   ├── models
    │   │   ├── cnn
    │   │   └── mlp
    │   │       └── mlp.py
    │   ├── setup.py
    │   └── utils
    ├── scripts
    │   ├── build_run_c_CPU.sh
    │   ├── download_datasets.sh
    │   ├── preprocess_iris.py
    │   └── run_pipeline.sh
    └── tests
        ├── c
        └── python
- data/: Contains datasets and preprocessed data files.
- docs/: Documentation files.
- examples/: Example programs in C and Python.
- figs/: Figures and images for documentation or results.
- src/: Source code for the project.
  - c/: C implementations.
    - models/: Machine learning models in C.
      - mlp/: Multi-Layer Perceptron implementations.
      - cnn/: Convolutional Neural Network implementations.
  - python/: Python implementations.
    - models/: Machine learning models in Python.
  - scripts/: Helper scripts for building, running, and benchmarking.
  - tests/: Unit tests for C and Python code.
- C Compiler: GCC or any C99-compatible compiler.
- CUDA Toolkit: Required for GPU acceleration (if building with CUDA support).
- CMake: Version 3.10 or higher.
- Python 3: For running preprocessing scripts and Python implementations.
- Python Packages: For example PyTorch (installable via requirements.txt). Note that the Python versions have not been written yet.
- Clone the Repository:
git clone https://github.com/JohannesBroens/ML-in-C.git
cd ML-in-C
- Install Dependencies:
  - For C code:
    - Ensure that you have a C compiler and CUDA Toolkit installed.
  - For Python code:
    - Install required Python packages:
      pip install -r requirements.txt
- Build the C Project:
cd src/c
mkdir build
cd build
cmake .. -DUSE_CUDA=ON
make
- Set -DUSE_CUDA=OFF if you want to build without CUDA support.
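One common way for a build option like `USE_CUDA` to reach the C sources is as a preprocessor definition. The sketch below assumes that the CMake option results in `-DUSE_CUDA` being passed to the compiler; the repository's `CMakeLists.txt` may instead select source files (for example `mlp_cpu.c` versus `mlp.cu`) without defining any macro. The names `backend_t`, `compiled_backend` and `backend_name` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and functions, for illustration only. */
typedef enum { BACKEND_CPU, BACKEND_CUDA } backend_t;

/* Reports which backend this translation unit was compiled for.
 * Assumption: the build system defines USE_CUDA when the option is ON. */
backend_t compiled_backend(void)
{
#ifdef USE_CUDA
    return BACKEND_CUDA;
#else
    return BACKEND_CPU;
#endif
}

/* Human-readable backend name, e.g. for log or benchmark output. */
const char *backend_name(backend_t b)
{
    assert(b == BACKEND_CPU || b == BACKEND_CUDA);
    return b == BACKEND_CUDA ? "cuda" : "cpu";
}
```

A program could print `backend_name(compiled_backend())` at startup so that benchmark logs record which build produced them.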
A pipeline script is provided to automate the process of downloading datasets, preprocessing, building the project, and running the program.
cd src/scripts
./run_pipeline.sh
- Note: Ensure the script has execute permissions:
chmod +x run_pipeline.sh
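In the repository the preprocessing step is handled by a Python script (`preprocess_iris.py`). As a rough illustration of what that step involves, the C sketch below parses one line of the raw `iris.data` file (four comma-separated measurements followed by the species name) into numeric features and an integer label. The function name `parse_iris_line` and the label numbering are assumptions, and the actual format of `iris_processed.txt` may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses a line such as
 *   "5.1,3.5,1.4,0.2,Iris-setosa"
 * Returns 1 on success, 0 if the line is malformed or the species is unknown.
 * The label mapping (setosa=0, versicolor=1, virginica=2) is an assumption. */
int parse_iris_line(const char *line, float features[4], int *label)
{
    char name[32];

    assert(line != NULL && features != NULL && label != NULL);

    /* Four floats and one whitespace-free token, separated by commas. */
    if (sscanf(line, "%f,%f,%f,%f,%31s",
               &features[0], &features[1], &features[2], &features[3],
               name) != 5) {
        return 0;
    }

    if (strcmp(name, "Iris-setosa") == 0) {
        *label = 0;
    } else if (strcmp(name, "Iris-versicolor") == 0) {
        *label = 1;
    } else if (strcmp(name, "Iris-virginica") == 0) {
        *label = 2;
    } else {
        return 0;
    }
    return 1;
}
```

Blank lines, which the raw file may contain at the end, are rejected because `sscanf` cannot match all five fields.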
You can run the program on specific datasets by providing the dataset name as an argument.
./run_pipeline.sh iris
Supported datasets:
generated
iris
wine-red
wine-white
breast-cancer
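A dataset argument like the ones listed above is typically resolved to an input file through a small lookup table. The following is a minimal sketch of that idea; the struct, the function `dataset_lookup`, and the name-to-path mapping are assumptions based on the file names in the data/ directory, not the program's actual code. In particular, it assumes `generated` means synthetic data with no input file and that `iris` reads the preprocessed file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table from dataset name to data file. */
typedef struct {
    const char *name;
    const char *path; /* NULL means no input file (synthetic data) */
} dataset_entry;

static const dataset_entry DATASETS[] = {
    {"generated",     NULL},
    {"iris",          "data/iris_processed.txt"},
    {"wine-red",      "data/winequality-red.csv"},
    {"wine-white",    "data/winequality-white.csv"},
    {"breast-cancer", "data/wdbc.data"},
};

/* Returns 1 and stores the path in *path_out if name is a known dataset,
 * otherwise returns 0 and leaves *path_out untouched. */
int dataset_lookup(const char *name, const char **path_out)
{
    assert(name != NULL && path_out != NULL);

    for (size_t i = 0; i < sizeof DATASETS / sizeof DATASETS[0]; ++i) {
        if (strcmp(name, DATASETS[i].name) == 0) {
            *path_out = DATASETS[i].path;
            return 1;
        }
    }
    return 0;
}
```

Keeping the names in one table means an unknown argument can be rejected with a clear error message before any file is opened.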
Not yet implemented.
Not yet implemented.
Documentation is not yet complete. The plan is as follows:
- Generating Code Documentation:
  - C Code: Documentation generated using Doxygen.
    doxygen Doxyfile
  - Python Code: Documentation generated using Sphinx.
    cd docs
    make html
- README.md:
  - Contains instructions on how to build and run the code, including dependencies and prerequisites.
Contributions are welcome! Please fork the repository and submit a pull request with your improvements.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/YourFeature)
- Commit your Changes (git commit -m 'Add YourFeature')
- Push to the Branch (git push origin feature/YourFeature)
- Open a Pull Request
This project is licensed under the Apache License (Version 2.0) - see the LICENSE file for details.
Feel free to explore the repository, run the models, and contribute to the project. If you encounter any issues or have suggestions, please open an issue on GitHub.