ML-in-C

ML-in-C is a project for implementing machine learning models in the C programming language. This repository contains implementations of models such as Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) in C, with Python counterparts planned. The project focuses on performance: it targets CPU and GPU execution using OpenMP and CUDA, and aims to provide benchmarking scripts for comparing the two.

Features

Machine Learning Models in C: Implementations of MLP and CNN models optimized for performance.
CUDA Support: GPU acceleration using CUDA for intensive computations.
Python Implementations: Equivalent models in Python for ease of use and comparison (planned; not yet written).
Data Loading and Preprocessing: Scripts to download and preprocess popular datasets.
Benchmarking Scripts: Tools to compare performance between CPU and GPU implementations (planned; not yet implemented).
Unit Testing: Testing framework for ensuring code correctness and reliability (planned; not yet implemented).
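
To give a feel for what an MLP in C involves, here is a minimal sketch of a forward pass through a one-hidden-layer network with ReLU activation. It is not taken from this repository: the function names (`dense_forward`, `mlp_forward`), the row-major weight layout, and the choice of ReLU are all assumptions made for illustration. The OpenMP pragma is guarded so the code also compiles without OpenMP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the repository's API).
   Dense layer: out[j] = act(b[j] + sum_i w[j * n_in + i] * in[i]).
   Weights are stored row-major, one row per output unit. */
void dense_forward(const float *w, const float *b, const float *in,
                   float *out, size_t n_in, size_t n_out, int use_relu)
{
    assert(w != NULL && b != NULL && in != NULL && out != NULL);

    /* Each output unit is independent, so the outer loop can be
       parallelised when the compiler has OpenMP enabled. */
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (long j = 0; j < (long)n_out; ++j) {
        float acc = b[j];
        for (size_t i = 0; i < n_in; ++i)
            acc += w[(size_t)j * n_in + i] * in[i];
        out[j] = (use_relu && acc < 0.0f) ? 0.0f : acc;
    }
}

/* Hypothetical one-hidden-layer MLP:
   hidden = relu(W1 x + b1), logits = W2 hidden + b2. */
void mlp_forward(const float *w1, const float *b1,
                 const float *w2, const float *b2,
                 const float *x, float *hidden, float *logits,
                 size_t n_in, size_t n_hidden, size_t n_out)
{
    dense_forward(w1, b1, x, hidden, n_in, n_hidden, 1);
    dense_forward(w2, b2, hidden, logits, n_hidden, n_out, 0);
}
```

The caller owns the `hidden` and `logits` buffers, which keeps the sketch free of dynamic allocation; a real implementation would add a softmax or loss on top of the logits.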

Folder Structure

├── data
│   ├── iris.data
│   ├── iris_processed.txt
│   ├── wdbc.data
│   ├── winequality-red.csv
│   └── winequality-white.csv
├── docs
├── examples
│   ├── c
│   └── python
├── figs
├── LICENSE
├── README.md
└── src
    ├── c
    │   ├── CMakeLists.txt
    │   ├── data_loader.c
    │   ├── data_loader.h
    │   ├── main.c
    │   └── models
    │       ├── cnn
    │       └── mlp
    │           ├── CMakeLists.txt
    │           ├── mlp_cpu.c
    │           ├── mlp.cu
    │           └── mlp.h
    ├── python
    │   ├── models
    │   │   ├── cnn
    │   │   └── mlp
    │   │       └── mlp.py
    │   ├── setup.py
    │   └── utils
    ├── scripts
    │   ├── build_run_c_CPU.sh
    │   ├── download_datasets.sh
    │   ├── preprocess_iris.py
    │   └── run_pipeline.sh
    └── tests
        ├── c
        └── python

data/: Contains datasets and preprocessed data files.
docs/: Documentation files.
examples/: Example programs in C and Python.
figs/: Figures and images for documentation or results.
src/: Source code for the project.
- c/: C implementations.
  - models/: Machine learning models in C.
    - mlp/: Multi-Layer Perceptron implementations.
    - cnn/: Convolutional Neural Network implementations.
- python/: Python implementations.
  - models/: Machine learning models in Python.
- scripts/: Helper scripts for building, running, and benchmarking.
- tests/: Unit tests for C and Python code.
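
As an illustration of the kind of parsing a data loader has to do, the sketch below reads one row of `iris.data`, which in the UCI distribution has four comma-separated measurements followed by a species name. The function name `parse_iris_line` and the class numbering are assumptions for this example; they are not the interface of `data_loader.c`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IRIS_FEATURES 4

/* Hypothetical parser (not the repository's data_loader API).
   Parses a row such as "5.1,3.5,1.4,0.2,Iris-setosa".
   Returns the class index (0, 1 or 2), or -1 if the line is malformed. */
int parse_iris_line(const char *line, float features[IRIS_FEATURES])
{
    char label[32];

    assert(line != NULL && features != NULL);

    /* %31s stops at whitespace, so a trailing newline is ignored. */
    if (sscanf(line, "%f,%f,%f,%f,%31s",
               &features[0], &features[1],
               &features[2], &features[3], label) != 5)
        return -1;

    if (strcmp(label, "Iris-setosa") == 0)     return 0;
    if (strcmp(label, "Iris-versicolor") == 0) return 1;
    if (strcmp(label, "Iris-virginica") == 0)  return 2;
    return -1; /* unknown species name */
}
```

Returning -1 for blank or malformed lines lets a caller skip them while reading the file line by line with `fgets`.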

Getting Started

Prerequisites

C Compiler: GCC or any C99-compatible compiler.
CUDA Toolkit: Required for GPU acceleration (if building with CUDA support).
CMake: Version 3.10 or higher.
Python 3: For running preprocessing scripts and Python implementations.
Python Packages: For example PyTorch (installable via requirements.txt). Note that the Python versions of the models have not been written yet.

Building the Project

Clone the Repository:

```
git clone https://github.com/JohannesBroens/ML-in-C.git
cd ML-in-C
```

Install Dependencies:
- For C code:
  - Ensure that you have a C compiler and CUDA Toolkit installed.
- For Python code:
  - Install required Python packages:
```
pip install -r requirements.txt
```
Build the C Project:

```
cd src/c
mkdir build
cd build
cmake .. -DUSE_CUDA=ON
make
```

Set -DUSE_CUDA=OFF if you want to build without CUDA support.
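
One common way for C code to react to a build option like this is a preprocessor check. The sketch below assumes the build passes a `USE_CUDA` macro to the compiler when the CMake option is ON; whether this project's CMakeLists.txt does that, or instead selects between `mlp_cpu.c` and `mlp.cu` at the file level, is not stated here. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical backend identifiers (not defined by this repository). */
enum mlp_backend { MLP_BACKEND_CPU, MLP_BACKEND_CUDA };

/* Reports which backend this translation unit was compiled for.
   Assumes the build defines USE_CUDA when configured with -DUSE_CUDA=ON. */
enum mlp_backend mlp_compiled_backend(void)
{
#ifdef USE_CUDA
    return MLP_BACKEND_CUDA;
#else
    return MLP_BACKEND_CPU;
#endif
}

/* Human-readable name, e.g. for a startup log line. */
const char *mlp_backend_name(enum mlp_backend backend)
{
    assert(backend == MLP_BACKEND_CPU || backend == MLP_BACKEND_CUDA);
    return backend == MLP_BACKEND_CUDA ? "cuda" : "cpu";
}
```

Printing `mlp_backend_name(mlp_compiled_backend())` at startup is a cheap way to confirm which configuration was actually built.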

Running the Pipeline

A pipeline script is provided to automate the process of downloading datasets, preprocessing, building the project, and running the program.

```
cd src/scripts
./run_pipeline.sh
```

Note: Ensure the script has execute permissions:

```
chmod +x run_pipeline.sh
```

Usage

Running on Specific Datasets

You can run the program on specific datasets by providing the dataset name as an argument.

```
./run_pipeline.sh iris
```

Supported datasets:

- generated
- iris
- wine-red
- wine-white
- breast-cancer
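
A program that accepts these names needs some way to map each one to its data. The sketch below shows one possible lookup table in C. The pairing of names to files is an assumption inferred from the data/ listing (for instance, that `breast-cancer` corresponds to `wdbc.data` and that `generated` has no file because it is synthesised at runtime); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry (not the repository's API). */
struct dataset_entry {
    const char *name; /* name given on the command line */
    const char *path; /* file under data/, or NULL if no file is needed */
};

/* Assumed name-to-file pairing, inferred from the data/ folder listing. */
static const struct dataset_entry DATASETS[] = {
    {"generated",     NULL},
    {"iris",          "data/iris_processed.txt"},
    {"wine-red",      "data/winequality-red.csv"},
    {"wine-white",    "data/winequality-white.csv"},
    {"breast-cancer", "data/wdbc.data"},
};

/* Returns 1 and stores the path (possibly NULL) if the name is supported,
   otherwise returns 0 and leaves *path_out untouched. */
int dataset_lookup(const char *name, const char **path_out)
{
    assert(name != NULL && path_out != NULL);

    for (size_t i = 0; i < sizeof DATASETS / sizeof DATASETS[0]; ++i) {
        if (strcmp(name, DATASETS[i].name) == 0) {
            *path_out = DATASETS[i].path;
            return 1;
        }
    }
    return 0;
}
```

Keeping the names in one table means an unsupported argument can be rejected with a clear error before any file is opened.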

Benchmarking

Not yet implemented.

Testing

Not yet implemented.

Documentation

The project is not fully documented yet. The plan is as follows:

Generating Code Documentation:
- C Code: Documentation generated using Doxygen.
```
doxygen Doxyfile
```
- Python Code: Documentation generated using Sphinx.
```
cd docs
make html
```
README.md:
- Contains instructions on how to build and run the code, including dependencies and prerequisites.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/YourFeature)
3. Commit your Changes (git commit -m 'Add YourFeature')
4. Push to the Branch (git push origin feature/YourFeature)
5. Open a Pull Request

License

This project is licensed under the Apache License (Version 2.0) - see the LICENSE file for details.

Feel free to explore the repository, run the models, and contribute to the project. If you encounter any issues or have suggestions, please open an issue on GitHub.
