The goal of this work is to develop a tool, named learningOrchestra, to facilitate and streamline the iterative data science process of:
- Gathering data;
- Cleaning/preparing the datasets;
- Building models;
- Validating their predictions; and
- Deploying the results.
The architecture of learningOrchestra is a collection of microservices deployed in a cluster.
A dataset (in CSV format) can be loaded from a URL using the Database API microservice, which converts it to JSON and stores it in MongoDB.
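Conceptually, the conversion step turns each CSV row into one JSON document. The sketch below is a minimal local illustration of that idea, not the Database API's actual implementation:

```python
import csv
import io
import json

def csv_to_json_records(csv_text: str) -> list:
    """Turn CSV text into a list of dicts, one JSON-ready document per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

sample = "PassengerId,Name,Survived\n1,Braund,0\n2,Cumings,1\n"
records = csv_to_json_records(sample)
print(json.dumps(records[0]))  # each row becomes one MongoDB document
```

In MongoDB, each of these dicts would be stored as a separate document in a collection named after the dataset.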
It is also possible to perform several preprocessing and analytical tasks using learningOrchestra's collection of microservices.
With learningOrchestra, you can use the Model Builder microservice to build prediction models from stored, preprocessed datasets with several classifiers simultaneously. The microservice runs on a Spark cluster, so models are trained with distributed processing. You can then compare the classifiers' results over time to tune your models and improve prediction accuracy.
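The core idea of training several classifiers on the same data and comparing their accuracy can be sketched locally with two toy stand-in classifiers (a majority-vote baseline and 1-nearest-neighbor). This is only an illustration of the comparison pattern, not the Model Builder's Spark-based code:

```python
# Toy stand-ins for real classifiers: data, names, and models are illustrative.
def majority_classifier(train, _point):
    """Predict the most common label seen in training."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

def nearest_neighbor_classifier(train, point):
    """Predict the label of the closest training point (1-NN)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda item: dist(item[0], point))[1]

def accuracy(classifier, train, test):
    correct = sum(classifier(train, x) == y for x, y in test)
    return correct / len(test)

train = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
         ((1.0, 1.0), 1), ((0.9, 1.1), 1)]
test = [((0.05, 0.05), 0), ((0.95, 1.0), 1), ((1.1, 0.9), 1)]

# Fit/score every classifier on the same split and compare the results.
scores = {name: accuracy(clf, train, test)
          for name, clf in [("majority", majority_classifier),
                            ("1-nn", nearest_neighbor_classifier)]}
print(scores)
```

Model Builder applies the same "one dataset, many models, one scoreboard" pattern, but with real classifiers distributed across a Spark cluster.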
By providing their own preprocessing code, users can create highly customized prediction models for a specific dataset, further improving accuracy. With that in mind, the possibilities are endless! 🚀
To make learningOrchestra more accessible, we provide the learning_orchestra_client Python package, which exposes all of learningOrchestra's functionality through a Python API.
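As a hedged sketch, a client session for loading a dataset might look like the following. The class and method names (Context, DatabaseApi, read_file) are assumptions made for illustration and should be checked against the package documentation:

```python
# Hypothetical usage sketch of learning_orchestra_client; the names used
# here are assumed, not verified signatures from the package.
def load_titanic_dataset(cluster_ip: str, dataset_url: str) -> None:
    from learning_orchestra_client import Context, DatabaseApi  # assumed API
    Context(cluster_ip)  # point the client at the learningOrchestra cluster
    database = DatabaseApi()
    # Download the CSV from the URL and store it in MongoDB as JSON:
    database.read_file(filename="titanic_training", url=dataset_url)

# Example call (requires a running learningOrchestra cluster):
# load_titanic_dataset("192.0.2.10", "https://example.com/titanic.csv")
```

The package wraps the microservices' HTTP APIs, so the same operations are also reachable with plain HTTP requests.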
Users can also export and analyse the results with a MongoDB GUI, such as NoSQLBooster.
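Since results live in MongoDB, they can also be inspected programmatically. A minimal sketch with the pymongo driver is shown below; the database and collection names are illustrative, not the ones learningOrchestra actually uses:

```python
# Hypothetical sketch of reading stored results with pymongo; the
# "learning_orchestra" database name is an assumption for illustration.
def fetch_results(mongo_url: str, collection: str, limit: int = 5) -> list:
    from pymongo import MongoClient  # third-party driver, assumed installed
    client = MongoClient(mongo_url)
    docs = client["learning_orchestra"][collection].find().limit(limit)
    return list(docs)

# Example (requires a reachable MongoDB instance):
# rows = fetch_results("mongodb://localhost:27017", "titanic_predictions")
```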
We also built a demo of learningOrchestra using the Titanic challenge dataset (see the learning_orchestra_client usage example section).
The learningOrchestra documentation provides a more detailed guide on installation and usage, along with documentation and examples for each microservice and for the Python package.