My tutorial about make and bash for data science. This repository provides all the resources and the environment that I use in my talk about make and bash for R users.
A written tutorial is in progress and will be published on my blog (the URL will be provided here).
This tutorial was designed for use on Debian-based Linux systems.
To run the virtual machines you need:
- VirtualBox (tested on version 4.3.10)
- Vagrant (tested on version 1.7.2)
- Hardware virtualization enabled in your BIOS (it usually is, but if you get an error, this might be the reason)
Make sure your make is compatible with GNU make (which is usually the case on Debian).
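The requirements above can be checked from the shell before going further. This is only a rough sketch: the helper name check_tool is made up for illustration, VBoxManage is assumed to be the VirtualBox command-line binary on your PATH, and the CPU flag test only shows that the processor supports virtualization, not that it is switched on in the BIOS.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: report whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Count how many of the required tools are missing.
missing=0
for tool in VBoxManage vagrant make; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"

# GNU make names itself on the first line of its --version output.
if make --version 2>/dev/null | head -n 1 | grep -q "GNU Make"; then
    echo "make is GNU make"
else
    echo "make is not GNU make (or is not installed)"
fi

# vmx (Intel) or svm (AMD) in /proc/cpuinfo means the CPU supports virtualization.
if [ -r /proc/cpuinfo ] && grep -q -E 'vmx|svm' /proc/cpuinfo; then
    echo "CPU advertises hardware virtualization"
else
    echo "no vmx/svm CPU flag visible"
fi
```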
As a database server I use PostgreSQL.
Furthermore you need R and the following R packages:
- data.table
- dplyr
- magrittr
- optparse
- readr
- RPostgreSQL
- devtools
- wakefield
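To see which of the packages above are still missing, something like the following might help. It is a sketch, not part of the repository: the function names r_vector and missing_r_packages are invented here, and it only lists packages without installing anything.

```shell
#!/bin/sh
# The R packages listed above.
PKGS="data.table dplyr magrittr optparse readr RPostgreSQL devtools wakefield"

# Hypothetical helper: turn its arguments into an R character vector, e.g. c("a","b").
r_vector() {
    printf 'c('
    sep=''
    for p in "$@"; do
        printf '%s"%s"' "$sep" "$p"
        sep=','
    done
    printf ')\n'
}

# Hypothetical helper: print every given package that R cannot find.
missing_r_packages() {
    Rscript -e "pkgs <- $(r_vector "$@"); cat(setdiff(pkgs, rownames(installed.packages())), sep = '\n')"
}

if command -v Rscript >/dev/null 2>&1; then
    # PKGS is left unquoted on purpose so it splits into one argument per package.
    missing_r_packages $PKGS
else
    echo "Rscript not found; install R first"
fi
```

Anything it prints can then be installed from within R, for example with install.packages().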
To set up the environment, run:
sh prepare.sh
This should set up the virtual machines (4 servers), generate some random data, and then push the data to the servers. It will also create an SSH configuration for the VMs and a PostgreSQL database named datakraken.
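One way to confirm that the database was created is to look for it in the list that psql prints. This sketch assumes the datakraken database lives on a PostgreSQL server that psql reaches with its default connection settings, which the setup script may or may not arrange; has_database is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "psql -lqt" style output on stdin and succeed
# if the first column contains exactly the given database name.
has_database() {
    cut -d '|' -f 1 | tr -d ' ' | grep -qx "$1"
}

if command -v psql >/dev/null 2>&1; then
    # -l lists databases, -q is quiet, -t prints rows only (no header or footer).
    if psql -lqt 2>/dev/null | has_database datakraken; then
        echo "datakraken exists"
    else
        echo "datakraken not found (or the server is not reachable)"
    fi
else
    echo "psql not found"
fi
```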
Now, to see make in action, run:
cd tutorial
make -j 4 build
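The -j 4 option allows make to run up to four recipes at the same time. The throwaway demo below illustrates the idea with a generated Makefile; it is not the tutorial's Makefile, and the targets s1.out to s4.out are hypothetical stand-ins for the four servers.

```shell
#!/bin/sh
# Hypothetical demo in a temporary directory; nothing here touches the tutorial files.
demo_dir=$(mktemp -d)

# printf writes the Makefile so the recipe line starts with the tab that make requires.
# The GNU make pattern rule %.out builds any .out file; $* is the stem, $@ the target.
printf 'build: s1.out s2.out s3.out s4.out\n\n%%.out:\n\techo "done $*" > $@\n' > "$demo_dir/Makefile"

if command -v make >/dev/null 2>&1; then
    # -C runs make inside demo_dir; -j 4 permits four jobs in parallel,
    # so the four independent .out targets can be built simultaneously.
    make -C "$demo_dir" -j 4 build
    ls "$demo_dir"
else
    echo "make not found"
fi
```

Because the four targets do not depend on each other, make is free to schedule them in any order; only build has to wait for all of them.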
Don't forget to halt the VMs when you are finished. Just cd into vm and run vagrant halt.
FreeBSD License