SPIN - Inclusion Dependency Discovery on Streaming Data

Build

Build the project with Maven using mvn clean package.

Configuration

There are two ways to configure the algorithm: provide every argument on the command line, or put the arguments in a run.params file and pass that file as an argument.

An example run.params file looks like this:

--batch-size=250
--hash-size=32
--masterhost=<MASTERHOST>
--slaves=3
--file-name=<FILEPATH TO DATASET>
--separator=<CSV SEPARATOR>
--skip-first=<CSV SKIP FIRST LINE>
--needs-prefix=<TRUE IF DATA HAS NO UPDATE TYPE METADATA AS FIRST COLUMN>
--datastructure-type=<PROBABILISTIC DATASTRUCTURE TO USE>
--bf-capacity=1024
--log2m=16
--hybrid-threshold=1000

If your data set spans multiple files, repeat the --file-name argument once for every file of the data set.
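
As a minimal sketch of the repeated argument, the following shell snippet writes a run.params file that lists two files. The paths data/part-1.csv and data/part-2.csv are hypothetical placeholders, and only arguments with concrete values in the example above are included; the remaining placeholder arguments (master host, separator, and so on) still need to be filled in for a real run.

```shell
# Write a run.params file in the current directory (overwrites an existing one).
# The two --file-name paths are hypothetical; replace them with your own files.
cat > run.params <<'EOF'
--batch-size=250
--hash-size=32
--slaves=3
--file-name=data/part-1.csv
--file-name=data/part-2.csv
--bf-capacity=1024
--log2m=16
--hybrid-threshold=1000
EOF

# Count how many files of the data set are configured.
grep -c '^--file-name=' run.params
```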

If the configuration is invalid or arguments are missing, the CLI prints help.

Running

To run the SPIN algorithm, you need one running instance of the master system and at least one instance of the worker system. Ideally, these run on different machines in the same network. If they do not, you must also provide the --port and --master-port arguments in the run.params file so that the systems can connect.

To start the master system use:

java -jar <JARNAME> master @run.params

To start the worker system use:

java -jar <JARNAME> slave @run.params
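
When the --port and --master-port arguments are needed, one possible setup is a separate params file per system, derived from the shared run.params. The sketch below is an assumption rather than documented behavior: it assumes the master listens on --port and that a worker uses --port for itself and --master-port to reach the master. The port numbers and the file names master.params and worker.params are hypothetical.

```shell
# Hypothetical ports; pick any free ones on your machine.
MASTER_PORT=7877
WORKER_PORT=7879

# Start from the shared config. An empty file is created if none exists,
# only so that this sketch runs standalone.
[ -f run.params ] || : > run.params

# Assumption: the master listens on --port.
{ cat run.params; echo "--port=${MASTER_PORT}"; } > master.params

# Assumption: the worker binds to its own --port and connects to --master-port.
{ cat run.params; echo "--port=${WORKER_PORT}"; echo "--master-port=${MASTER_PORT}"; } > worker.params

# Print the commands to run, in this order (replace <JARNAME> with the built JAR).
echo "java -jar <JARNAME> master @master.params"
echo "java -jar <JARNAME> slave @worker.params"
```

Keeping the shared arguments in run.params and appending only the port settings avoids maintaining two full copies of the configuration.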
