Detailed instructions for reproducing

This README is for Artifact Evaluation only. It gives detailed instructions for reproducing the experiments described in the paper. Before you start, we assume you have already gone through the procedures in the main README file.

You need a 5-node cluster to reproduce these experiments. The hardware and software dependencies are the same as described in the main README file.

This is a distributed systems project, so the following steps can be tedious. We appreciate your patience.

0. Install and configure dependencies

Install and test tmux (~5 min)

We strongly recommend using tmux (a terminal multiplexer) to run operations and monitor output in parallel. An alternative is to use pssh to broadcast commands to all machines. The examples below use tmux.

We assume the hostnames of the 5 nodes are NODE1, NODE2, NODE3, NODE4 and NODE5. Replace the hostnames in the commands in the following sections with your own.

On NODE1:

sudo apt-get update
sudo apt install tmux

Edit the downloaded tmux_input file and replace all hostnames with your own cluster hostnames. For example:

ssh NODE1 # should be your own hostname

Make sure you can ssh to other nodes from NODE1.

Then run

python tmux_input

If it succeeds, you should see a tmux window with panes connected to your nodes.


Then press ctrl+b, type :set synchronize-panes on, and press Enter. Your input is now sent to all panes in parallel (to exit this mode, run the same command with off instead of on).

🏁 Unless specified otherwise, run all commands in the following sections on all nodes. We also suggest that at this step you make sure all nodes can ssh to each other, since this is needed later to execute remote scripts.
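The all-to-all ssh check suggested above can be scripted. The following is a minimal sketch, not part of the artifact: it assumes the placeholder hostnames NODE1 to NODE5, key-based login, and that it is run from a machine that can reach every node. The helper names are hypothetical.

```python
import itertools
import subprocess

# Placeholder hostnames from this guide; replace with your own.
NODES = ["NODE1", "NODE2", "NODE3", "NODE4", "NODE5"]

# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout bounds how long an unreachable host can stall the check.
SSH_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=5"]


def ssh_check_command(src, dst):
    """Command that logs in to src and, from there, logs in to dst and runs `true`."""
    return ["ssh", *SSH_OPTS, src, "ssh", *SSH_OPTS, dst, "true"]


def failed_pairs(nodes, run=subprocess.run):
    """Return the ordered (src, dst) pairs for which the ssh hop exited non-zero."""
    failures = []
    for src, dst in itertools.permutations(nodes, 2):
        result = run(ssh_check_command(src, dst), capture_output=True)
        if result.returncode != 0:
            failures.append((src, dst))
    return failures
```

Calling `failed_pairs(NODES)` runs 20 checks for 5 nodes. An empty list means every node can reach every other node without a password prompt.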

Install other dependencies (~3 min)

Running the experiments requires additional dependencies (golang-go is needed for the zk benchmark):

sudo apt-get update
sudo apt-get install -y git maven ant vim openjdk-8-jdk golang-go gnuplot
sudo update-alternatives --set java $(sudo update-alternatives --list java | grep "java-8")
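To confirm that Java 8 is the active version after the update-alternatives step, a small check like the one below may help. It is a sketch under the assumption that `java -version` prints a banner containing `version "…"`, as OpenJDK builds do. The function names are hypothetical.

```python
import re
import subprocess

# Matches the quoted version in banners such as: openjdk version "1.8.0_292"
VERSION = re.compile(r'version "(\d+)(?:\.(\d+))?')


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None if absent."""
    match = VERSION.search(version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x, so the second field is the major version.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and parse its banner, which java writes to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

On a correctly configured node, `installed_java_major()` should return 8.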

1. Clone ZooKeeper, HDFS, and modify config (~10 min)

Here we essentially repeat the first three steps of the Functional guide, but this time on a cluster of nodes. We also extend the evaluation to a second system, HDFS.

git clone git@github.com:apache/zookeeper.git
git clone git@github.com:apache/hadoop.git
git clone
cd OathKeeper
git submodule update --init --recursive

Modify conf/samples/ and change this line to the absolute path of the ZooKeeper root directory, for example


Modify conf/samples/ and change this line to the absolute path of the Hadoop root directory, for example


Compile OathKeeper (~1 min)

cd OathKeeper && mvn package

Compile Protobuf for HDFS and add environment variables (~5 min)

cd hadoop
wget && sudo chmod 755 ./ && ./

Add the following to ~/.bashrc:

export PROTOC_HOME=[PATH_TO_HADOOP]/protoc/2.5.0
export HADOOP_PROTOC_PATH=$PROTOC_HOME/dist/bin/protoc
export PATH=$PROTOC_HOME/dist/bin/:$PATH

Then reload it:

source ~/.bashrc

This is an important step for evaluating HDFS. If you later see a protoc not found error when compiling HDFS, it is usually because these variables are not set properly.
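Before starting a long HDFS run, it can be worth checking the protoc setup up front. The sketch below only inspects the two variables named above and looks for protoc on PATH. The function name is hypothetical, and the check does not verify the protoc version.

```python
import os
import shutil


def protoc_problems(env=None, which=shutil.which):
    """Return a list of human-readable problems with the protoc environment (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    # The two variables this guide asks you to export in ~/.bashrc.
    for name in ("PROTOC_HOME", "HADOOP_PROTOC_PATH"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    # shutil.which searches the given PATH string for an executable named protoc.
    if which("protoc", path=env.get("PATH")) is None:
        problems.append("protoc is not on PATH")
    return problems
```

Run `protoc_problems()` in a fresh shell on each node. Any non-empty result points at the variable that still needs fixing.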

2. Reproduce Sec 9.1 Generation Overview && Sec 9.3 Performance (Offline)

This experiment reproduces the trace generation, rule inference, and verification phases on 8 ZooKeeper cases and 10 HDFS cases. In the paper we report the total number of rules (Sec 9.1) and the execution time (Sec 9.3). Note that all three phases are non-deterministic, so the absolute numbers may vary between executions. The execution time also depends on hardware performance.

Modify config

For ZooKeeper, modify conf/samples/


For HDFS, modify conf/samples/


Execute (~15 h)

nohup bash ./ runall_foreach conf/samples/ &> ./log_zk && bash ./ runall_foreach conf/samples/  &> ./log_hdfs &

We suggest using nohup to run this as a background process, as it takes a significantly long time. If compilation of the target system fails, try rerunning the command.

One way to speed this up is to divide the files in ${ok_dir}/conf/samples/zk-collections-basic and ${ok_dir}/conf/samples/hdfs-collections-basic among the nodes. For example, instead of running all 18 cases on every node, keep a few tickets on each node and delete the others. After the execution finishes, merge the inv_verify_output directories from all nodes.
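One simple way to decide which tickets stay on which node is a round-robin split. The sketch below only computes the assignment and does not delete or copy anything. The function name and the case names used in the usage note are illustrative.

```python
def split_cases(cases, nodes):
    """Assign case names to nodes round-robin and return {node: [cases]}."""
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = {node: [] for node in nodes}
    # Sorting first makes the split reproducible across machines.
    for index, case in enumerate(sorted(cases)):
        assignment[nodes[index % len(nodes)]].append(case)
    return assignment
```

With 18 cases and 5 nodes, three nodes receive 4 cases and two receive 3. On each node you would then keep only the files listed under that node's hostname.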

This procedure is very computation- and memory-intensive. If memory is limited, it can take a very long time due to heavy GC. To prevent one stuck job from blocking the others indefinitely, we set a timeout threshold of 4 hours.

If you run into problems during execution and want to kill the processes and clean up, you can use misc/scripts/

Use the following commands to highlight results from log:

sed -n -e '/^\[Profiler\]/p' -e '/^Dumped/p' ./log_zk
sed -n -e '/^\[Profiler\]/p' -e '/^Dumped/p' ./log_hdfs

The results should be kept for later use:

cp -r inv_verify_output inv_verify_output_bk

3. Reproduce Sec 9.2 Checking Newer Violations

3.1 Detect new ones (Online)

This experiment reproduces the runtime detection results (Sec 9.2). The steps are similar to those described in the previous part: install the reproducing hooks, install the OathKeeper library, start the instance, trigger the failure, and check the logs. We have automated the reproduction steps as much as we can.

The detection relies on the invariants generated in the previous section.


This case is already covered in the Getting Started part, so there is no need to repeat the execution.


In the paper draft we acknowledged that our tool did not infer useful rules that can detect this case.


Please refer to experiments/notes/


Please refer to experiments/notes/


Please refer to experiments/notes/


Please refer to experiments/notes/

3.2 Crosscheck experiment (ZooKeeper only, Offline)

This experiment reproduces the crosschecking result between 22 ZooKeeper cases.

Modify config

Modify conf/samples/


Generate traces (~4 h)

Crosschecking is based on generated traces, so we need to generate the traces first.

./ runall_foreach conf/samples/

Run comparison (~30 min)

./ crosscheckall_dynamic conf/samples/ conf/samples/zk-collections-cc ./inv_verify_output ./trace_output/ &> log_crosscompare

The generated results are in inv_checktrace_output. For each pair of cases, a marker file records the outcome: if the rules from ZK-1208 can detect ZK-1496, the marker file is ZK-1208-ZK-1496_detected; otherwise it is ZK-1208-ZK-1496_undetected.
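The marker files can also be tallied directly. The following sketch assumes that case IDs have the PROJECT-NUMBER form shown in the example and that the markers sit somewhere under inv_checktrace_output. It walks the directory tree because the exact layout is not described here, and the helper names are hypothetical.

```python
import os
import re

# <checker case>-<target case>_<detected|undetected>, e.g. ZK-1208-ZK-1496_detected
MARKER = re.compile(r"^([A-Z]+-\d+)-([A-Z]+-\d+)_(detected|undetected)$")


def read_markers(directory):
    """Map (checker, target) to True/False according to the marker files found."""
    results = {}
    for _root, _dirs, files in os.walk(directory):
        for name in files:
            match = MARKER.match(name)
            if match:
                checker, target, status = match.groups()
                results[(checker, target)] = status == "detected"
    return results


def detected_counts(results):
    """Count, per checker case, how many target cases its rules detected."""
    counts = {}
    for (checker, _target), detected in results.items():
        counts[checker] = counts.get(checker, 0) + int(detected)
    return counts
```

For example, `detected_counts(read_markers("inv_checktrace_output"))` gives one number per case, which can be compared against the ratios reported in log_crosscompare.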

Run this command to plot a simple version of Figure 11 in the paper draft.

sed '/Total detected/!d' log_crosscompare | sed -e 's/Total detected ratio: \(.*\)\/23.*/\1/' | gnuplot -p -e 'set terminal png;plot "/dev/stdin"' > Figure11_simple.png

4. Reproduce Sec 9.4 Runtime Overhead && 9.5 Rule Activation and False Positive (Online)

This experiment reproduces the remaining metrics for OathKeeper runtime detection. Note that the overhead is directly affected by the number of loaded invariants (the more rules are loaded, the lower the throughput).

4.1 ZooKeeper

Install benchmark and configure (~1 min)

We assume the OathKeeper root is at ~/OathKeeper:

cd ~/OathKeeper
mkdir -p ~/go/src/ && cd ~/go/src/ && cp -r ~/OathKeeper/experiments/bench/go-zookeeper ./ && cd ~/OathKeeper && cp -r experiments/bench/zkbench/ ~/go/src/ && cd ~/go/src/zkbench/ && go build

Create a new config called zkbench/bench_perf.conf:

namespace = zkTest
requests = 3000
clients = 15
same_key = false
key_size_bytes = 8
value_size_bytes = 16
type = cmd
cleanup = false

# enable random access
# percents do not have to add up to 1.0
random_access = false
read_percent = 0.4
write_percent = 0.6
runs = 5

# ZooKeeper ensemble, need to change!!

For the ZooKeeper cluster configuration, use the same settings as in the Modify config step of ZK-3546.

Checkout ZooKeeper version and compile (~3 min)

cd zookeeper
git stash && git checkout release-3.6.1
mvn clean package -DskipTests

Install runtime and rules (~1 min)

cd OathKeeper
./ install conf/samples/ zookeeper
rm -rf inv_prod_input && mkdir inv_prod_input && cp -r inv_verify_output_bk/ZK-* inv_prod_input/

Start workload benchmark (~3 min)

Run on all nodes (the script automatically starts the instances):

experiments/run/zookeeper/ [path_to_OathKeeper] [path_to_ZooKeeper] NODE5 ~/go/src/zkbench/

Collect results

For performance, see logs in ~/go/src/zkbench/zkresult-[date]-summary.dat, example:


313.312003 is the throughput on this client.

For active ratio, see logs in zookeeper/logs/zookeeper-*.out, for example:

Checking finished, succCount:.. failCount: .. inactiveCount: ..

Active ratio = 1-inactiveCount/(succCount+failCount+inactiveCount)
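The active ratio can be computed from the log line with a few lines of Python. This is a sketch: the regular expression assumes the three counters appear in the order shown above and tolerates optional spaces or commas between them, since the exact spacing in the log may differ.

```python
import re

# Counters in the order printed by the "Checking finished" log line.
COUNTS = re.compile(
    r"succCount:\s*(\d+)\s*,?\s*failCount:\s*(\d+)\s*,?\s*inactiveCount:\s*(\d+)"
)


def active_ratio(succ, fail, inactive):
    """Active ratio = 1 - inactiveCount / (succCount + failCount + inactiveCount)."""
    total = succ + fail + inactive
    if total == 0:
        raise ValueError("no rules were checked")
    return 1 - inactive / total


def active_ratio_from_line(line):
    """Parse one log line and return its active ratio, or None if it has no counters."""
    match = COUNTS.search(line)
    if match is None:
        return None
    return active_ratio(*(int(value) for value in match.groups()))
```

For instance, a line reporting 3 successes, 0 failures, and 1 inactive rule gives an active ratio of 0.75.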

Clean up

cd zookeeper
bin/ stop
rm -r /tmp/zookeeper/version-2/

Repeat for baseline result

Repeat the steps above, but do not install OathKeeper and the rules.

4.2 HDFS

Checkout and compile (~2 min)

We assume you have already configured HDFS-14699, and we reuse its scripts and config here.

experiments/reproduce/HDFS-14699/ [path_to_OathKeeper] [path_to_HADOOP]
experiments/run/hdfs/ [path_to_OathKeeper] [path_to_HADOOP] NODE1

Install runtime and rules (~1 min)

./ install conf/samples/ hdfs
rm -rf inv_prod_input && mkdir inv_prod_input && cp -r inv_verify_output_bk/HDFS-* inv_prod_input/

Start workload benchmark (~3 min)

Run on all nodes:

experiments/run/hdfs/ [path_to_OathKeeper] [path_to_HADOOP] NODE1

Run only on NODE1:

experiments/run/hdfs/ [path_to_OathKeeper] [path_to_HADOOP]

The result is printed in the terminal, for example:

Job started: 0
Job ended: ...
The createWrite job took 90 seconds.
The job recorded 0 exceptions.

If so, the throughput is 16000/90 ≈ 177.78 op/s.
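The same arithmetic can be applied to the benchmark output automatically. The sketch below assumes the "job took N seconds" line has the form shown in the sample output and takes the operation count as a parameter (16000 in the calculation above). The function names are hypothetical.

```python
import re

# Matches lines such as: The createWrite job took 90 seconds.
DURATION = re.compile(r"The (\w+) job took (\d+) seconds\.")


def throughput(total_ops, seconds):
    """Operations per second for a job that ran total_ops operations."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return total_ops / seconds


def job_duration(output):
    """Return (job_name, seconds) from the benchmark output, or None if not found."""
    match = DURATION.search(output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Given the sample output, `job_duration` yields the job name and 90 seconds, and `throughput(16000, 90)` evaluates to roughly 177.78 op/s.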

(Sometimes the benchmark fails to connect. Cleaning up and rerunning the benchmark usually resolves the problem.)

(If you encounter low-resource exceptions and the HDFS cluster is forced into safe mode, try reducing the benchmark size in the file OathKeeper/experiments/run/hdfs/ or providing more resources.)

Clean up (~1 min)

experiments/run/hdfs/ ~/OathKeeper/ ~/hadoop/ NODE1

Repeat for baseline result

Repeat the steps above, but do not install OathKeeper and the rules.