Commit:
* Add README to run scripts
* Added zenodo

# Installation

To reproduce our results, clone the repository and check out the specific tag to get the state at which the experiments were done:

```
git clone https://github.com/dobraczka/klinker.git
cd klinker
git checkout paper
```
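To verify you are on the intended revision, a quick check (not part of the original instructions):

```
# Should print "paper" after checking out the tag.
git describe --tags
```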

Create a virtual environment with micromamba and install the dependencies:

```
micromamba env create -n klinker-conda --file=klinker-conda.yaml
micromamba activate klinker-conda
pip install -e ".[all]"
```
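As a quick sanity check that the installation worked (assuming the package is importable as `klinker`, matching the repository name):

```
# Exits with an error if the package or its dependencies are broken.
python -c "import klinker"
```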

# Running the experiments

We originally used SLURM job arrays to run our experiments. We adapted the code so it can be run without SLURM, but kept the array structure.
For each embedding-based method, entries 0-15 use sentence transformer embeddings and entries 16-31 rely on SIF-aggregated fastText embeddings.
For entries 24-31, you are expected to have the dimensionality-reduced fastText embeddings in `~/.data/klinker/word_embeddings/100wiki.en.bin`.
For methods without embeddings (`non_relational/run_token.sh` and `relational/run_relational_token.sh`), only entries 0-15 exist.
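To make that layout concrete, here is a minimal illustrative sketch of how an entry index maps to an embedding type; the variable names are hypothetical and do not reflect the actual script internals:

```
# Illustrative only: entries 0-15 -> sentence transformers,
# entries 16-31 -> SIF-aggregated fastText.
ENTRY=$1
if [ "$ENTRY" -lt 16 ]; then
    EMBEDDING="sentence-transformer"
else
    EMBEDDING="SIF-fastText"   # 24-31 additionally need 100wiki.en.bin
fi
echo "Entry $ENTRY uses $EMBEDDING embeddings"
```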

You can reduce the dimensionality of the fastText embeddings like this:
```
import os

import fasttext
import fasttext.util

ft = fasttext.load_model('wiki.en.bin')
# Reduce the embedding dimension from 300 to 100 (PCA-based).
fasttext.util.reduce_model(ft, 100)
# fastText's save_model does not expand "~", so resolve it explicitly.
ft.save_model(os.path.expanduser("~/.data/klinker/word_embeddings/100wiki.en.bin"))
```
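If you do not yet have the pretrained `wiki.en.bin` model, it is distributed on the fastText "Wiki word vectors" page; for example (URL as published by fastText, assumed still available):

```
# Download and unpack the pretrained English Wikipedia model
# (the archive is several GB).
wget https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.zip
unzip wiki.en.zip wiki.en.bin
```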

The experiments can then be run individually by supplying the desired entry as the first argument, e.g.:
```
bash run_scripts/relational/run_token_attribute.sh 16 | ||
```
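To sweep a whole range of entries for one method, a simple shell loop works (illustrative, not part of the provided scripts):

```
# Run all sentence transformer entries (0-15) for one method.
for entry in $(seq 0 15); do
    bash run_scripts/relational/run_token_attribute.sh "$entry"
done
```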