This code has been developed using a virtualenv with numpy installed.

- Create a Python virtualenv and install the dependencies:

```
source [your-virtual-env]/bin/activate
pip install numpy
pip install cython
python setup.py build_ext --inplace
```

- Run the commands given below.

Note: Cython is needed for processing the complex user and interface models. Complex interface models necessitate looping over all submitted updates, and looping over lists is very slow in pure Python.
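For intuition, here is a minimal, hypothetical sketch of the kind of typed Cython loop this buys us; it is not the repository's actual cython_computations.pyx, and the function and its arguments are made up for illustration. Static C types let the per-update loop compile to C instead of running through the interpreter.

```
# cython: language_level=3
# Hypothetical sketch only -- not the actual cython_computations.pyx.
# The cdef-typed index and accumulator let Cython compile this loop to C.
def total_decayed_gain(double[:] gains, double decay):
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(gains.shape[0]):
        # each submitted update contributes its (decayed) gain
        total += gains[i] * decay
    return total
```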
- Temporal Summarization 2013 qrels (present in data/ts-2013/qrels).
- Temporal Summarization 2013 submitted runs (download from TREC into data/ts-2013/submitted-runs).
- Lengths of all sentences submitted to TS 2013 (download from here into data/ts-2013/update-lengths).
- Temporal Summarization 2014 submitted runs (download from TREC into data/ts-2014/submitted-runs).
- Lengths of all sentences submitted to TS 2014 (download from here into data/ts-2014/update-lengths).
```
msu-2016                                       : Main codebase
├── Readme.md                                  : this Readme
├── modeled_stream_utility.py                  : main script
├── nugget.py                                  : nugget class for the Temporal Summarization tracks
├── update.py                                  : update class for sentences submitted to the Temporal Summarization tracks
├── get_query_durations.py                     : extracts start and end timestamps for query durations from a track's topics.xml file
├── probability_distributions.py               : base classes for probability distributions
├── population_model.py                        : user population model
├── user_model.py                              : user behavior model
├── user_interface_model.py                    : user interface models
├── utils.py                                   : utility functions
│   (Cython files for complex user interface models)
├── cython_computations.pyx                    : defines a custom heap class and computes MSU for ranked interfaces
├── setup.py                                   : builds the cython_computations library for importing into Python code
│   (Files for comparing the msu-2016 and sigir-2015 codebases; see codebase-comparison)
├── modeled_stream_utility_with_time_trails.py : evaluates runs using time-trails made by R (see sigir-2015/Readme.md)
└── gen-pythonic-time-trails.py                : computes MSU given user time-trails generated by sigir-2015/generate.time.trails.R
```
```
cd msu-2016
python modeled_stream_utility.py ts13 ../data/ts-2013/qrels/matches.tsv ../data/ts-2013/qrels/nuggets.tsv ../data/ts-2013/qrels/pooled_updates.tsv ../data/ts-2013/qrels/topics_masked.xml ../data/ts-2013/update-lengths/ 1000 120 60 10800 5400 0.5 ../data/ts-2013/submitted-runs/input.* > msu2016-code.ts2013.results.all
cat msu2016-code.ts2013.results.all | grep AVG | gawk '{print $1, $3}' | sed 's_^input.__g' > msu2016-code.ts2013.results.avg
```
The msu2016-code.ts2013.results.avg file will contain the average MSU scores for the TS 2013 runs. MSU is computed for each system by simulating 1000 "reasonable" users drawn from a population with reading sessions of 2 ± 1 minutes (120 ± 60 seconds), time spent away from the system of 3 ± 1.5 hours (10800 ± 5400 seconds), and a lateness decay parameter of 0.5.
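For intuition, a minimal sketch of how such a user population might be drawn with numpy; the variable names and the choice of normal distributions here are illustrative assumptions, not necessarily what population_model.py implements:

```
import numpy as np

# Illustrative only: assumes normally distributed per-user parameters,
# which may differ from the actual population_model.py implementation.
rng = np.random.default_rng()
n_users = 1000                               # users to simulate
session_mean, session_sd = 120.0, 60.0       # reading session (seconds)
away_mean, away_sd = 10800.0, 5400.0         # time away (seconds)
lateness_decay = 0.5

# Per-user mean session/away durations, truncated at zero.
sessions = np.clip(rng.normal(session_mean, session_sd, n_users), 0.0, None)
away = np.clip(rng.normal(away_mean, away_sd, n_users), 0.0, None)
```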
```
cd msu-2016
python modeled_stream_utility.py ts14 ../data/ts-2014/qrels/matches.tsv ../data/ts-2014/qrels/nuggets.tsv ../data/ts-2014/qrels/updates_sampled.tsv ../data/ts-2014/qrels/trec2014-ts-topics-test.xml ../data/ts-2014/update-lengths/ 1000 120 60 10800 5400 0.5 ../data/ts-2014/submitted-runs/* > msu2016-code.ts2014.results.all
```
modeled_stream_utility_ranked_order.py : computes MSU for an interface that presents ranked updates to users one at a time; users are assumed to follow the Rank-Biased Precision (RBP) browsing model, and updates older than one day are removed from further consideration (see the sketch below).
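As a rough illustration of the RBP assumption: under RBP a user examines the update at rank k (1-indexed) with probability p^(k-1), continuing down the ranking with persistence p after each examined update. The persistence value below is a made-up example, not a parameter taken from this codebase.

```
import numpy as np

# RBP browsing: probability that a user reaches rank k (1-indexed)
# is persistence**(k - 1). The persistence value is hypothetical.
def rbp_examine_probabilities(num_ranks, persistence=0.8):
    return persistence ** np.arange(num_ranks)

print(rbp_examine_probabilities(5))  # -> 1.0, 0.8, 0.64, 0.512, 0.4096
```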
Here we compare the results generated by this new all-Python code against those of the R+Python code developed for the MSU paper.
- First, we follow the procedure outlined in sigir-2015/Readme.md and generate user time-trails and results for all runs of the TS 2013 track. This produces the file ../data/ts-2013/simulation-data/0.mean.metrics as the output (note that multiple MSU-derived metrics are also reported in this file).
- We then evaluate the runs with the new code using those R-generated time-trails:

```
cd msu-2016
python modeled_stream_utility_with_time_trails.py ../data/ts-2013/qrels/matches.tsv ../data/ts-2013/qrels/nuggets.tsv ../data/ts-2013/qrels/pooled_updates.tsv ../data/ts-2013/topic_query_durations ../data/ts-2013/update-lengths/ 1000 ../data/ts-2013/simulation-data/0.user.params ../data/ts-2013/simulation-data/0.time-trails/ 0.5 ../data/ts-2013/submitted-runs/input.* | grep AVG | sed 's_^input.__g' | gawk -v OFS="\t" '{print $1, $3}' > new.code.results
```
- Next, we extract the sigir-2015 code's results:

```
gawk -v OFS="\t" '(NR>1){print $1, $2}' ../data/ts-2013/simulation-data/0.mean.metrics > old.code.results
```
- Comparing the old and new results shows no differences when using the R-generated time-trails. This indicates that the MSU computation in the old and new codebases produces the same results.

```
$ diff old.code.results new.code.results
$
```
- HOWEVER, the new all-Python code produces different absolute system scores. We attribute this to differences in the sampling processes of Python-numpy and R: though the underlying probability distributions are parameterized identically, the sampled random deviates do differ, causing a slight change in the respective absolute system scores. Note that the system ranking does not significantly change with the new code (Kendall's tau > 0.987 between the msu-2016 and sigir-2015 codebases). The RMSE between the absolute scores produced by the two codebases is 0.027.
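A minimal sketch of how such a comparison can be computed, assuming both results files contain whitespace-separated "system score" lines as produced by the commands above:

```
import numpy as np
from scipy.stats import kendalltau

# Assumes each file holds "system score" lines, as produced above.
def load_scores(path):
    with open(path) as f:
        return {sys_id: float(score)
                for sys_id, score in (line.split() for line in f)}

old = load_scores("old.code.results")
new = load_scores("new.code.results")
systems = sorted(set(old) & set(new))
old_s = np.array([old[s] for s in systems])
new_s = np.array([new[s] for s in systems])

tau, _ = kendalltau(old_s, new_s)                      # ranking agreement
rmse = float(np.sqrt(np.mean((old_s - new_s) ** 2)))   # absolute-score gap
print(f"Kendall's tau = {tau:.3f}, RMSE = {rmse:.3f}")
```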