Steps:
- Make sure you 'git pull' the latest changes (from October 15, 2018), including changes in ../../data_extraction.
- cd to ../../data_extraction and type make. This will create the multi-reference file used by the metrics (../../data_extraction/test.refs).
- Install 3rd-party software as instructed below (METEOR and mteval-v14c.pl).
- Run the following command, where [SUBMISSION] is the submission file you want to evaluate (same format as the one you submitted on Oct 8):
python dstc.py -c [SUBMISSION] --refs ../../data_extraction/test.refs
Important: the results printed by dstc.py might differ slightly from the official results if part of your test set failed to download. (A small sanity-check sketch follows below.)
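If you want a quick sanity check before running dstc.py, the sketch below simply compares line counts between your submission and test.refs. It assumes both files contain one test instance per line and uses a hypothetical file name submission.txt; both are my assumptions, not part of the official format.

```python
# Hedged sketch: compare line counts between a submission and the multi-reference
# file. Assumes one test instance per line and a hypothetical file name
# "submission.txt"; adjust to your actual [SUBMISSION] path.
def count_lines(path):
    with open(path) as f:
        return sum(1 for _ in f)

n_hyp = count_lines("submission.txt")
n_ref = count_lines("../../data_extraction/test.refs")
if n_hyp != n_ref:
    print("Warning: %d hypothesis lines vs %d reference lines; "
          "part of the test set may have failed to download." % (n_hyp, n_ref))
```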
(Based on this repo by Sean Xiang Gao)
- evaluation: calculate automated NLP metrics (BLEU, NIST, METEOR, entropy, etc.); see the sketch after the example below for how the entropy/diversity numbers are defined.
from metrics import nlp_metrics
nist, bleu, meteor, entropy, diversity, avg_len = nlp_metrics(
    path_refs=["demo/ref0.txt", "demo/ref1.txt"],
    path_hyp="demo/hyp.txt")
# nist = [1.8338, 2.0838, 2.1949, 2.1949]
# bleu = [0.4667, 0.441, 0.4017, 0.3224]
# meteor = 0.2832
# entropy = [2.5232, 2.4849, 2.1972, 1.7918]
# diversity = [0.8667, 1.000]
# avg_len = 5.0000
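For orientation: the four nist/bleu/entropy values appear to correspond to n-gram orders 1-4, and the two diversity values to distinct-1 and distinct-2. The sketch below is a minimal, hedged re-implementation of the distinct-n and n-gram entropy numbers for a tokenized hypothesis file (one response per line, as in demo/hyp.txt); it is a cross-check under those assumptions, not the library's own code.

```python
# Hedged sketch: distinct-n and n-gram entropy for a tokenized hypothesis file.
# distinct-n = unique n-grams / total n-grams; entropy is over the n-gram
# frequency distribution. Not the library's own implementation.
import math
from collections import Counter

def ngram_stats(path_hyp, max_n=4):
    counters = [Counter() for _ in range(max_n)]
    with open(path_hyp) as f:
        for line in f:
            tokens = line.split()
            for n in range(1, max_n + 1):
                for i in range(len(tokens) - n + 1):
                    counters[n - 1][tuple(tokens[i:i + n])] += 1
    entropy, distinct = [], []
    for c in counters:
        total = float(sum(c.values()))
        if total == 0:
            entropy.append(0.0)
            distinct.append(0.0)
            continue
        entropy.append(-sum(v / total * math.log(v / total) for v in c.values()))
        distinct.append(len(c) / total)
    return entropy, distinct[:2]  # diversity is typically reported for n = 1, 2

entropy, diversity = ngram_stats("demo/hyp.txt")
```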
- tokenization: clean strings and deal with punctuation, contractions, URLs, mentions, tags, etc.; a rough approximation sketch follows the example below.
from tokenizers import clean_str
s = " I don't know:). how about this?https://github.com"
clean_str(s)
# i do n't know :) . how about this ? __url__
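To see the kind of normalization clean_str applies, here is a rough, hedged approximation (lowercasing, masking URLs as __url__, splitting off punctuation and n't contractions). The real tokenizers.clean_str handles more cases, e.g. it keeps :) as a single emoticon token and also deals with mentions and tags.

```python
# Rough approximation of the normalization shown above; the real
# tokenizers.clean_str covers more cases (emoticons, mentions, tags).
import re

def rough_clean_str(s):
    s = s.strip().lower()
    s = re.sub(r"https?://\S+", " __url__ ", s)   # mask URLs
    s = s.replace("n't", " n't")                  # split contractions like "don't"
    s = re.sub(r"([.,!?():])", r" \1 ", s)        # space out punctuation
    return " ".join(s.split())

print(rough_clean_str(" I don't know:). how about this?https://github.com"))
# roughly: i do n't know : ) . how about this ? __url__
```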
- Works fine for both Python 2.7 and 3.6
- Please download the following 3rd-party packages and save them in a new folder 3rdparty (a quick file-check sketch follows this list):
  - mteval-v14c.pl to compute NIST. You may need to install the following Perl modules (e.g. by cpan install): XML::Twig, Sort::Naturally and String::Util.
  - meteor-1.5 to compute METEOR. It requires Java.
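Before running the metrics, you can confirm the folder layout with the small check below. The exact METEOR jar path is my assumption about how the meteor-1.5 archive unpacks and may need adjusting.

```python
# Hedged sketch: check that the 3rd-party tools sit under 3rdparty/.
# The METEOR jar path is an assumption and may differ on your machine.
import os

expected = [
    "3rdparty/mteval-v14c.pl",             # NIST scorer (Perl)
    "3rdparty/meteor-1.5/meteor-1.5.jar",  # METEOR scorer (Java); path assumed
]
for path in expected:
    print("%-40s %s" % (path, "found" if os.path.isfile(path) else "MISSING"))
```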