Commit 8ddd6f1 (parent c3bc8b4): 118 changed files with 144,017 additions and 1 deletion.
# Dataset

Please download our dataset from: https://drive.google.com/drive/folders/1YlN6sszyo9vmbjWSQiMN1vr-hHpqSu4O?usp=sharing

The full dataset includes the Gitter-based train/test sets and the open-sourced issue-solution pairs.
import xlrd
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np


def draw_ablation():
    # NOTE: xlrd >= 2.0 only reads legacy .xls files; opening this .xlsx
    # workbook requires xlrd < 2.0.
    workbook = xlrd.open_workbook('../data/result_data_new.xlsx')
    sheet = workbook.sheet_by_name('Ablation_study')
    # Columns 2-5 hold the ISPY-LocalAttn, ISPY-Heu, ISPY-CNN and full ISPY scores;
    # row 0 (the header) is skipped.
    local_attn_data = sheet.col_values(2, 1, sheet.nrows)
    heu_data = sheet.col_values(3, 1, sheet.nrows)
    cnn_data = sheet.col_values(4, 1, sheet.nrows)
    richa_data = sheet.col_values(5, 1, sheet.nrows)

    # Rows repeat in blocks of six: issue P, issue R, issue F1,
    # solution P, solution R, solution F1.
    pre_issue = [[attn, heu, cnn, richa] for i, (attn, heu, cnn, richa)
                 in enumerate(zip(local_attn_data, heu_data, cnn_data, richa_data)) if (i + 1) % 6 == 1]
    rec_issue = [[attn, heu, cnn, richa] for i, (attn, heu, cnn, richa)
                 in enumerate(zip(local_attn_data, heu_data, cnn_data, richa_data)) if (i + 1) % 6 == 2]
    f1_issue = [[attn, heu, cnn, richa] for i, (attn, heu, cnn, richa)
                in enumerate(zip(local_attn_data, heu_data, cnn_data, richa_data)) if (i + 1) % 6 == 3]
    pre_solution = [[attn, heu, cnn, richa] for i, (attn, heu, cnn, richa)
                    in enumerate(zip(local_attn_data, heu_data, cnn_data, richa_data)) if (i + 1) % 6 == 4]
    rec_solution = [[attn, heu, cnn, richa] for i, (attn, heu, cnn, richa)
                    in enumerate(zip(local_attn_data, heu_data, cnn_data, richa_data)) if (i + 1) % 6 == 5]
    f1_solution = [[attn, heu, cnn, richa] for i, (attn, heu, cnn, richa)
                   in enumerate(zip(local_attn_data, heu_data, cnn_data, richa_data)) if (i + 1) % 6 == 0]
    df_pre_issue = pd.DataFrame({'richa_localattn': [data[0] for data in pre_issue],
                                 'richa_heu': [data[1] for data in pre_issue],
                                 'richa_cnn': [data[2] for data in pre_issue],
                                 'richa': [data[3] for data in pre_issue]})
    df_rec_issue = pd.DataFrame({'richa_localattn': [data[0] for data in rec_issue],
                                 'richa_heu': [data[1] for data in rec_issue],
                                 'richa_cnn': [data[2] for data in rec_issue],
                                 'richa': [data[3] for data in rec_issue]})
    df_f1_issue = pd.DataFrame({'richa_localattn': [data[0] for data in f1_issue],
                                'richa_heu': [data[1] for data in f1_issue],
                                'richa_cnn': [data[2] for data in f1_issue],
                                'richa': [data[3] for data in f1_issue]})
    df_pre_solution = pd.DataFrame({'richa_localattn': [data[0] for data in pre_solution],
                                    'richa_heu': [data[1] for data in pre_solution],
                                    'richa_cnn': [data[2] for data in pre_solution],
                                    'richa': [data[3] for data in pre_solution]})
    df_rec_solution = pd.DataFrame({'richa_localattn': [data[0] for data in rec_solution],
                                    'richa_heu': [data[1] for data in rec_solution],
                                    'richa_cnn': [data[2] for data in rec_solution],
                                    'richa': [data[3] for data in rec_solution]})
    df_f1_solution = pd.DataFrame({'richa_localattn': [data[0] for data in f1_solution],
                                   'richa_heu': [data[1] for data in f1_solution],
                                   'richa_cnn': [data[2] for data in f1_solution],
                                   'richa': [data[3] for data in f1_solution]})
    x_data = ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8']
    # plt.plot(x_data, df_pre_issue.richa)
    # plt.plot(x_data, df_pre_issue.richa_localattn)
    fig = plt.figure()
    # 2x3 grid: the top row shows issue P/R/F1, the bottom row solution P/R/F1.
    plt.subplot(231)
    plt.plot(x_data, list(df_pre_issue.richa), color='limegreen', linestyle='-', marker='s', markersize=4,
             mfcalt='b', label='ISPY')
    plt.xticks([])
    plt.plot(x_data, list(df_pre_issue.richa_localattn), color='darksalmon', linestyle='-', marker='x', markersize=4,
             mfcalt='b', label='ISPY-LocalAttn')
    plt.plot(x_data, list(df_pre_issue.richa_heu), color='orangered', linestyle='-', marker='^', markersize=4,
             mfcalt='b', label='ISPY-Heu')
    plt.plot(x_data, list(df_pre_issue.richa_cnn), color='deepskyblue', linestyle='-', marker='.', mfc='w',
             markersize=4, mfcalt='b', label='ISPY-CNN')
    # plt.grid(axis='y', linestyle='-.')
    # plt.grid(axis='x', linestyle='-.')

    plt.ylabel('Issue-P', fontdict={'family': 'Times New Roman', 'size': 16})
    plt.ylim([0, 1])
    plt.yticks(fontproperties='Times New Roman', size=13)
    plt.xticks(fontproperties='Times New Roman', size=13)
    # print(stats.ttest_ind(df_pre_issue.richa_heu, df_pre_issue.richa_cnn))

    plt.subplot(232)
    plt.plot(x_data, list(df_rec_issue.richa), color='limegreen', linestyle='-', marker='s', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_rec_issue.richa_localattn), color='darksalmon', linestyle='-', marker='x', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_rec_issue.richa_heu), color='orangered', linestyle='-', marker='^', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_rec_issue.richa_cnn), color='deepskyblue', linestyle='-', marker='.', mfc='w',
             markersize=4, mfcalt='b')
    plt.xticks([])
    plt.yticks([])

    # plt.grid(axis='y', linestyle='-.')
    # plt.grid(axis='x', linestyle='-.')

    plt.ylabel('Issue-R', fontdict={'family': 'Times New Roman', 'size': 16})
    plt.ylim([0, 1])
    plt.yticks(fontproperties='Times New Roman', size=13)
    plt.xticks(fontproperties='Times New Roman', size=13)
    # print(stats.ttest_ind(df_rec_issue.richa, df_rec_issue.richa_cnn))
    # print(stats.ttest_ind(df_rec_issue.richa, df_rec_issue.richa_localattn))
    # print(stats.ttest_ind(df_rec_issue.richa_heu, df_rec_issue.richa_cnn))

    plt.subplot(233)
    plt.plot(x_data, list(df_f1_issue.richa), color='limegreen', linestyle='-', marker='s', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_f1_issue.richa_localattn), color='darksalmon', linestyle='-', marker='x', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_f1_issue.richa_heu), color='orangered', linestyle='-', marker='^', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_f1_issue.richa_cnn), color='deepskyblue', linestyle='-', marker='.', mfc='w',
             markersize=4, mfcalt='b')
    plt.xticks([])
    plt.yticks([])

    # plt.grid(axis='y', linestyle='-.')
    # plt.grid(axis='x', linestyle='-.')

    plt.ylabel('Issue-F1', fontdict={'family': 'Times New Roman', 'size': 16})
    plt.ylim([0, 1])
    plt.yticks(fontproperties='Times New Roman', size=13)
    plt.xticks(fontproperties='Times New Roman', size=13)
    # Independent two-sample t-tests on issue F1: full ISPY vs. two ablated variants.
    print(stats.ttest_ind(df_f1_issue.richa, df_f1_issue.richa_cnn))
    print(stats.ttest_ind(df_f1_issue.richa, df_f1_issue.richa_heu))

    plt.subplot(234)
    plt.plot(x_data, list(df_pre_solution.richa), color='limegreen', linestyle='-', marker='s', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_pre_solution.richa_localattn), color='darksalmon', linestyle='-', marker='x', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_pre_solution.richa_heu), color='orangered', linestyle='-', marker='^', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_pre_solution.richa_cnn), color='deepskyblue', linestyle='-', marker='.', mfc='w',
             markersize=4, mfcalt='b')
    # plt.grid(axis='y', linestyle='-.')
    # plt.grid(axis='x', linestyle='-.')

    plt.ylabel('Solution-P', fontdict={'family': 'Times New Roman', 'size': 16})
    plt.ylim([0, 1])
    plt.yticks(fontproperties='Times New Roman', size=13)
    plt.xticks(fontproperties='Times New Roman', size=13)
    # print(stats.ttest_ind(df_pre_solution.richa_heu, df_pre_solution.richa_cnn))

    plt.subplot(235)
    plt.plot(x_data, list(df_rec_solution.richa), color='limegreen', linestyle='-', marker='s', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_rec_solution.richa_localattn), color='darksalmon', linestyle='-', marker='x', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_rec_solution.richa_heu), color='orangered', linestyle='-', marker='^', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_rec_solution.richa_cnn), color='deepskyblue', linestyle='-', marker='.', mfc='w',
             markersize=4, mfcalt='b')
    # plt.grid(axis='y', linestyle='-.')
    # plt.grid(axis='x', linestyle='-.')
    plt.yticks([])

    plt.ylabel('Solution-R', fontdict={'family': 'Times New Roman', 'size': 16})
    plt.ylim([0, 1])
    plt.yticks(fontproperties='Times New Roman', size=13)
    plt.xticks(fontproperties='Times New Roman', size=13)
    # print(stats.ttest_ind(df_rec_solution.richa_heu, df_rec_solution.richa_cnn))

    plt.subplot(236)
    plt.plot(x_data, list(df_f1_solution.richa), color='limegreen', linestyle='-', marker='s', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_f1_solution.richa_localattn), color='darksalmon', linestyle='-', marker='x', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_f1_solution.richa_heu), color='orangered', linestyle='-', marker='^', markersize=4,
             mfcalt='b')
    plt.plot(x_data, list(df_f1_solution.richa_cnn), color='deepskyblue', linestyle='-', marker='.', mfc='w',
             markersize=4, mfcalt='b')
    # plt.grid(axis='y', linestyle='-.')
    # plt.grid(axis='x', linestyle='-.')
    plt.yticks([])

    plt.ylabel('Solution-F1', fontdict={'family': 'Times New Roman', 'size': 16})
    plt.ylim([0, 1])
    plt.yticks(fontproperties='Times New Roman', size=13)
    plt.xticks(fontproperties='Times New Roman', size=13)
    print(stats.ttest_ind(df_f1_solution.richa, df_f1_solution.richa_cnn))
    print(stats.ttest_ind(df_f1_solution.richa, df_f1_solution.richa_heu))
    # Only the subplot-231 lines carry labels, so each variant appears once in the figure legend.
    fig.legend(loc='upper center', ncol=4, prop={'size': 13, 'family': 'Times New Roman'})
    plt.show()
    # print(df_pre)


def t_return():
    # Each list holds 24 scores: (precision, recall, F1) for each of 8 projects, in that order.
    # `richa` is ISPY (the 'ISPY' series in draw_ablation); the others are baselines.
    richa = [0.76, 0.77, 0.76, 0.75, 0.68, 0.71, 0.84, 0.74, 0.79, 0.77, 0.68, 0.72, 0.82, 0.73, 0.77, 0.80, 0.69, 0.74, 0.79, 0.70, 0.74, 0.86, 0.78, 0.82]
    nb = [0.36, 0.40, 0.38, 0.41, 0.30, 0.35, 0.47, 0.36, 0.41, 0.70, 0.56, 0.62, 0.08, 0.25, 0.13, 0.22, 0.42, 0.29, 0.30, 0.50, 0.37, 0.15, 0.40, 0.22]
    rf = [0.56, 0.25, 0.34, 0.69, 0.30, 0.42, 0.75, 0.23, 0.35, 0.84, 0.44, 0.58, 1.00, 0.17, 0.29, 0.50, 0.25, 0.33, 0.33, 0.13, 0.18, 0.23, 0.30, 0.26]
    gdbt = [0.27, 0.75, 0.40, 0.40, 0.70, 0.51, 0.50, 0.79, 0.61, 0.73, 0.44, 0.55, 0.21, 0.76, 0.33, 0.19, 0.67, 0.29, 0.30, 0.88, 0.44, 0.18, 0.90, 0.30]
    casper = [0.39, 0.35, 0.37, 0.08, 0.03, 0.05, 0.59, 0.26, 0.36, 0.46, 0.40, 0.43, 0.19, 0.42, 0.26, 0.14, 0.17, 0.15, 0.05, 0.06, 0.06, 0.15, 0.40, 0.22]
    cnc = [0.20, 0.55, 0.29, 0.23, 0.50, 0.32, 0.23, 0.36, 0.28, 0.12, 0.32, 0.17, 0.24, 0.42, 0.30, 0.12, 0.42, 0.19, 0.10, 0.50, 0.17, 0.05, 0.40, 0.10]
    deca = [0.33, 0.50, 0.40, 0.28, 0.37, 0.31, 0.33, 0.36, 0.34, 0.64, 0.28, 0.39, 0.42, 0.42, 0.42, 0.44, 0.67, 0.53, 0.32, 0.50, 0.39, 0.04, 0.10, 0.06]

    baselines = {'nb': nb, 'rf': rf, 'gdbt': gdbt, 'casper': casper, 'cnc': cnc, 'deca': deca}
    for baseline in baselines.keys():
        data_temp = baselines[baseline]
        richa_pre = [ric_value for i, ric_value in enumerate(richa) if (i + 1) % 3 == 1]
        richa_rec = [ric_value for i, ric_value in enumerate(richa) if (i + 1) % 3 == 2]
        richa_f1 = [ric_value for i, ric_value in enumerate(richa) if (i + 1) % 3 == 0]

        base_pre = [base_value for i, base_value in enumerate(data_temp) if (i + 1) % 3 == 1]
        base_rec = [base_value for i, base_value in enumerate(data_temp) if (i + 1) % 3 == 2]
        base_f1 = [base_value for i, base_value in enumerate(data_temp) if (i + 1) % 3 == 0]
        # Independent two-sample t-test on F1: ISPY vs. the current baseline.
        data_t = stats.ttest_ind(richa_f1, base_f1)
        print(data_t)


if __name__ == '__main__':
    # t_return()
    draw_ablation()
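Both functions above select metric rows with `(i + 1) % k == r` filters over `enumerate`. The same selection can be written with slice striding (`values[start::step]`). A minimal sketch follows, using made-up numbers rather than the paper's data; the helper `split_metrics` is hypothetical and not part of the repository:

```python
def split_metrics(values, step):
    """Split a flat list into `step` interleaved sublists.

    Sublist k holds values[k], values[k + step], ..., which is exactly what
    the filter `(i + 1) % step == (k + 1) % step` keeps.
    """
    return [values[k::step] for k in range(step)]


# Hypothetical scores for two projects, stored as (P, R, F1) per project.
scores = [0.9, 0.8, 0.85, 0.7, 0.6, 0.65]
pre, rec, f1 = split_metrics(scores, 3)
print(pre, rec, f1)  # [0.9, 0.7] [0.8, 0.6] [0.85, 0.65]
```

With `step=6` the same helper yields the six issue/solution metric groups that `draw_ablation` builds one comprehension at a time.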
Copyright (c) 2018, Jonathan K Kummerfeld <[email protected]>

Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.

THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
# System

This folder contains code for reproducing our disentanglement experiments.

## Requirements

The only dependency is the [DyNet library](http://dynet.readthedocs.io), which can usually be installed with:

```
pip3 install dynet
```

## Running

To see all options, run:

```
python3 disentangle.py --help
```

### Train

To train, provide the `--train` argument followed by a series of filenames.

The example command below trains a model with the same parameters as used in the ACL paper.
The model is a feedforward neural network with 2 layers, 512-dimensional hidden vectors, and softsign non-linearities.

```
python3 disentangle.py \
  example-train \
  --train ../data/train/*annotation.txt \
  --dev ../data/dev/*annotation.txt \
  --hidden 512 \
  --layers 2 \
  --nonlin softsign \
  --word-vectors ../data/glove-ubuntu.txt \
  --epochs 20 \
  --dynet-autobatch \
  --drop 0 \
  --learning-rate 0.018804 \
  --learning-decay-rate 0.103 \
  --seed 10 \
  --clip 3.740 \
  --weight-decay 1e-07 \
  --opt sgd \
  > example-train.out 2>example-train.err
```

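The paragraph above describes the network as a 2-layer feedforward model with 512-dimensional hidden vectors and softsign non-linearities, where softsign(x) = x / (1 + |x|). The NumPy sketch below shows such a forward pass under stated assumptions: the 64-dimensional input, the random weights, and the single linear output score are illustrative only, and this is not the DyNet model in `disentangle.py`.

```python
import numpy as np


def softsign(x):
    # softsign(x) = x / (1 + |x|): a smooth squashing function into (-1, 1)
    return x / (1.0 + np.abs(x))


def feedforward(x, layers, w_out, b_out):
    """Apply each (W, b) hidden layer with softsign, then a linear output layer."""
    h = x
    for w, b in layers:
        h = softsign(w @ h + b)
    return w_out @ h + b_out


rng = np.random.default_rng(10)
input_dim, hidden, n_layers = 64, 512, 2  # input_dim is an arbitrary assumption
dims = [input_dim] + [hidden] * n_layers
layers = [(rng.normal(scale=0.1, size=(d_out, d_in)), np.zeros(d_out))
          for d_in, d_out in zip(dims[:-1], dims[1:])]
w_out, b_out = rng.normal(scale=0.1, size=(1, hidden)), np.zeros(1)

score = feedforward(rng.normal(size=input_dim), layers, w_out, b_out)
print(score.shape)  # (1,)
```

Unlike tanh, softsign approaches its bounds polynomially rather than exponentially, so it saturates more gently.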
### Infer

This command runs the model trained above on the files passed to `--test` (in this example, Gitter chat logs):

```
python3 disentangle.py \
  angual_angular.1 \
  --model example-train.dy.model \
  --test /home/yuminz/gitter_chatmessage/angual_angular/*ascii* \
  --test-start 0 \
  --test-end 5000 \
  --hidden 512 \
  --layers 2 \
  --nonlin softsign \
  --word-vectors ../data/glove-ubuntu.txt \
  > angual_angular.1.out 2>angual_angular.1.err
```

Note: the arguments defining the network (`--hidden`, `--layers`, `--nonlin`) must match those given in training.

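Since the network-defining flags must agree between training and inference, one option is to define them once and reuse them whenever a command is built. A minimal sketch in Python; `NETWORK_ARGS` and `build_command` are hypothetical names, only flags that appear in the commands above are used, and other arguments such as `--train` and `--test` are omitted for brevity:

```python
import shlex

# Network-defining flags; these must be identical for training and inference,
# so they are defined in exactly one place.
NETWORK_ARGS = ["--hidden", "512", "--layers", "2", "--nonlin", "softsign"]


def build_command(prefix, extra_args):
    """Hypothetical helper: a disentangle.py argument list with the shared network flags."""
    return ["python3", "disentangle.py", prefix, *extra_args, *NETWORK_ARGS]


train_cmd = build_command("example-train", ["--epochs", "20", "--opt", "sgd"])
infer_cmd = build_command("angual_angular.1", ["--model", "example-train.dy.model",
                                               "--test-start", "0", "--test-end", "5000"])
print(shlex.join(infer_cmd))
```

The argument lists can be passed to `subprocess.run`; note that shell globs such as `*annotation.txt` are not expanded that way and would need `glob.glob` first.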
### Evaluate

This command runs the evaluation script on output produced for the development set (here `example-run.1.out`), comparing it against the gold annotations:

```
python3 ../tools/evaluation/graph-eval.py --gold ../data/dev/*annotation* --auto example-run.1.out
```

The output should look something like:

```
g/a/m: 2607 2500 1855
p/r/f: 74.2 71.2 72.6
```

The first row counts the gold links, auto links, and matching links.
The second row gives the precision, recall, and F-score.

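The second row can be derived from the first: precision is matched/auto, recall is matched/gold, and the F-score is their harmonic mean (equivalently 2m / (g + a)). A short Python check against the sample output above; `prf` is an illustrative helper, not part of `graph-eval.py`:

```python
def prf(gold, auto, matched):
    """Precision, recall, and F-score (as percentages) from link counts."""
    precision = 100.0 * matched / auto
    recall = 100.0 * matched / gold
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


p, r, f = prf(gold=2607, auto=2500, matched=1855)
print(f"p/r/f: {p:.1f} {r:.1f} {f:.1f}")  # p/r/f: 74.2 71.2 72.6
```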
Note: the values in the paper are an average over 10 runs, so they will differ slightly from what you get here.

### Running on a file

If you want to apply a model to a file, see this script for an example of how to do it: `example-running.sh`.
The script is set up so that, once the necessary placeholders in it are set, it can be called like so:

    ./disentangle-file.sh < sample.ascii.txt > sample.links.txt

## Ensemble

For the best results, we used a simple ensemble of multiple models.
We trained 10 models as described above, but with different random seeds (1 through 10).
We combined their output using the `majority_vote.py` script in this directory.

The same script is used for all three ensemble methods, with slightly different inputs and arguments:

Union
```
./majority_vote.py example-run*graphs --method union > example-run.combined.union
```

Vote
```
./majority_vote.py example-run*graphs --method vote > example-run.combined.vote
```

Intersect
```
./majority_vote.py example-run*clusters --method intersect > example-run.combined.intersect
```

All of these assume the output files have been converted into our graph format.
Assuming you ran `disentangle.py` as above and saved the output of each run as `example-run.1.out`, `example-run.2.out`, `example-run.3.out`, etc., this command uses one of our tools to convert them to the graph format:
```
for name in example-run*out ; do ../tools/format-conversion/output-from-py-to-graph.py < $name > $name.graphs ; done
```

The intersect method also assumes they have been converted into clusters, like this:
```
for name in example-run*out ; do ../tools/format-conversion/graph-to-cluster.py < $name.graphs > $name.clusters ; done
```

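The ensemble and conversion steps can be sketched in plain Python under some assumptions: each run's output is a set of links between pairs of message IDs, "vote" keeps links proposed by a strict majority of runs, and clusters are the connected components of the link graph. `combine_links` and `links_to_clusters` are hypothetical helpers, not `majority_vote.py` or `graph-to-cluster.py`, and the intersect method is not covered.

```python
from collections import Counter


def combine_links(runs, method):
    """Combine per-run link sets; 'vote' keeps links from a strict majority of runs."""
    if method == "union":
        return set().union(*runs)
    if method == "vote":
        counts = Counter(link for run in runs for link in set(run))
        return {link for link, n in counts.items() if n > len(runs) / 2}
    raise ValueError(f"unsupported method: {method}")


def links_to_clusters(links):
    """Group message IDs into connected components of the link graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # first sighting: x is its own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:  # a self-link (x, x) just registers x
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(c) for c in clusters.values())


# Three hypothetical runs, each a set of (message, linked message) pairs.
runs = [{(2, 1), (3, 1)}, {(2, 1), (3, 2)}, {(2, 1), (3, 1)}]
voted = combine_links(runs, "vote")
print(sorted(voted))                         # [(2, 1), (3, 1)]
print(sorted(combine_links(runs, "union")))  # [(2, 1), (3, 1), (3, 2)]
print(links_to_clusters(voted | {(4, 4)}))   # [[1, 2, 3], [4]]
```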
Note: An earlier version of the steps above didn't account for a change in the output of the main system. Apologies for the broken output this would have caused.

## C++ Model

As well as the main Python code, we also wrote a model in C++ that was used for DSTC 7 and the results in the 2018 arXiv version of the paper (the Python version was used for DSTC 8 and the 2019 ACL paper).
The Python model has additional input features and a different text representation method.
The C++ model supports a range of additional variations in both inference and modeling, which did not appear to improve performance.
For details on how to build and run the C++ code, see [this page](./old-cpp-version/).

[Go back](./../) to the main webpage.