Backstitch kaldi 52 #1605
Conversation
Force-pushed from 7672f53 to d1d6b52 (compare)
Force-pushed from d1d6b52 to 4ecdf04 (compare)
Force-pushed from a9565a8 to 7a20331 (compare)
@freewym, can you please fix conflicts?
OK, will do it tomorrow.
Force-pushed from c231970 to 7a20331 (compare)
Force-pushed from 7a20331 to 546ec30 (compare)
@danpovey Do you prefer to add all the backstitch experiment scripts to this PR, or create a separate PR after merging this one?
Probably a separate PR after merging this one.
Various minor cosmetic and code-refactoring requests.
# ./local/chain/compare_wer_general.sh --looped exp/chain_cleaned/tdnn_lstm1e_sp_bi exp/chain_cleaned/tdnn_lstm1t_sp_bi
# System                  tdnn_lstm1v_sp_bi  tdnn_lstm1t_sp_bi
# WER on dev(orig)        8.6                9.0
We normally have the current experiment last; this might confuse users a little.
@@ -307,6 +316,11 @@ def train_one_iteration(dir, iter, srand, egs_dir,
xent_regularize=xent_regularize,
leaky_hmm_coefficient=leaky_hmm_coefficient,
momentum=momentum,
# linearly increase backstitch_training_scale during the
# first few iterations(hard-coded as 15 for now)
remove "for now" as we don't plan to change it. and space before (.
@@ -25,7 +25,9 @@
def train_new_models(dir, iter, srand, num_jobs,
                     num_archives_processed, num_archives,
                     raw_model_string, egs_dir,
                     momentum, max_param_change,
                     momentum,
                     backstitch_training_scale, backstitch_training_interval,
I think it would make other parts of the code more robust to the change if you were to put the new parameters at the end and give them default values, e.g. 0.0 and 1. No need to put them in the exact same order when you call this function, though.
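A sketch of what that suggestion might look like for the signature in the diff above; the argument list is abbreviated, and the defaults 0.0 and 1 are the ones suggested in the comment:

# Hypothetical sketch only: existing arguments keep their current order, and the
# new backstitch arguments are appended at the end with benign defaults, so
# existing callers that do not pass them keep the old (non-backstitch) behaviour.
def train_new_models(dir, iter, srand, num_jobs,
                     num_archives_processed, num_archives,
                     raw_model_string, egs_dir,
                     momentum, max_param_change,
                     backstitch_training_scale=0.0,    # alpha; 0.0 disables backstitch
                     backstitch_training_interval=1):  # n; apply backstitch every n minibatches
    pass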
src/nnet3/nnet-chain-training.cc
Outdated
void NnetChainTrainer::ProcessOutputs(const NnetChainExample &eg,
void NnetChainTrainer::ProcessOutputs(bool is_backstitch_step,
                                      const NnetChainExample &eg,
                                      NnetComputer *computer) {
  // normally the eg will have just one output named 'output', but
  // we don't assume this.
You should probably clarify in this comment that in the diagnostics, the output-name with the "_backstitch" suffix is the one computed after the first, backward step of backstitch.
src/nnet3/nnet-chain-training.cc
Outdated
// backstitch training is incompatible with momentum > 0
KALDI_ASSERT(nnet_config.momentum == 0.0);
FreezeNaturalGradient(true, delta_nnet_);
bool is_backstitch_step = true;
The name 'is_backstitch_step' is not very clear as backstitch has two steps.
You should probably say 'is_backstitch_step1'.
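For readers following along, a rough numpy sketch of the two steps being referred to, based on the backstitch paper linked at the end of this thread; the names and the reduction of the model to a single parameter vector are purely illustrative:

import numpy as np

def backstitch_sgd_step(params, grad_fn, learning_rate, alpha):
    # One backstitch update on a single minibatch.  grad_fn(params) returns the
    # gradient of the objective being maximized on that minibatch; alpha
    # corresponds to --backstitch-training-scale.
    # Step 1 (the "backward" step): a small step against the usual direction.
    params = params - alpha * learning_rate * grad_fn(params)
    # Step 2: recompute the gradient at the new point and take a larger step.
    # Per the review comment above, diagnostics computed after the first step are
    # the ones reported under the output-name with the "_backstitch" suffix.
    params = params + (1.0 + alpha) * learning_rate * grad_fn(params)
    return params

# Toy usage: maximize f(x) = -||x||^2 (gradient -2x); x converges towards 0.
x = np.array([1.0, -2.0])
for _ in range(100):
    x = backstitch_sgd_step(x, lambda p: -2.0 * p, learning_rate=0.1, alpha=0.3)

With alpha = 0 this reduces to a plain SGD step, which matches the backstitch_training_scale > 0.0 enable condition seen later in the diff.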
src/nnet3/nnet-training.h
Outdated
@@ -70,6 +74,13 @@ struct NnetTrainerOptions {
                   "so that the 'effective' learning rate is the same as "
                   "before (because momentum would normally increase the "
                   "effective learning rate by 1/(1-momentum))");
    opts->Register("backstitch-training-scale", &backstitch_training_scale,
                   "backstitch traning factor. "
Typo: "traning"; and mention that this is alpha in our publications.
src/nnet3/nnet-training.h
Outdated
    opts->Register("backstitch-training-interval",
                   &backstitch_training_interval,
                   "do backstitch training with the specified interval of "
                   "minibatches.");
mention that this is the interval n in publications.
src/nnet3/nnet-utils.h
Outdated
/**
  This function does the operation '*nnet += scale * delta_nnet', while
  respecting any max-parameter-change (max-param-change) specified in the
  updatable components, plus the global max-param-change specified as
plus -> and also
src/nnet3/nnet-utils.h
Outdated
  @param [in] scale This value, which will normally be 1.0, is a scaling
          factor used when adding to 'nnet', applied after any max-changes.
  @param [in,out] nnet The nnet which we add to.
  @param [in,out] num_max_change_per_component_applied Stats for per-component
Clarify that we add to the elements of this (if that is what this function does).
Also I would write this as just [out].
src/nnet3/nnet-utils.h
Outdated
  @param [in,out] nnet The nnet which we add to.
  @param [in,out] num_max_change_per_component_applied Stats for per-component
          max-change.
  @param [in,out] num_max_change_global_applied Stats for global max-change.
Clarify that we add to this. And I'd write this as just [out]. I might be wrong in doing that, but other documentation uses this convention.
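A rough sketch of the behaviour this doc-comment and the two review comments describe, under the assumption (not verified against the actual nnet-utils implementation) that each component's delta is scaled down to respect its own max-change, a further global factor enforces the global max-change, and the two counters are incremented whenever a limit kicks in; all names here are illustrative:

import numpy as np

def add_delta_with_max_change(nnet, delta, scale, max_change_per_component,
                              global_max_change,
                              num_max_change_per_component_applied,
                              num_max_change_global_applied):
    # Sketch of "*nnet += scale * delta" with per-component and global
    # max-change limits.  nnet/delta are dicts of numpy arrays keyed by
    # component name; num_max_change_per_component_applied is a
    # collections.Counter and num_max_change_global_applied a one-element
    # list, i.e. mutable counters that this function adds to (hence the
    # [out] vs [in,out] discussion above).
    factors = {}
    for name, d in delta.items():
        norm = np.linalg.norm(d)
        limit = max_change_per_component.get(name, np.inf)
        factors[name] = min(1.0, limit / norm) if norm > 0 else 1.0
        if factors[name] < 1.0:
            num_max_change_per_component_applied[name] += 1
    # The global limit is applied on top of the per-component factors.
    total_norm = np.sqrt(sum(np.linalg.norm(factors[n] * d) ** 2
                             for n, d in delta.items()))
    global_factor = (min(1.0, global_max_change / total_norm)
                     if total_norm > 0 else 1.0)
    if global_factor < 1.0:
        num_max_change_global_applied[0] += 1
    # The 'scale' argument is applied after the max-changes, as documented.
    for name, d in delta.items():
        nnet[name] += scale * global_factor * factors[name] * d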
@freewym, there is something else, and @GaofengCheng, pay attention to this, as it may affect your experiments with dropout and backstitch.
srand_seed_ is initialized with a random integer in the constructor, which is almost always different across calls of the training binary. So it might suffice to solve the issue for 'rare-size' egs by changing the condition to what you proposed?
No that's not sufficient, because the output of RandInt() is determined by rand(), which is deterministic unless you set srand() to a different value at the start of the program.
Oh yes.
@danpovey I can try backstitch+dropout after Yiming fixes it as you recommended, and see.
Guys, does this have a chance to be merged soon (today), or does it still need some more time?
y.
Probably more like 2-3 days; it depends how busy Yiming is. I asked for some small code refactorings and it will require testing just to make sure the rewritten code doesn't crash.
Force-pushed from d3bbda2 to 320bfbb (compare)
-  NnetComputer computer(nnet_config.compute_config, *computation,
+  if (nnet_config.backstitch_training_scale > 0.0 && num_minibatches_processed_
+          % nnet_config.backstitch_training_interval ==
+      srand_seed_ % nnet_config.backstitch_training_interval) {
I tested the new refactored code using a chain model (tdnn-lstm system on AMI, with backstitch enabled or disabled; if enabled, testing interval=1 or 5), and xent models (using the CIFAR recipe, with backstitch enabled or disabled; if enabled, only testing interval=1), and compared their results with my old runs. Among those results, the only one that cannot match the old result is the chain+tdnn-lstm AMI system with backstitch interval=5, where the WER is ~0.8 worse than the old run and the valid log-prob is also a bit lower. So now I am changing this expression to
nnet_config.backstitch_training_interval == 0
to see if I can reproduce the old result.
It could be this is about which minibatches are chosen for the natural gradient update, but I'd only expect to see this effect with backstitch-interval=4, since it's the same as the update-period for the natural gradient.
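To make the condition under discussion concrete, a small sketch of which minibatches get the backstitch treatment for a given interval and a per-job random offset (the role played by srand_seed_ % backstitch_training_interval in the diff); the helper name is illustrative:

def backstitch_minibatch_indices(num_minibatches, interval, srand_seed):
    # Indices of the minibatches on which backstitch is applied when training
    # with --backstitch-training-interval=interval.  The offset
    # srand_seed % interval shifts which residue class is picked, so parallel
    # jobs do not all apply backstitch at the same minibatch positions.
    offset = srand_seed % interval
    return [i for i in range(num_minibatches) if i % interval == offset]

print(backstitch_minibatch_indices(20, interval=5, srand_seed=7))
# -> [2, 7, 12, 17]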
…ecting the results
* 'master' of https://github.com/kaldi-asr/kaldi: (140 commits)
[egs] Fix failure in multilingual BABEL recipe (regenerate cmvn.scp) (kaldi-asr#1686)
[src,scripts,egs] Backstitch code+scripts, and one experiment, will add more later. (kaldi-asr#1605)
[egs] CNN+TDNN+LSTM experiments on AMI (kaldi-asr#1685)
[egs,scripts,src] Tune image recognition examples; minor small changes. (kaldi-asr#1682)
[src] Fix bug in looped computation (kaldi-asr#1673)
[build] when installing sequitur and mmseg, look for lib64 as well (thanks: @akshayc11) (kaldi-asr#1677)
[src] fix to gst-plugin/Makefile (remove -lkaldi-thread) (kaldi-asr#1680)
[src] Cosmetic fixes to usage messages
[egs] Fix to some --proportional-shrink related example scripts (kaldi-asr#1674)
[build] Fix small bug in configure
[scripts] Fix small bug in utils/gen_topo.pl.
[scripts] Add python script to convert nnet2 to nnet3 models (kaldi-asr#1611)
[doc] Fix typo (kaldi-asr#1669)
[src] nnet3: fix small bug in checking code. Thanks: @Maddin2000.
[src] Add #include missing from previous commit
[src] Fix bug in online2-nnet3 decoding RE dropout+batch-norm (thanks: Wonkyum Lee)
[scripts] make errors getting report non-fatal (thx: Miguel Jette); add comment RE dropout proportion
[src,scripts] Use ConstFst or decoding (half the memory; slightly faster). (kaldi-asr#1661)
[src] keyword search tools: fix Minimize() call, necessary due to OpenFst upgrade (kaldi-asr#1663)
[scripts] do not fail if the ivector extractor belongs to different user (kaldi-asr#1662)
...
…dd more later. (kaldi-asr#1605) See http://www.danielpovey.com/files/2017_nips_backstitch.pdf for details.
Continued work from #1511