
Backstitch kaldi 52 #1605

Merged: 5 commits merged into kaldi-asr:master on Jun 14, 2017
Conversation

@freewym (Contributor) commented on May 4, 2017:

Continued work from #1511

@freewym force-pushed the backstitch-kaldi_52 branch 4 times, most recently from 7672f53 to d1d6b52 on May 8, 2017 05:53
@freewym force-pushed the backstitch-kaldi_52 branch from d1d6b52 to 4ecdf04 on May 13, 2017 04:53
@freewym force-pushed the backstitch-kaldi_52 branch 2 times, most recently from a9565a8 to 7a20331 on May 23, 2017 18:08
@danpovey changed the base branch from kaldi_52 to master on May 30, 2017 03:40
@danpovey (Contributor) commented:
@freewym, can you please fix conflicts?
I'm considering merging this version soon, since it looks like doing it at the egs level would be more complicated than I thought, due to the interaction with natural gradient.

@freewym (Contributor, author) commented on May 30, 2017:

OK, will do it tomorrow.

@freewym force-pushed the backstitch-kaldi_52 branch 2 times, most recently from c231970 to 7a20331 on May 31, 2017 06:13
@freewym force-pushed the backstitch-kaldi_52 branch from 7a20331 to 546ec30 on May 31, 2017 07:02
@freewym (Contributor, author) commented on May 31, 2017:

@danpovey Do you prefer to add all the backstitch experiment scripts to this PR, or create a separate PR after merging this one?

@danpovey (Contributor) commented on May 31, 2017 via email.

@danpovey (Contributor) left a review:
Various minor cosmetic and code-refactoring requests.


# ./local/chain/compare_wer_general.sh --looped exp/chain_cleaned/tdnn_lstm1e_sp_bi exp/chain_cleaned/tdnn_lstm1t_sp_bi
# System tdnn_lstm1v_sp_bi tdnn_lstm1t_sp_bi
# WER on dev(orig) 8.6 9.0
@danpovey (review comment):
We normally have the current experiment last; this might confuse users a little.

@@ -307,6 +316,11 @@ def train_one_iteration(dir, iter, srand, egs_dir,
xent_regularize=xent_regularize,
leaky_hmm_coefficient=leaky_hmm_coefficient,
momentum=momentum,
# linearly increase backstitch_training_scale during the
# first few iterations(hard-coded as 15 for now)
@danpovey (review comment):
remove "for now" as we don't plan to change it. and space before (.

@@ -25,7 +25,9 @@
def train_new_models(dir, iter, srand, num_jobs,
num_archives_processed, num_archives,
raw_model_string, egs_dir,
momentum, max_param_change,
momentum,
backstitch_training_scale, backstitch_training_interval,
@danpovey (review comment):
I think it would make other parts of the code more robust to the change, if you were to put the new parameters at the end and give them default values, e.g. 0.0 and 1. No need to put them in the exact same order when you call this function, though.
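
To illustrate the suggestion (in C++ for concreteness; the Python function in the diff behaves analogously with keyword defaults, and the signature below is invented for this example): putting the new options last, with defaults meaning "backstitch disabled", keeps existing call sites working unchanged.

#include <iostream>

// Illustration only, not the actual signature: the new backstitch options go
// last, with defaults equivalent to "no backstitch" (scale 0.0, interval 1).
void TrainNewModels(double momentum, double max_param_change,
                    double backstitch_training_scale = 0.0,
                    int backstitch_training_interval = 1) {
  std::cout << "momentum=" << momentum
            << " max_param_change=" << max_param_change
            << " backstitch_scale=" << backstitch_training_scale
            << " backstitch_interval=" << backstitch_training_interval << "\n";
}

int main() {
  TrainNewModels(0.0, 2.0);          // existing call site, unchanged
  TrainNewModels(0.0, 2.0, 0.3, 1);  // new call site using backstitch
  return 0;
}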


void NnetChainTrainer::ProcessOutputs(const NnetChainExample &eg,
void NnetChainTrainer::ProcessOutputs(bool is_backstitch_step,
const NnetChainExample &eg,
NnetComputer *computer) {
// normally the eg will have just one output named 'output', but
// we don't assume this.
@danpovey (review comment):
You should probably clarify in this comment that, in the diagnostics, the output-name with the "_backstitch" suffix is the one computed after the first, backward step of backstitch.

// backstitch training is incompatible with momentum > 0
KALDI_ASSERT(nnet_config.momentum == 0.0);
FreezeNaturalGradient(true, delta_nnet_);
bool is_backstitch_step = true;
@danpovey (review comment):
The name 'is_backstitch_step' is not very clear as backstitch has two steps.
You should probably say 'is_backstitch_step1'.
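
For readers following this review, a minimal sketch of what the two steps are, following the published description of backstitch rather than the actual Kaldi trainer classes (natural gradient, max-change handling and the chain objective are all omitted; 'alpha' is backstitch-training-scale):

#include <cstddef>
#include <functional>
#include <vector>

using Params = std::vector<double>;

// One backstitch update with plain SGD (consistent with the assert above that
// momentum must be 0.0).  Step 1 is the backward ("backstitch") step; in the
// trainer, the diagnostics gathered during this step are the ones reported
// under an output name carrying the "_backstitch" suffix.  Step 2 is the
// forward step with scale (1 + alpha); the gradient is recomputed in between.
void BackstitchUpdate(Params *theta,
                      const std::function<Params(const Params &)> &grad,
                      double lr, double alpha) {
  Params g1 = grad(*theta);  // step 1: theta += alpha * lr * grad(theta)
  for (std::size_t i = 0; i < theta->size(); ++i)
    (*theta)[i] += alpha * lr * g1[i];
  Params g2 = grad(*theta);  // step 2: theta -= (1 + alpha) * lr * grad(theta)
  for (std::size_t i = 0; i < theta->size(); ++i)
    (*theta)[i] -= (1.0 + alpha) * lr * g2[i];
}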

@@ -70,6 +74,13 @@ struct NnetTrainerOptions {
"so that the 'effective' learning rate is the same as "
"before (because momentum would normally increase the "
"effective learning rate by 1/(1-momentum))");
opts->Register("backstitch-training-scale", &backstitch_training_scale,
"backstitch traning factor. "
@danpovey (review comment):
Typo: "traning"; also mention that this is the alpha in our publications.

opts->Register("backstitch-training-interval",
&backstitch_training_interval,
"do backstitch training with the specified interval of "
"minibatches.");
@danpovey (review comment):
mention that this is the interval n in publications.
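
For concreteness, one way the two registrations quoted above might read once these comments are addressed (the exact wording is only a suggestion):

opts->Register("backstitch-training-scale", &backstitch_training_scale,
               "backstitch training factor (the alpha in our publications).");
opts->Register("backstitch-training-interval",
               &backstitch_training_interval,
               "do backstitch training with the specified interval of "
               "minibatches (the interval n in our publications).");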

/**
This function does the operation '*nnet += scale * delta_nnet', while
respecting any max-parameter-change (max-param-change) specified in the
updatable components, plus the global max-param-change specified as
@danpovey (review comment):
plus -> and also

@param [in] scale This value, which will normally be 1.0, is a scaling
factor used when adding to 'nnet', applied after any max-changes.
@param [in,out] nnet The nnet which we add to.
@param [in,out] num_max_change_per_component_applied Stats for per-component
@danpovey (review comment):
Clarify that we add to the elements of this (if that is what this function does).
Also I would write this as just [out].

@param [in,out] nnet The nnet which we add to.
@param [in,out] num_max_change_per_component_applied Stats for per-component
max-change.
@param [in,out] num_max_change_global_applied Stats for global max-change.
@danpovey (review comment):
Clarify that we add to this. And I'd write this as just [out]. I might be wrong in doing that, but other documentation uses this convention.
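
As background for the two comments above, a heavily simplified sketch of the operation the doc-comment describes, using toy types invented for this example (the real function works on Kaldi Nnet / updatable-component objects): each component's delta is limited to its own max-change, the overall delta is limited to the global max-change, the "max-change applied" counters are incremented, and 'scale' is applied after the limits, as the doc-comment says.

#include <cmath>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only, not the Kaldi types.
struct ToyComponent {
  std::vector<double> params;
  double max_change;  // per-component max-param-change; 0 means no limit
};

static double L2Norm(const std::vector<double> &v) {
  double sum = 0.0;
  for (double x : v) sum += x * x;
  return std::sqrt(sum);
}

// Adds scale * delta to nnet, limiting each component's delta to its
// max_change and the whole delta to global_max_change, and counting how
// often each limit was applied.  The scale is applied after the limits.
void AddDeltaWithMaxChange(const std::vector<ToyComponent> &delta,
                           double scale, double global_max_change,
                           std::vector<ToyComponent> *nnet,
                           std::vector<int> *num_max_change_per_component_applied,
                           int *num_max_change_global_applied) {
  std::vector<double> factor(delta.size(), 1.0);
  double global_norm_sq = 0.0;
  for (std::size_t c = 0; c < delta.size(); ++c) {
    double norm = L2Norm(delta[c].params);
    if (delta[c].max_change > 0.0 && norm > delta[c].max_change) {
      factor[c] = delta[c].max_change / norm;
      ++(*num_max_change_per_component_applied)[c];
    }
    global_norm_sq += (norm * factor[c]) * (norm * factor[c]);
  }
  double global_factor = 1.0;
  if (global_max_change > 0.0 &&
      std::sqrt(global_norm_sq) > global_max_change) {
    global_factor = global_max_change / std::sqrt(global_norm_sq);
    ++(*num_max_change_global_applied);
  }
  for (std::size_t c = 0; c < delta.size(); ++c)
    for (std::size_t i = 0; i < delta[c].params.size(); ++i)
      (*nnet)[c].params[i] +=
          scale * factor[c] * global_factor * delta[c].params[i];
}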

@danpovey (Contributor) commented:
@freewym, there is something else, and @GaofengCheng, pay attention to this, as it may affect your experiments with dropout and backstitch.
I am concerned that it may be insufficient to rely on the egs-shuffling to ensure that on different epochs, different data gets backstitch applied to it. The issue is that if there are 'rare-sized' egs, they won't get shuffled enough-- we use a buffer size of 2000, but rare sizes of egs may get spit out in almost the same order as they came in, if we consider egs of only one type.
So instead of a condition like
num_minibatches_processed_ % interval == 0
you should use a condition like
num_minibatches_processed_ % interval == srand_seed_ % interval.
But that means that you will have to start setting the srand-seed via the --srand option of nnet3-train and nnet3-chain-train (add the option to the scripts if needed; set it to the same value as is provided to nnet3-shuffle-egs).
This may have a small impact on experiments with dropout (I doubt it, but it could happen).
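
A condensed sketch of the requested change (names follow the comment above; the surrounding trainer code is omitted):

// Decide whether the current minibatch gets the backstitch update.
// 'interval' is backstitch-training-interval (the n in the publications).
// srand_seed should come from the --srand option of nnet3-train /
// nnet3-chain-train, set to the same value as is passed to nnet3-shuffle-egs,
// so that different epochs select different minibatches.
bool DoBackstitchThisMinibatch(int num_minibatches_processed,
                               int srand_seed, int interval) {
  // Old condition:  num_minibatches_processed % interval == 0
  // New condition, offset by the per-job random seed:
  return num_minibatches_processed % interval == srand_seed % interval;
}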

@freewym (Contributor, author) commented on Jun 1, 2017:

srand_seed_ is initialized with a random integer in the constructor, which is almost always different across invocations of the training binary. So would it suffice to solve the issue for 'rare-sized' egs by just changing the condition to what you proposed?

@danpovey (Contributor) commented on Jun 1, 2017 via email.

@freewym (Contributor, author) commented on Jun 1, 2017:

Oh yes.

@GaofengCheng (Contributor) commented:
@danpovey I can try backstitch+dropout after Yiming makes the fix you recommended, and see how it goes.

@jtrmal (Contributor) commented on Jun 1, 2017 via email.

@danpovey (Contributor) commented on Jun 1, 2017 via email.

@freewym force-pushed the backstitch-kaldi_52 branch from d3bbda2 to 320bfbb on June 1, 2017 21:11
NnetComputer computer(nnet_config.compute_config, *computation,
if (nnet_config.backstitch_training_scale > 0.0 && num_minibatches_processed_
% nnet_config.backstitch_training_interval ==
srand_seed_ % nnet_config.backstitch_training_interval) {
@freewym (review comment):
I tested the new refactored code using chain models (a TDNN-LSTM system on AMI, with backstitch enabled or disabled; when enabled, testing interval=1 and interval=5) and xent models (using the CIFAR recipe, with backstitch enabled or disabled; when enabled, only interval=1 was tested), and compared the results with my old runs. The only result that does not match the old one is the chain TDNN-LSTM AMI system with the backstitch interval=5 setting, where the WER is ~0.8 worse than the old run and the valid log-prob is also a bit lower. So now I am changing this expression back to the old condition (num_minibatches_processed_ % nnet_config.backstitch_training_interval == 0) to see if I can reproduce the old result.

@danpovey (Contributor) commented on Jun 6, 2017 via email.

@danpovey merged commit ecc6a78 into kaldi-asr:master on Jun 14, 2017
kronos-cm added a commit to kronos-cm/kaldi that referenced this pull request Jun 15, 2017
* 'master' of https://github.com/kaldi-asr/kaldi: (140 commits)
  [egs] Fix failure in multilingual BABEL recipe (regenerate cmvn.scp) (kaldi-asr#1686)
  [src,scripts,egs] Backstitch code+scripts, and one experiment, will add more later. (kaldi-asr#1605)
  [egs] CNN+TDNN+LSTM experiments on AMI (kaldi-asr#1685)
  [egs,scripts,src] Tune image recognition examples; minor small changes. (kaldi-asr#1682)
  [src] Fix bug in looped computation (kaldi-asr#1673)
  [build] when installing sequitur and mmseg, look for lib64 as well (thanks: @akshayc11) (kaldi-asr#1677)
  [src] fix to gst-plugin/Makefile (remove -lkaldi-thread) (kaldi-asr#1680)
  [src] Cosmetic fixes to usage messages
  [egs]  Fix to some --proportional-shrink related example scripts (kaldi-asr#1674)
  [build] Fix small bug in configure
  [scripts] Fix small bug in utils/gen_topo.pl.
  [scripts] Add python script to convert nnet2 to nnet3 models (kaldi-asr#1611)
  [doc] Fix typo  (kaldi-asr#1669)
  [src] nnet3: fix small bug in checking code.  Thanks: @Maddin2000.
  [src] Add #include missing from previous commit
  [src] Fix bug in online2-nnet3 decoding RE dropout+batch-norm (thanks: Wonkyum Lee)
  [scripts] make errors getting report non-fatal (thx: Miguel Jette); add comment RE dropout proportion
  [src,scripts] Use ConstFst or decoding (half the memory; slightly faster).  (kaldi-asr#1661)
  [src] keyword search tools: fix Minimize() call, necessary due to OpenFst upgrade (kaldi-asr#1663)
  [scripts] do not fail if the ivector extractor belongs to different user (kaldi-asr#1662)
  ...
@freewym deleted the backstitch-kaldi_52 branch on October 11, 2017 18:06