WIP: Epsilon Backstitch #1511
base: master
Conversation
@@ -195,7 +201,10 @@ def train_new_models(dir, iter, srand, num_jobs,
                        fr_shft=frame_shift, l2=l2_regularize,
                        xent_reg=xent_regularize, leaky=leaky_hmm_coefficient,
                        parallel_train_opts=run_opts.parallel_train_opts,
                        momentum=momentum, max_param_change=max_param_change,
                        momentum=momentum,
                        backstitch_training_scale=backstitch_training_scale,
See if it's feasible to just pass around a string 'backstitch_opts_str'. These functions have too many args.
[not urgent, you can do the experiments first; we may abandon the epsilon stuff if there turns out to be no advantage in it.]
                             that backstitch training is applied on.""")
    self.parser.add_argument("--trainer.optimization.backstitch-training-epsilon",
                             type=float, dest='backstitch_training_epsilon',
                             default=0.0, help="""epsilon of backstitch
Eventually, we should probably make this default to something like 0.1 or 0.05 (and have it be the scale that determines whether or not backstitch is used at all). But we'll have to do some experiments first, to see whether the epsilon version is even better, and what the value should be.
src/nnet3/nnet-chain-training.cc
Outdated
@@ -68,23 +68,42 @@ void NnetChainTrainer::Train(const NnetChainExample &chain_eg) {
                                  &request);
  const NnetComputation *computation = compiler_.Compile(request);

  NnetComputer computer(nnet_config.compute_config, *computation,
  if (nnet_config.backstitch_training_scale > 0.0 &&
      nnet_config.backstitch_training_epsilon > 0.0 && num_minibatches_processed_
no need to check that the epsilon is > 0.0 for this purpose. Let's just assert that it's always >0.0. (change the default to a nonzero value).
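A minimal sketch of what that might look like (this is not the PR's code; the option names come from the diff above, and the exact placement inside NnetChainTrainer::Train() is an assumption):

// Sketch only: assert once that epsilon is sane (default now nonzero),
// then gate the backstitch pass on the scale alone.
KALDI_ASSERT(nnet_config.backstitch_training_epsilon > 0.0);
if (nnet_config.backstitch_training_scale > 0.0
    /* && ... the minibatch-interval condition from the diff above ... */) {
  // ... backstitch branch ...
}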
src/nnet3/nnet-chain-training.cc
Outdated
@@ -167,6 +188,14 @@ void NnetChainTrainer::UpdateParamsWithMaxChange() {
          << "UpdatableComponent; change this code.";
    BaseFloat max_param_change_per_comp = uc->MaxChange();
    KALDI_ASSERT(max_param_change_per_comp >= 0.0);
    if (nnet_config.backstitch_training_scale > 0.0 &&
You need to do a much better job, via comments, of explaining what is going on here. Even I don't understand this. It's possible that the code structure here is the problem, and it will have to be changed so that it's easier to understand.
Here is my proposal.
We take UpdateParamsWithMaxChange out of the class (probably declare it in nnet-utils.h), remove the is_backstitch_step argument, take out the ScaleNnet() from it [that can be done separately by the
calling code], and give it this interface:
/**
This function does the operation '*nnet += scale * delta_nnet', while respecting
any max-parameter-change (max-param-change) specified in the updatable
components, plus the global max-param-change specified as 'max_param_change'.
With max-changes taken into account, the operation of this function is equivalent
to the following, although it's done more efficiently:
\code
Nnet temp_nnet(delta_nnet);
ScaleNnet(1.0 / max_change_scale, &temp_nnet);
[ Scale down parameters for each component of temp_nnet as needed so
their Euclidean norms do not exceed their per-component max-changes ]
[ Scale down temp_nnet as needed so its Euclidean norm does not exceed
the global max-change ]
ScaleNnet(max_change_scale, &temp_nnet); // undo the previous scaling.
AddNnet(temp_nnet, scale, nnet);
\endcode
@param [in] delta_nnet The copy of '*nnet' neural network that contains
the proposed change in parameters. Normally this will previously
have been set to: (delta_nnet =
parameter-derivative-on-current-minibatch * learning-rate per parameter),
with any natural gradient applied as specified in the components; but
this may be different if momentum or backstitch are used.
@param [in] max_param_change The global max-param-change specified on the
command line (e.g. 2.0), which specifies the largest change allowed
to '*nnet' in Euclidean norm. If <= 0, no global max-param-change will
be enforced (but any max-change values specified in the components
will still be enforced; see UpdatableComponent::MaxChange(), and
search for 'max-change' in the configs or nnet3-info output).
@param [in] max_change_scale This value, which will normally be 1.0, is used to scale
all per-component max-change values and the global 'max_param_change',
before applying them (so we use 'max_change_scale * uc->MaxChange()'
as the per-component max-change, and 'max_change_scale * max_param_change'
as the global max-change).
@param [in] scale This value, which will normally be 1.0, is a scaling factor used when adding
to 'nnet', applied after any max-changes.
@param [in,out] nnet The nnet which we add to.
*/
void UpdateNnetWithMaxChange(const Nnet &delta_nnet,
BaseFloat max_param_change,
BaseFloat max_change_scale,
BaseFloat scale, Nnet *nnet);
You may have to have the backstitch code in a separate code branch from the regular/momentum
training code, to keep the simple case simple.
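For concreteness, here is a rough sketch (not code from this PR) of how the calling code might drive the proposed interface for the two backstitch updates. The member name delta_nnet_ and the scale choices (-alpha, then 1 + alpha) are assumed from the standard backstitch formulation; the epsilon variant would adjust these scales.

// Sketch only: 'delta_nnet_' is assumed to hold the learning-rate-scaled
// gradient for the current minibatch; the epsilon term is omitted here.
BaseFloat alpha = nnet_config.backstitch_training_scale;  // e.g. 0.2

// First (backstitch) pass: step *against* the gradient with scale -alpha;
// per-component and global max-changes are scaled by alpha.
UpdateNnetWithMaxChange(*delta_nnet_, nnet_config.max_param_change,
                        /*max_change_scale=*/ alpha,
                        /*scale=*/ -alpha, nnet_);
ScaleNnet(0.0, delta_nnet_);  // clear the delta; done by the caller as suggested

// ... recompute the gradient on the same minibatch into delta_nnet_ ...

// Second pass: step forward with scale (1 + alpha), max-changes scaled likewise.
UpdateNnetWithMaxChange(*delta_nnet_, nnet_config.max_param_change,
                        /*max_change_scale=*/ 1.0 + alpha,
                        /*scale=*/ 1.0 + alpha, nnet_);
ScaleNnet(0.0, delta_nnet_);

Scaling the max-change limits by the same factor as each step keeps the constraints proportional to the size of that pass's update, which is presumably the intent of the max_change_scale argument.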
src/nnet3/nnet-chain-training.cc
Outdated
                          nnet_);
  scale_backstitch = (is_backstitch_step ? (-backstitch_ratio + 1 +
      nnet_config.backstitch_training_epsilon) / backstitch_ratio :
      nnet_config.momentum);
This code wouldn't work with momentum (you'd probably need a separate copy of the nnet for that), so better to use 0.0 instead of nnet_config.momentum, and assert that momentum is zero. After restructuring there may be 2 paths for momentum vs backstitch, anyway.
I can't figure out whether this code is correct. My rule when I can't easily figure out correctness is to ask for the code to be restructured, because if present me can't figure it out, then future me definitely won't be able to, not to mention present or future anyone-else.
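Applied to the expression in the diff above, the immediate change would be roughly the following (a sketch, keeping the variable names from the diff):

// Hypothetical rewrite: momentum is asserted away rather than folded into
// the backstitch scale expression.
KALDI_ASSERT(nnet_config.momentum == 0.0);
BaseFloat scale_backstitch =
    (is_backstitch_step ?
     (-backstitch_ratio + 1 + nnet_config.backstitch_training_epsilon) /
         backstitch_ratio :
     0.0);  // was nnet_config.momentum; momentum is unsupported on this path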
The reorganized code (alpha=alpha_old+alpha_old^2=0.24, epsilon=0.2) gave the same result as the old backstitch code (alpha_old=0.2).
ok, good; try with smaller epsilon, e.g. 0.1 or less, if you haven't already. [but don't waste CPUs: if it seems to be progressing the same as the old code for the same parameters, using old runs might be OK.]
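For reference, the numbers quoted above are consistent with the stated reparameterization:

\alpha = \alpha_{\mathrm{old}} + \alpha_{\mathrm{old}}^{2} = 0.2 + 0.2^{2} = 0.24,
\qquad \epsilon = 0.2, \quad \alpha_{\mathrm{old}} = 0.2 .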
(force-pushed from c85b29c to 36806ed)
@danpovey the latest backstitch code has been tested on the chain model.
btw, I'm testing it as well, actually in combination with dropout. Just FYI.
y.
Yiming, have you modified the code with the random-seed stuff so that it's
fully compatible with dropout? [Yenda, unless we take care to make the
random seed the same in the 1st and second passes of it, it may not give
as much improvement as it could.] Yiming, please get on that right away
if you haven't already done it.
Dan
Yes, I have done that in the commit "resets random generator's seed for RandomComponent..."
src/nnet3/nnet-chain-training.cc
Outdated
void NnetChainTrainer::TrainInternal(const NnetChainExample &eg,
                                     const NnetComputation &computation,
                                     bool is_backstitch_step) {
  srand(srand_seed_ + num_minibatches_processed_);
You shouldn't do this if you're not doing backstitch, because it's not free of cost.
So only execute these two statements if you're doing backstitch; calling it outside of (and just before) TrainInternal will make more sense.
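A sketch of the suggested placement on the caller side (assumptions: only srand() appears in the quoted diff, so the second statement is presumed to be a call like ResetGenerators() that re-seeds the RandomComponents; the surrounding control flow is illustrative):

// Caller side (assumed): pay the re-seeding cost only when backstitch is in
// use, and do it just before each TrainInternal() call so both passes of the
// same minibatch draw identical dropout masks / RandomComponent state.
if (nnet_config.backstitch_training_scale > 0.0) {
  srand(srand_seed_ + num_minibatches_processed_);
  ResetGenerators(nnet_);  // assumed helper that re-seeds the RandomComponents
}
TrainInternal(eg, *computation, is_backstitch_step);

Because the seed (srand_seed_ + num_minibatches_processed_) is the same before both passes of a given minibatch, the dropout masks line up between the backstitch step and the following forward step.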
@@ -158,13 +169,12 @@ class NnetTrainer {

  ~NnetTrainer();
 private:
  void ProcessOutputs(const NnetExample &eg,
                      NnetComputer *computer);
  void TrainInternal(const NnetExample &eg,
we'll need documentation for this function.
I just noticed that this PR is to master, and it isn't even up to date with master. Can you please first try to merge with master, and then try to merge with the kaldi_52 branch? We'll be wanting to merge this with kaldi_52. You will likely have conflicts.
…to ensure consistency, e.g., dropout mask, in backstitch training
(force-pushed from cd59e5a to 73787c6)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.
For discussion. Modified on top of #1391