WIP: Epsilon Backstitch #1511

Open

freewym wants to merge 19 commits into master from the epsilon-backstitch branch

Conversation

freewym (Contributor) commented Mar 22, 2017

For discussion. Modified on top of #1391

@@ -195,7 +201,10 @@ def train_new_models(dir, iter, srand, num_jobs,
fr_shft=frame_shift, l2=l2_regularize,
xent_reg=xent_regularize, leaky=leaky_hmm_coefficient,
parallel_train_opts=run_opts.parallel_train_opts,
momentum=momentum, max_param_change=max_param_change,
momentum=momentum,
backstitch_training_scale=backstitch_training_scale,
Contributor

See if it's feasible to just pass around a string 'backstitch_opts_str'. These functions have too many args.
[not urgent, you can do the experiments first; we may abandon the epsilon stuff if there turns out to be no advantage in it.]

that backstitch training is applied on.""")
self.parser.add_argument("--trainer.optimization.backstitch-training-epsilon",
type=float, dest='backstitch_training_epsilon',
default=0.0, help="""epsilon of backstitch
Contributor

Eventually, we should probably make this default to something like 0.1 or 0.05 (and have it be the scale that determines whether or not backstitch is used at all). But we'll have to do some experiments first, to see whether the epsilon version is even better, and what the value should be.

@@ -68,23 +68,42 @@ void NnetChainTrainer::Train(const NnetChainExample &chain_eg) {
&request);
const NnetComputation *computation = compiler_.Compile(request);

NnetComputer computer(nnet_config.compute_config, *computation,
if (nnet_config.backstitch_training_scale > 0.0 &&
nnet_config.backstitch_training_epsilon > 0.0 && num_minibatches_processed_
Contributor

No need to check that the epsilon is > 0.0 for this purpose. Let's just assert that it's always > 0.0 (and change the default to a nonzero value).
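
A minimal sketch of that suggestion, reusing the config names visible in the diff above (exactly where the assert lives, e.g. in the options-checking code or the trainer constructor, is an assumption):

```cpp
// Assert once that epsilon is positive (with its default changed to a nonzero
// value), so the training loop only has to branch on the backstitch scale.
KALDI_ASSERT(nnet_config.backstitch_training_epsilon > 0.0);

if (nnet_config.backstitch_training_scale > 0.0) {
  // ... backstitch step followed by the regular step ...
} else {
  // ... regular step only ...
}
```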

@@ -167,6 +188,14 @@ void NnetChainTrainer::UpdateParamsWithMaxChange() {
<< "UpdatableComponent; change this code.";
BaseFloat max_param_change_per_comp = uc->MaxChange();
KALDI_ASSERT(max_param_change_per_comp >= 0.0);
if (nnet_config.backstitch_training_scale > 0.0 &&
Contributor

You need to do a much better job, via comments, of explaining what is going on here. Even I don't understand this. It's possible that the code structure here is the problem, and it will have to be changed so that it's easier to understand.

Here is my proposal.

We take UpdateParamsWithMaxChange out of the class (probably declare it in nnet-utils.h), remove the is_backstitch_step argument, take out the ScaleNnet() from it [that can be done separately by the
calling code], and give it this interface:

/**
   This function does the operation '*nnet += scale * delta_nnet', while respecting
   any max-parameter-change (max-param-change) specified in the updatable
   components, plus the global max-param-change specified as 'max_param_change'.

   With max-changes taken into account, the operation of this function is equivalent
   to the following, although it's done more efficiently:
    \code
       Nnet temp_nnet(delta_nnet);
       ScaleNnet(1.0 / max_change_scale, &temp_nnet);
       [ Scale down parameters for each component of temp_nnet as needed so
         their Euclidean norms do not exceed their per-component max-changes ]
       [ Scale down temp_nnet as needed so its Euclidean norm does not exceed
         the global max-change ]
       ScaleNnet(max_change_scale, &temp_nnet);  // undo the previous scaling.
       AddNnet(temp_nnet, scale, nnet);
    \endcode

   @param [in] delta_nnet  The copy of '*nnet' neural network that contains
                           the proposed change in parameters.  Normally this will
                           previously have been set to (delta_nnet =
                           parameter-derivative-on-current-minibatch * learning-rate
                           per parameter), with any natural gradient applied as
                           specified in the components; but this may be different
                           if momentum or backstitch are used.
   @param [in] max_param_change  The global max-param-change specified on the
                           command line (e.g. 2.0), which specifies the largest
                           change allowed to '*nnet' in Euclidean norm.  If <= 0,
                           no global max-param-change will be enforced, but any
                           max-change values specified in the components will still
                           be enforced (see UpdatableComponent::MaxChange(), and
                           search for 'max-change' in the configs or nnet3-info
                           output).
   @param [in] max_change_scale  This value, which will normally be 1.0, is used
                           to scale all per-component max-change values and the
                           global 'max_param_change', before applying them (so we
                           use 'max_change_scale * uc->MaxChange()' as the
                           per-component max-change, and 'max_change_scale *
                           max_param_change' as the global max-change).
   @param [in] scale  This value, which will normally be 1.0, is a scaling factor
                           used when adding to 'nnet', applied after any max-changes.
   @param [in,out] nnet  The nnet which we add to.
*/
void UpdateNnetWithMaxChange(const Nnet &delta_nnet,
                             BaseFloat max_param_change,
                             BaseFloat max_change_scale,
                             BaseFloat scale, Nnet *nnet);

You may have to have the backstitch code in a separate code branch from the regular/momentum
training code, to keep the simple case simple.
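
To make the intended division of labor concrete, here is a rough sketch of how the regular (non-backstitch) path might call the proposed function. The names follow the proposal above and the existing nnet3 pieces (ScaleNnet, the delta_nnet_ member, nnet_config.max_param_change); the momentum handling shown is an assumption, not part of the proposal:

```cpp
// Regular training step, sketched against the proposed interface.
// delta_nnet_ already holds the learning-rate-scaled parameter derivatives for
// this minibatch (plus any momentum carried over from earlier minibatches).
UpdateNnetWithMaxChange(*delta_nnet_,
                        nnet_config.max_param_change,
                        1.0,  // max_change_scale: no extra scaling of max-changes
                        1.0,  // scale: add the whole (max-change-limited) delta
                        nnet_);
// The ScaleNnet() that used to live inside UpdateParamsWithMaxChange() is now
// the caller's job: keep a 'momentum' fraction of delta_nnet_ for the next
// minibatch (this zeroes it out when momentum == 0).
ScaleNnet(nnet_config.momentum, delta_nnet_);
```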

nnet_);
scale_backstitch = (is_backstitch_step ? (-backstitch_ratio + 1 +
nnet_config.backstitch_training_epsilon) / backstitch_ratio :
nnet_config.momentum);
Contributor

This code wouldn't work with momentum (you'd probably need a separate copy of the nnet for that), so better to use 0.0 instead of nnet_config.momentum, and assert that momentum is zero. After restructuring there may be 2 paths for momentum vs backstitch, anyway.

I can't figure out whether this code is correct. My rule when I can't easily figure out correctness is to ask for the code to be restructured, because if present me can't figure it out, then future me definitely won't be able to, not to mention present or future anyone else.
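
A small sketch of the suggested guard, reusing the variables from the diff above (where exactly the assert goes is left open):

```cpp
// Backstitch as written is incompatible with momentum, so fail fast rather
// than silently combining the two.
KALDI_ASSERT(nnet_config.momentum == 0.0);

// ... and in the scale computation, use 0.0 on the non-backstitch step
// instead of nnet_config.momentum:
BaseFloat scale_backstitch =
    is_backstitch_step ?
        (-backstitch_ratio + 1 + nnet_config.backstitch_training_epsilon) /
            backstitch_ratio :
        0.0;
```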

freewym (Contributor, Author) commented Apr 1, 2017

The reorganized code (alpha=alpha_old+alpha_old^2=0.24, epsilon=0.2) gave the same result as the old backstitch code (alpha_old=0.2).
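
(For reference, the arithmetic behind the equivalence: with alpha_old = 0.2, alpha = alpha_old + alpha_old^2 = 0.2 + 0.04 = 0.24, and epsilon = 0.2.)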

danpovey (Contributor) commented Apr 1, 2017 via email

freewym force-pushed the epsilon-backstitch branch 3 times, most recently from c85b29c to 36806ed on April 15, 2017 02:55
freewym (Contributor, Author) commented Apr 26, 2017

@danpovey the latest backstitch code has been tested on the chain model.

jtrmal (Contributor) commented Apr 26, 2017 via email

danpovey (Contributor) commented Apr 26, 2017 via email

freewym (Contributor, Author) commented Apr 26, 2017

Yes, I have done that in the commit "resets random generator's seed for RandomComponent..."

void NnetChainTrainer::TrainInternal(const NnetChainExample &eg,
                                     const NnetComputation &computation,
                                     bool is_backstitch_step) {
  srand(srand_seed_ + num_minibatches_processed_);
Contributor

You shouldn't do this if you're not doing backstitch, because it's not free of cost.
So only execute these two statements if you're doing backstitch; calling it outside of (and just before) TrainInternal will make more sense.
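
A sketch of the suggested call pattern in Train(); the surrounding code and the second statement (the RandomComponent generator reset) are not shown in the diff, so how the two steps are sequenced here is an assumption:

```cpp
// In NnetChainTrainer::Train(), just before calling TrainInternal():
if (nnet_config.backstitch_training_scale > 0.0) {
  // Re-seed only when backstitch is in use, so the backstitch step and the
  // following regular step see the same randomization; this avoids paying the
  // (non-free) re-seeding cost on ordinary training runs.
  srand(srand_seed_ + num_minibatches_processed_);
  // ... reset the RandomComponents' generators here as well ...
  TrainInternal(chain_eg, *computation, true);   // backstitch step
  TrainInternal(chain_eg, *computation, false);  // regular step
} else {
  TrainInternal(chain_eg, *computation, false);
}
```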

@@ -158,13 +169,12 @@ class NnetTrainer {

~NnetTrainer();
private:
void ProcessOutputs(const NnetExample &eg,
NnetComputer *computer);
void TrainInternal(const NnetExample &eg,
Contributor

we'll need documentation for this function.

danpovey (Contributor) commented May 3, 2017

I just noticed that this PR is to master, and it isn't even up to date with master. Can you please first try to merge with master, and then try to merge with the kaldi_52 branch? We'll be wanting to merge this with kaldi_52. You will likely have conflicts.
Obviously save this branch, as bad things can happen when merges fail.

freewym force-pushed the epsilon-backstitch branch from cd59e5a to 73787c6 on May 4, 2017 00:02
freewym mentioned this pull request on May 4, 2017
danpovey added the "stopped development (On backburner. Please comment if you think it should not be.)" label on May 24, 2017
stale bot commented Jun 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the "stale (Stale bot on the loose)" label on Jun 19, 2020
stale bot commented Jul 19, 2020

This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.

stale bot closed this on Jul 19, 2020
kkm000 reopened this on Jul 19, 2020
stale bot removed the "stale (Stale bot on the loose)" label on Jul 19, 2020
Labels: stopped development (On backburner. Please comment if you think it should not be.)

4 participants