
Backstitch kaldi 52 #1605

Merged: 5 commits merged into kaldi-asr:master on Jun 14, 2017
Conversation

@freewym (Contributor) commented on May 4, 2017:

Continued work from #1511

@freewym force-pushed the backstitch-kaldi_52 branch 4 times, most recently from 7672f53 to d1d6b52 on May 8, 2017 05:53
@freewym force-pushed the backstitch-kaldi_52 branch from d1d6b52 to 4ecdf04 on May 13, 2017 04:53
@freewym force-pushed the backstitch-kaldi_52 branch 2 times, most recently from a9565a8 to 7a20331 on May 23, 2017 18:08
@danpovey changed the base branch from kaldi_52 to master on May 30, 2017 03:40
@danpovey (Contributor) commented:
@freewym, can you please fix conflicts?
I'm considering merging this version soon, since it looks like doing it at the egs level would be more complicated than I thought, due to the interaction with natural gradient.

@freewym (Contributor, author) commented on May 30, 2017:

OK, will do it tomorrow.

@freewym force-pushed the backstitch-kaldi_52 branch 2 times, most recently from c231970 to 7a20331 on May 31, 2017 06:13
@freewym force-pushed the backstitch-kaldi_52 branch from 7a20331 to 546ec30 on May 31, 2017 07:02
@freewym (Contributor, author) commented on May 31, 2017:

@danpovey Do you prefer to add all the backstitch experiment scripts to this PR, or create a separate PR after merging this one?

@danpovey (Contributor) commented on May 31, 2017 via email.

@danpovey (Contributor) left a review:
Various minor cosmetic and code-refactoring requests.


# ./local/chain/compare_wer_general.sh --looped exp/chain_cleaned/tdnn_lstm1e_sp_bi exp/chain_cleaned/tdnn_lstm1t_sp_bi
# System tdnn_lstm1v_sp_bi tdnn_lstm1t_sp_bi
# WER on dev(orig) 8.6 9.0
@danpovey (review comment):
We normally have the current experiment last; this might confuse users a little.

@@ -307,6 +316,11 @@ def train_one_iteration(dir, iter, srand, egs_dir,
xent_regularize=xent_regularize,
leaky_hmm_coefficient=leaky_hmm_coefficient,
momentum=momentum,
# linearly increase backstitch_training_scale during the
# first few iterations(hard-coded as 15 for now)
@danpovey (review comment):
remove "for now" as we don't plan to change it. and space before (.

@@ -25,7 +25,9 @@
def train_new_models(dir, iter, srand, num_jobs,
num_archives_processed, num_archives,
raw_model_string, egs_dir,
momentum, max_param_change,
momentum,
backstitch_training_scale, backstitch_training_interval,
@danpovey (review comment):
I think it would make other parts of the code more robust to the change, if you were to put the new parameters at the end and give them default values, e.g. 0.0 and 1. No need to put them in the exact same order when you call this function, though.
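
To illustrate the suggestion (in C++ for concreteness; the Python function in the diff behaves analogously with keyword defaults, and the signature below is invented for this example): putting the new options last, with defaults meaning "backstitch disabled", keeps existing call sites working unchanged.

#include <iostream>

// Illustration only, not the actual signature: the new backstitch options go
// last, with defaults equivalent to "no backstitch" (scale 0.0, interval 1).
void TrainNewModels(double momentum, double max_param_change,
                    double backstitch_training_scale = 0.0,
                    int backstitch_training_interval = 1) {
  std::cout << "momentum=" << momentum
            << " max_param_change=" << max_param_change
            << " backstitch_scale=" << backstitch_training_scale
            << " backstitch_interval=" << backstitch_training_interval << "\n";
}

int main() {
  TrainNewModels(0.0, 2.0);          // existing call site, unchanged
  TrainNewModels(0.0, 2.0, 0.3, 1);  // new call site using backstitch
  return 0;
}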


void NnetChainTrainer::ProcessOutputs(const NnetChainExample &eg,
void NnetChainTrainer::ProcessOutputs(bool is_backstitch_step,
const NnetChainExample &eg,
NnetComputer *computer) {
// normally the eg will have just one output named 'output', but
// we don't assume this.
@danpovey (review comment):
You should probably clarify in this comment that, in the diagnostics, the output-name with the "_backstitch" suffix is the one computed after the first, backward step of backstitch.

// backstitch training is incompatible with momentum > 0
KALDI_ASSERT(nnet_config.momentum == 0.0);
FreezeNaturalGradient(true, delta_nnet_);
bool is_backstitch_step = true;
@danpovey (review comment):
The name 'is_backstitch_step' is not very clear as backstitch has two steps.
You should probably say 'is_backstitch_step1'.
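
For readers following this review, a minimal sketch of what the two steps are, following the published description of backstitch rather than the actual Kaldi trainer classes (natural gradient, max-change handling and the chain objective are all omitted; 'alpha' is backstitch-training-scale):

#include <cstddef>
#include <functional>
#include <vector>

using Params = std::vector<double>;

// One backstitch update with plain SGD (consistent with the assert above that
// momentum must be 0.0).  Step 1 is the backward ("backstitch") step; in the
// trainer, the diagnostics gathered during this step are the ones reported
// under an output name carrying the "_backstitch" suffix.  Step 2 is the
// forward step with scale (1 + alpha); the gradient is recomputed in between.
void BackstitchUpdate(Params *theta,
                      const std::function<Params(const Params &)> &grad,
                      double lr, double alpha) {
  Params g1 = grad(*theta);  // step 1: theta += alpha * lr * grad(theta)
  for (std::size_t i = 0; i < theta->size(); ++i)
    (*theta)[i] += alpha * lr * g1[i];
  Params g2 = grad(*theta);  // step 2: theta -= (1 + alpha) * lr * grad(theta)
  for (std::size_t i = 0; i < theta->size(); ++i)
    (*theta)[i] -= (1.0 + alpha) * lr * g2[i];
}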

@@ -70,6 +74,13 @@ struct NnetTrainerOptions {
"so that the 'effective' learning rate is the same as "
"before (because momentum would normally increase the "
"effective learning rate by 1/(1-momentum))");
opts->Register("backstitch-training-scale", &backstitch_training_scale,
"backstitch traning factor. "
@danpovey (review comment):
Typo: "traning"; also mention that this is the alpha in our publications.

opts->Register("backstitch-training-interval",
&backstitch_training_interval,
"do backstitch training with the specified interval of "
"minibatches.");
@danpovey (review comment):
mention that this is the interval n in publications.
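
For concreteness, one way the two registrations quoted above might read once these comments are addressed (the exact wording is only a suggestion):

opts->Register("backstitch-training-scale", &backstitch_training_scale,
               "backstitch training factor (the alpha in our publications).");
opts->Register("backstitch-training-interval",
               &backstitch_training_interval,
               "do backstitch training with the specified interval of "
               "minibatches (the interval n in our publications).");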

/**
This function does the operation '*nnet += scale * delta_nnet', while
respecting any max-parameter-change (max-param-change) specified in the
updatable components, plus the global max-param-change specified as
@danpovey (review comment):
plus -> and also

@param [in] scale This value, which will normally be 1.0, is a scaling
factor used when adding to 'nnet', applied after any max-changes.
@param [in,out] nnet The nnet which we add to.
@param [in,out] num_max_change_per_component_applied Stats for per-component
@danpovey (review comment):
Clarify that we add to the elements of this (if that is what this function does).
Also I would write this as just [out].

@param [in,out] nnet The nnet which we add to.
@param [in,out] num_max_change_per_component_applied Stats for per-component
max-change.
@param [in,out] num_max_change_global_applied Stats for global max-change.
@danpovey (review comment):
Clarify that we add to this. And I'd write this as just [out]. I might be wrong in doing that, but other documentation uses this convention.
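
As background for the two comments above, a heavily simplified sketch of the operation the doc-comment describes, using toy types invented for this example (the real function works on Kaldi Nnet / updatable-component objects): each component's delta is limited to its own max-change, the overall delta is limited to the global max-change, the "max-change applied" counters are incremented, and 'scale' is applied after the limits, as the doc-comment says.

#include <cmath>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only, not the Kaldi types.
struct ToyComponent {
  std::vector<double> params;
  double max_change;  // per-component max-param-change; 0 means no limit
};

static double L2Norm(const std::vector<double> &v) {
  double sum = 0.0;
  for (double x : v) sum += x * x;
  return std::sqrt(sum);
}

// Adds scale * delta to nnet, limiting each component's delta to its
// max_change and the whole delta to global_max_change, and counting how
// often each limit was applied.  The scale is applied after the limits.
void AddDeltaWithMaxChange(const std::vector<ToyComponent> &delta,
                           double scale, double global_max_change,
                           std::vector<ToyComponent> *nnet,
                           std::vector<int> *num_max_change_per_component_applied,
                           int *num_max_change_global_applied) {
  std::vector<double> factor(delta.size(), 1.0);
  double global_norm_sq = 0.0;
  for (std::size_t c = 0; c < delta.size(); ++c) {
    double norm = L2Norm(delta[c].params);
    if (delta[c].max_change > 0.0 && norm > delta[c].max_change) {
      factor[c] = delta[c].max_change / norm;
      ++(*num_max_change_per_component_applied)[c];
    }
    global_norm_sq += (norm * factor[c]) * (norm * factor[c]);
  }
  double global_factor = 1.0;
  if (global_max_change > 0.0 &&
      std::sqrt(global_norm_sq) > global_max_change) {
    global_factor = global_max_change / std::sqrt(global_norm_sq);
    ++(*num_max_change_global_applied);
  }
  for (std::size_t c = 0; c < delta.size(); ++c)
    for (std::size_t i = 0; i < delta[c].params.size(); ++i)
      (*nnet)[c].params[i] +=
          scale * factor[c] * global_factor * delta[c].params[i];
}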

@danpovey (Contributor) commented:
@freewym, there is something else, and @GaofengCheng, pay attention to this, as it may affect your experiments with dropout and backstitch.
I am concerned that it may be insufficient to rely on the egs-shuffling to ensure that on different epochs, different data gets backstitch applied to it. The issue is that if there are 'rare-sized' egs, they won't get shuffled enough-- we use a buffer size of 2000, but rare sizes of egs may get spit out in almost the same order as they came in, if we consider egs of only one type.
So instead of a condition like
num_minibatches_processed_ % interval == 0
you should use a condition like
num_minibatches_processed_ % interval == srand_seed_ % interval.
But that means that you will have to start setting the srand-seed via the --srand option of nnet3-train and nnet3-chain-train (add the option to the scripts if needed; set it to the same value as is provided to nnet3-shuffle-egs).
This may have a small impact on experiments with dropout (I doubt it, but it could happen).
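
A condensed sketch of the requested change (names follow the comment above; the surrounding trainer code is omitted):

// Decide whether the current minibatch gets the backstitch update.
// 'interval' is backstitch-training-interval (the n in the publications).
// srand_seed should come from the --srand option of nnet3-train /
// nnet3-chain-train, set to the same value as is passed to nnet3-shuffle-egs,
// so that different epochs select different minibatches.
bool DoBackstitchThisMinibatch(int num_minibatches_processed,
                               int srand_seed, int interval) {
  // Old condition:  num_minibatches_processed % interval == 0
  // New condition, offset by the per-job random seed:
  return num_minibatches_processed % interval == srand_seed % interval;
}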

@freewym (Contributor, author) commented on Jun 1, 2017:

srand_seed_ is initialized with a random integer in the constructor, which is almost always different across invocations of the training binary. So would it suffice to solve the issue for 'rare-sized' egs by just changing the condition to what you proposed?

@danpovey (Contributor) commented on Jun 1, 2017 via email.

@freewym (Contributor, author) commented on Jun 1, 2017:

Oh yes.

@GaofengCheng (Contributor) commented:
@danpovey I can try backstitch+dropout after Yiming makes the fix you recommended, and see how it goes.

@jtrmal (Contributor) commented on Jun 1, 2017 via email.

@danpovey (Contributor) commented on Jun 1, 2017 via email.

@freewym force-pushed the backstitch-kaldi_52 branch from d3bbda2 to 320bfbb on June 1, 2017 21:11
NnetComputer computer(nnet_config.compute_config, *computation,
if (nnet_config.backstitch_training_scale > 0.0 && num_minibatches_processed_
% nnet_config.backstitch_training_interval ==
srand_seed_ % nnet_config.backstitch_training_interval) {
@freewym (review comment):
I tested the new refactored code using chain models (a TDNN-LSTM system on AMI, with backstitch enabled or disabled; when enabled, testing interval=1 and interval=5) and xent models (using the CIFAR recipe, with backstitch enabled or disabled; when enabled, only interval=1 was tested), and compared the results with my old runs. The only result that does not match the old one is the chain TDNN-LSTM AMI system with the backstitch interval=5 setting, where the WER is ~0.8 worse than the old run and the valid log-prob is also a bit lower. So now I am changing this expression back to the old condition (num_minibatches_processed_ % nnet_config.backstitch_training_interval == 0) to see if I can reproduce the old result.

@danpovey (Contributor) commented on Jun 6, 2017 via email.

@danpovey merged commit ecc6a78 into kaldi-asr:master on Jun 14, 2017
kronos-cm added a commit to kronos-cm/kaldi that referenced this pull request Jun 15, 2017
* 'master' of https://github.com/kaldi-asr/kaldi: (140 commits)
  [egs] Fix failure in multilingual BABEL recipe (regenerate cmvn.scp) (kaldi-asr#1686)
  [src,scripts,egs] Backstitch code+scripts, and one experiment, will add more later. (kaldi-asr#1605)
  [egs] CNN+TDNN+LSTM experiments on AMI (kaldi-asr#1685)
  [egs,scripts,src] Tune image recognition examples; minor small changes. (kaldi-asr#1682)
  [src] Fix bug in looped computation (kaldi-asr#1673)
  [build] when installing sequitur and mmseg, look for lib64 as well (thanks: @akshayc11) (kaldi-asr#1677)
  [src] fix to gst-plugin/Makefile (remove -lkaldi-thread) (kaldi-asr#1680)
  [src] Cosmetic fixes to usage messages
  [egs]  Fix to some --proportional-shrink related example scripts (kaldi-asr#1674)
  [build] Fix small bug in configure
  [scripts] Fix small bug in utils/gen_topo.pl.
  [scripts] Add python script to convert nnet2 to nnet3 models (kaldi-asr#1611)
  [doc] Fix typo  (kaldi-asr#1669)
  [src] nnet3: fix small bug in checking code.  Thanks: @Maddin2000.
  [src] Add #include missing from previous commit
  [src] Fix bug in online2-nnet3 decoding RE dropout+batch-norm (thanks: Wonkyum Lee)
  [scripts] make errors getting report non-fatal (thx: Miguel Jette); add comment RE dropout proportion
  [src,scripts] Use ConstFst or decoding (half the memory; slightly faster).  (kaldi-asr#1661)
  [src] keyword search tools: fix Minimize() call, necessary due to OpenFst upgrade (kaldi-asr#1663)
  [scripts] do not fail if the ivector extractor belongs to different user (kaldi-asr#1662)
  ...
@freewym deleted the backstitch-kaldi_52 branch on October 11, 2017 18:06