Online2 NNet3 TCP server program #2938

Merged: 36 commits, Mar 20, 2019
Changes from 8 commits
Commits
36 commits
- 801ef4f Adding code for the online2-net-nnet3-latgen-faster program (Dec 25, 2018)
- 301ea0c Added documentation for the online2-net-nnet3-latgen-faster program. (Dec 25, 2018)
- 2137715 Added few more options. (Jan 20, 2019)
- 9714b90 Merge branch 'master' into master (danijel3, Jan 20, 2019)
- 0b2d79f Renamed the binary to make it more like its intended purpose. (Jan 20, 2019)
- 5023f1b Merge branch 'master' of https://github.com/danijel3/kaldi (Jan 20, 2019)
- 1dfefe8 Updated docs. (Jan 20, 2019)
- 01d85ab Changed from select to poll. (Jan 20, 2019)
- d1f3d7f Merge branch 'master' into master (danijel3, Jan 24, 2019)
- 89530fd Fixed endpointing issues. (Jan 24, 2019)
- 852f80c Minor fixes. (Jan 24, 2019)
- 64522a9 Another safety check. (Jan 26, 2019)
- 8010b37 Merge branch 'master' into master (danijel3, Jan 26, 2019)
- e18beb4 [src] Add a frame offset setting to looped online decodable (dogancan, Jan 28, 2019)
- 4d7760a Merge branch 'frame-offset' of https://github.com/dogancan/kaldi (danijel3, Jan 31, 2019)
- 4e494b1 Made changes according to PR discussion. This particular commit doesn… (danijel3, Jan 31, 2019)
- 7521b7d Fix the range check for frame offset (dogancan, Feb 1, 2019)
- 18faa20 Merge branch 'frame-offset' of https://github.com/dogancan/kaldi (danijel3, Feb 1, 2019)
- 248f1dd Removed remainder processing. (danijel3, Feb 1, 2019)
- e948930 Removed spk adaptation options. Fixed bugs. (danijel3, Feb 3, 2019)
- 1da690e Merge branch 'master' into master (danijel3, Feb 3, 2019)
- e41a62e Refactoring. (danijel3, Feb 3, 2019)
- 9d0b3bd Merge branch 'master' of https://github.com/danijel3/kaldi (danijel3, Feb 3, 2019)
- 1c1ef40 Merge branch 'master' into master (danijel3, Feb 20, 2019)
- 2085cd3 Typos and cosmetic changes. (Mar 17, 2019)
- 5786ea6 Merge branch 'master' into master (danijel3, Mar 17, 2019)
- 7bf02cb Fixed the netcat command to include shutdown. (Mar 17, 2019)
- d4ee4c7 Merge branch 'master' of https://github.com/danijel3/kaldi (Mar 17, 2019)
- 14572e9 Typos. (Mar 17, 2019)
- 546cac5 Cosmetic changes. (Mar 18, 2019)
- 522f9ae Log fixes. (Mar 18, 2019)
- 9ba2203 Merge branch 'master' into master (danijel3, Mar 18, 2019)
- fb92703 Bugfixes and cosmetic changes. (Mar 19, 2019)
- 4d5927b Merge branch 'master' of https://github.com/danijel3/kaldi (Mar 19, 2019)
- 8c92170 More better bugfixes. (Mar 19, 2019)
- 1b260b9 Merge branch 'master' into master (danijel3, Mar 19, 2019)
64 changes: 64 additions & 0 deletions src/doc/online_decoding.dox
@@ -438,6 +438,70 @@ and downloadable models that can be used with online nnet3 decoding, please
see http://kaldi-asr.org/models.html (the first model there, the ASPIRE model,
includes instructions in a README file).

\subsection online_decoding_nnet3_net TCP server for nnet3 online decoding

The program to run the TCP server is online2-tcp-nnet3-decode-faster, located in the
~/src/online2bin folder.

Review comment (Contributor): Seems like we missed one :)
online2-net-nnet3-latgen-faster -> online2-tcp-nnet3-decode-faster
Also let's rename the subsection to online_decoding_nnet3_tcp.

Reply (Contributor Author): fixed

danijel3 marked this conversation as resolved.

The usage is as follows:

\verbatim
online2-tcp-nnet3-decode-faster [options] <nnet3-in> <fst-in> <word-symbol-table> <listen-port>
\endverbatim

For example:

\verbatim
online2-tcp-nnet3-decode-faster model/final.mdl graph/HCLG.fst graph/words.txt 5050
\endverbatim

The word symbol table is mandatory (unlike in other nnet3 online decoding programs) because
the server outputs words as text rather than integer IDs. Endpointing is also mandatory,
since the server relies on it to split the continuous input into utterances. Other,
non-standard options include:
- samp-freq - sampling frequency of the input audio in Hz (usually 8000 for telephony and 16000 for other uses)
- chunk-length - length of the signal chunk processed by the decoder at each step
- output-freq - how often the server checks for changes in the decoding result (i.e. the output refresh period; default 1 s)
- num-threads-startup - number of threads used when initializing the iVector extractor

The TCP protocol is simple: the server takes a raw signal as input (16-bit signed integer
samples at the chosen sampling frequency) and outputs plain text according to the following
logic:
- every refresh period (output-freq option), the current state of the decoding is output
- each line is terminated by '\r'
- once endpointing detects an utterance boundary, a '\n' character is output

Each output string (delimited by '\r') should be treated as provisional: it can change
entirely until the utterance delimiter ('\n') is sent. The delimiters were chosen specifically
to make the output look neat in a terminal, where each '\r' returns the cursor to the start
of the line so that the next hypothesis overwrites the previous one. The server can also be
used with other interfaces; for example, a web demo (HTML/JS Web Audio API + WebSockets) exists.
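On the client side, the framing described above can be handled by splitting the incoming byte stream on the two delimiters. The following is a minimal Python sketch, not part of Kaldi: the class name OutputParser is hypothetical, and it assumes the server sends UTF-8 text. Because the description above does not say whether a '\r' precedes the final '\n', the sketch accepts both forms.

```python
import codecs


class OutputParser:
    """Incrementally parses the text stream sent back by the TCP server.

    A carriage return ends a partial hypothesis that may still change;
    a line feed marks an utterance boundary, after which the text is final.
    """

    def __init__(self):
        # Incremental decoder, so multi-byte UTF-8 characters split across
        # socket reads are reassembled correctly (UTF-8 is an assumption).
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._buf = ""      # characters received since the last delimiter
        self.partial = ""   # latest partial hypothesis of the current utterance
        self.finals = []    # final hypotheses of completed utterances

    def feed(self, data: bytes) -> None:
        """Consumes one chunk of bytes, e.g. as returned by socket.recv()."""
        for ch in self._decoder.decode(data):
            if ch == "\r":
                # End of a partial hypothesis: it replaces the previous one.
                self.partial = self._buf
                self._buf = ""
            elif ch == "\n":
                # Utterance boundary. Accepts both "text\n" and "text\r\n":
                # if nothing followed the last '\r', the last partial is final.
                self.finals.append(self._buf if self._buf else self.partial)
                self._buf = ""
                self.partial = ""
            else:
                self._buf += ch
```

In a client loop, one would call feed() on every chunk read from the socket, display parser.partial as the live hypothesis, and treat entries appended to parser.finals as stable.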

To use the program from the terminal, first start the server and make sure it is accepting
connections. With the ASPIRE models, the command should look like this:
\verbatim
online2-tcp-nnet3-decode-faster --samp-freq=8000 --frames-per-chunk=20 --extra-left-context-initial=0 \
  --frame-subsampling-factor=3 --config=model/conf/online.conf --min-active=200 --max-active=7000 \
  --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 model/final.mdl graph/HCLG.fst graph/words.txt 5050
\endverbatim

To send a WAV file to the server, first convert it to raw audio, then pipe it into the socket:
\verbatim
sox audio.wav -t raw -c 1 -b 16 -r 8k -e signed-integer - | nc localhost 5050
\endverbatim
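For clients that cannot shell out to sox and nc, the same request can be made from Python. This is a sketch under stated assumptions, not a Kaldi API: the helper names wav_to_raw_pcm and send_audio are hypothetical; the WAV file must already be 16-bit mono at the server's --samp-freq (no resampling is done, unlike with sox); the server machine is assumed to be little-endian like the WAV sample data (e.g. x86); and the server is assumed to close the connection once it has sent its final result.

```python
import socket
import wave


def wav_to_raw_pcm(path: str, expected_rate: int = 8000) -> bytes:
    """Returns the samples of a 16-bit mono WAV file as raw bytes.

    Unlike the sox command, this does no resampling or channel mixing:
    the file must already match the server's --samp-freq.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        if w.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {w.getframerate()} Hz")
        # WAV stores 16-bit PCM as little-endian signed integers.
        return w.readframes(w.getnframes())


def send_audio(data: bytes, host: str = "localhost", port: int = 5050) -> bytes:
    """Streams raw audio to the server and returns everything it sends back."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(data)
        # Half-close: signal end of input while keeping the socket readable.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

The half-close plays the same role as making netcat shut down its sending side at end of input: the server sees end-of-stream while the client can still read the remaining output.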

It is also possible to play the audio back (almost) simultaneously with decoding. This may
require installing the 'pv' program, which throttles the signal sent to Kaldi to the playback
speed (16000 bytes/s for 8 kHz, 16-bit, mono audio):

\verbatim
sox audio.wav -t raw -c 1 -b 16 -r 8k -e signed-integer - | \
tee >(play -t raw -r 8k -e signed-integer -b 16 -c 1 -q -) | \
pv -L 16000 -q | nc localhost 5050
\endverbatim
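The pv rate follows directly from the audio format: 8000 samples/s × 2 bytes/sample × 1 channel = 16000 bytes/s (16 kHz audio would need 32000). Below is a minimal Python sketch of the same real-time pacing, assuming 16-bit mono PCM already in memory; byte_rate and paced_chunks are hypothetical helper names, and the sleep and clock parameters exist only so the pacing can be tested without waiting.

```python
import time
from typing import Iterator


def byte_rate(samp_freq: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes per second of raw PCM audio; 8 kHz 16-bit mono gives 16000."""
    return samp_freq * bytes_per_sample * channels


def paced_chunks(data: bytes, rate: int, chunk_secs: float = 0.1,
                 sleep=time.sleep, clock=time.monotonic) -> Iterator[bytes]:
    """Yields data in chunks so that, on average, at most `rate` bytes/s go out."""
    # Round down to whole 16-bit samples (assumes 16-bit mono audio).
    chunk_size = max(2, int(rate * chunk_secs) // 2 * 2)
    start = clock()
    for offset in range(0, len(data), chunk_size):
        # Wait until the wall clock catches up with the audio already sent.
        delay = start + offset / rate - clock()
        if delay > 0:
            sleep(delay)
        yield data[offset:offset + chunk_size]
```

Each chunk yielded by paced_chunks(data, byte_rate(8000)) would then be written with sock.sendall(). Like pv -L, this only limits the average rate; it does not synchronize with the audio device.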

Finally, it is possible to send audio from the microphone directly into the server:

\verbatim
rec -r 8k -e signed-integer -c 1 -b 16 -t raw -q - | nc localhost 5050
\endverbatim


*/
3 changes: 2 additions & 1 deletion src/online2bin/Makefile
@@ -11,7 +11,8 @@ BINFILES = online2-wav-gmm-latgen-faster apply-cmvn-online \
online2-wav-nnet2-latgen-faster ivector-extract-online2 \
online2-wav-dump-features ivector-randomize \
online2-wav-nnet2-am-compute online2-wav-nnet2-latgen-threaded \
-online2-wav-nnet3-latgen-faster online2-wav-nnet3-latgen-grammar
+online2-wav-nnet3-latgen-faster online2-wav-nnet3-latgen-grammar \
+online2-tcp-nnet3-decode-faster

OBJFILES =
