Possible Improvements

Neumair Günther edited this page Apr 22, 2019 · 2 revisions

Multi-Stage Detection

Run a small model continuously and keep a buffer of recent audio. When the small model detects a possible wake word, run the buffered audio through a bigger model to confirm the detection.
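The idea can be sketched as follows. This is only a minimal illustration, not the project's implementation: the class name `TwoStageDetector`, the two model callables, the buffer length, and both thresholds are assumptions made for the example. Each model is assumed to return a wake word score between 0 and 1.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Frame = Sequence[float]  # one chunk of audio samples


class TwoStageDetector:
    """Cheap always-on first stage, expensive second stage run on demand."""

    def __init__(
        self,
        small_model: Callable[[Frame], float],      # hypothetical: scores one frame
        big_model: Callable[[List[Frame]], float],  # hypothetical: scores the buffer
        buffer_frames: int = 50,                    # assumed buffer length
        small_threshold: float = 0.5,               # assumed thresholds
        big_threshold: float = 0.8,
    ) -> None:
        self.small_model = small_model
        self.big_model = big_model
        self.small_threshold = small_threshold
        self.big_threshold = big_threshold
        # A deque with maxlen drops the oldest frame automatically.
        self.buffer: Deque[Frame] = deque(maxlen=buffer_frames)

    def process(self, frame: Frame) -> bool:
        """Feed one frame; return True only if both stages agree."""
        self.buffer.append(frame)
        if self.small_model(frame) < self.small_threshold:
            return False  # first stage sees nothing, so skip the big model
        # Possible wake word: verify on the whole buffered context.
        return self.big_model(list(self.buffer)) >= self.big_threshold
```

With this layout the big model runs only when the small model fires, so its cost is paid rarely, while the buffer gives it the audio leading up to the trigger.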

Add Voice Activity Detection

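Combine voice activity detection (VAD) with keyword spotting, so that keyword detections are only accepted while speech is present. This reduces false alarms caused by non-speech segments.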
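One way to combine the two is to let the VAD gate the keyword spotter. The sketch below uses a crude energy threshold as a stand-in for a real VAD. The function names, the threshold value, and the `keyword_model` callable are all assumptions made for illustration. A trained VAD would replace `is_speech` in practice, since an energy threshold also lets loud non-speech noise through.

```python
import math
from typing import Callable, Sequence


def rms_energy(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: Sequence[float], energy_threshold: float = 0.02) -> bool:
    """Crude energy VAD (assumption): any loud-enough frame counts as speech."""
    return rms_energy(frame) >= energy_threshold


def gated_keyword_score(
    frame: Sequence[float],
    keyword_model: Callable[[Sequence[float]], float],  # hypothetical spotter
    energy_threshold: float = 0.02,                     # assumed value
) -> float:
    """Return the keyword score, or 0.0 when the VAD rejects the frame."""
    if not is_speech(frame, energy_threshold):
        return 0.0  # non-speech: suppress any keyword detection
    return keyword_model(frame)
```

Gating before the keyword model also saves computation, because the spotter is never run on frames the VAD rejects.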

Command Detection

Determine whether an utterance is spoken as a command. Consider these two examples:

  • "You can say Marvin on to turn the light on"
  • "Marvin, turn the light on"

Stress, intonation, and content mark the second example as a command and the first as ordinary speech. Command detection should be able to differentiate between these cases.