There are many open-source projects available for building an automatic speech recognition (ASR) system. However, most of these libraries demand in-depth domain knowledge before you can develop an ASR system on your own. In addition, there are hardware and data-set requirements that are not easy to fulfill.
Luckily, Zamia-ASR is one of the projects that provides ready-to-use speech models, along with instructions and demo scripts for using them. As of today, it has published models for English and German that work well in noisy conditions and across different microphone recordings.
Zamia is based on Kaldi, one of the most widely used toolkits for ASR system development. This project makes use of the Zamia demonstration scripts and abstracts them behind an easy-to-use REST API.
Because of older Python dependencies, Python 2 is used inside the container. Please be aware of this, since Python 2 reached its end of life on January 1, 2020, which means it no longer receives updates of any kind.
Example request:

```shell
curl -X POST -F audio=@"<path/to/file>.wav" "http://<dockerhost>:5000/transcribe"
```
There are two Docker images that make up the final API:
- zamia-asr-base: This image is based on Debian 10 and installs all requirements for the Zamia scripts.
- zamia-asr-server: This image builds on the first one and runs a Flask-based Python HTTP server, which in turn invokes the demonstration scripts with specific commands.
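To illustrate the server side described above, here is a rough sketch of what such a Flask endpoint could look like. It is written in Python 3 for clarity (the actual image uses Python 2), and the script path, script arguments, and error handling are all assumptions, not the project's actual code; only the route and the `audio` form field follow the example request.

```python
# Hypothetical sketch of a /transcribe endpoint that shells out to a
# Zamia demo script. Script path and arguments are assumptions.
import subprocess
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumed location of a demonstration script inside the container;
# the real image may use a different script and invocation.
DEMO_SCRIPT = "/opt/zamia/demo_transcribe.py"


@app.route("/transcribe", methods=["POST"])
def transcribe():
    # The client uploads the recording as multipart form field "audio".
    if "audio" not in request.files:
        return jsonify(error="missing 'audio' form field"), 400
    # Persist the upload so the demo script can read it from disk.
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        request.files["audio"].save(tmp.name)
        result = subprocess.run(
            ["python2", DEMO_SCRIPT, tmp.name],
            capture_output=True,
            text=True,
        )
    if result.returncode != 0:
        return jsonify(error=result.stderr), 500
    return jsonify(transcript=result.stdout.strip())


# To serve the API, run the app on the port the container exposes:
# app.run(host="0.0.0.0", port=5000)
```

The main design point is that the HTTP layer stays a thin wrapper: the upload is written to a temporary file, the existing demo script does the actual recognition, and its stdout becomes the response body.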
Build and run:

```shell
# Build the base image first
docker build -t fabianbusch/zamia-asr-base:latest ./zamia-asr-base

# Build the server image and run it, mapping container port 5000 to host port 80
docker build -t fabianbusch/zamia-asr-server:latest . && \
docker run -it -p 80:5000 fabianbusch/zamia-asr-server:latest
```
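For repeated use, the run step could also be captured in a Compose file. This is only a sketch: the image name and port mapping come from the commands above, while the build context is an assumption, and the base image must still be built first with the first `docker build` command.

```yaml
# Hypothetical docker-compose.yml mirroring the run command above.
# Assumes the server Dockerfile sits in the repository root.
version: "3"
services:
  zamia-asr-server:
    image: fabianbusch/zamia-asr-server:latest
    build: .
    ports:
      - "80:5000"
```

With this in place, `docker compose up` replaces the manual build-and-run command for the server image.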