Speech-To-Text (STT) Server for SEPIA Framework
SEPIA Speech-To-Text (STT) Server is a WebSocket-based, full-duplex Python server for real-time automatic speech recognition (ASR) that supports multiple open-source ASR engines. It receives a stream of audio chunks over a secure WebSocket connection and returns the transcribed text almost immediately as partial and final results.
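To illustrate what "a stream of audio chunks" could look like on the client side, here is a minimal sketch that splits raw PCM audio into short, fixed-duration chunks. The function name iter_pcm_chunks, the 16 kHz sample rate, the 16-bit mono format and the 100 ms chunk length are assumptions chosen for illustration, not values taken from the server; the actual WebSocket message format is described in the project repository and is not shown here.

```python
from typing import Iterator


def iter_pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16000,  # assumed sample rate in Hz
    chunk_ms: int = 100,       # assumed chunk duration in milliseconds
    sample_width: int = 2,     # assumed bytes per sample (16-bit mono)
) -> Iterator[bytes]:
    """Yield consecutive chunks of raw mono PCM audio, each about chunk_ms long.

    The last chunk may be shorter than the others.
    """
    if sample_rate <= 0 or chunk_ms <= 0 or sample_width <= 0:
        raise ValueError("sample_rate, chunk_ms and sample_width must be positive")
    # Whole samples per chunk (at least one), converted to bytes.
    samples_per_chunk = max(1, sample_rate * chunk_ms // 1000)
    chunk_bytes = samples_per_chunk * sample_width
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(16000 * 2)
    chunks = list(iter_pcm_chunks(one_second_of_silence))
    print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

A client would send each chunk to the server as it becomes available and read partial and final results as they arrive; smaller chunks generally mean lower latency at the cost of more messages.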
One goal of this project is to offer a standardized, secure, real-time interface for all the great open-source ASR tools out there. The server works on all major platforms including single-board devices like Raspberry Pi (4).
The currently supported engines are Vosk and Coqui. Vosk comes with small but powerful ASR models for English and German, and Coqui includes a small English model (without scorer) for experimentation. Official and custom ASR models for many languages can be added easily.
Language model adaptation tools are included as well, so you can start building custom domain models right away, for example by using ZAMIA Speech (Kaldi ASR) models as a starting point.
For more info visit: https://github.com/SEPIA-Framework/sepia-stt-server
NOTE: This is a complete rewrite (2021) of the original STT Server (2018). If you are using ZAMIA Speech custom Kaldi models built for the 2018 version you can easily convert them to new models. Please see: https://github.com/fquirin/kaldi-adapt-lm
Simply pull the latest image (or choose an older one from the archive):
docker pull sepia/stt-server:latest
Supported platforms:
Start the server:
sudo docker run --rm --name=sepia-stt -p 20741:20741 -it sepia/stt-server:latest
Visit the test page: http://localhost:20741
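If you prefer to check from a script that the container is up, the following minimal sketch requests the test page and reports whether it answered. The helper name server_is_up and the 3-second timeout are hypothetical choices for this example; only the URL comes from the instructions above, and the check says nothing about whether an ASR engine has loaded correctly.

```python
import http.client
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:20741", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to url succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if __name__ == "__main__":
    if server_is_up():
        print("STT server test page is reachable")
    else:
        print("STT server test page is not reachable")
```

Right after "docker run" the server may need a moment to start, so a script could call this helper in a short retry loop before giving up.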