lintoai/linto-stt-kaldi
LinTO-STT-Kaldi is an API for Automatic Speech Recognition (ASR) based on models trained with Kaldi.
LinTO-STT-Kaldi can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.
It can be used to do offline or real-time transcriptions.
To run the transcription models you'll need Docker up and running.
LinTO-STT-Kaldi accepts two kinds of models: LinTO acoustic and language models (AM and LM), and Vosk models.
We provide home-curated models (v2) on dl.linto.ai, or you can use the Vosk models available here.
In task mode, the only entry point to the STT service is tasks posted on a message broker. Supported message brokers are RabbitMQ, Redis, and Amazon SQS. In addition, to prevent large audio files from transiting through the message broker, the STT worker uses a shared storage folder (SHARED_FOLDER).
1- Build or pull the image:
git clone https://github.com/linto-ai/linto-stt.git
cd linto-stt
docker build . -f kaldi/Dockerfile -t linto-stt-kaldi:latest
or
docker pull lintoai/linto-stt-kaldi
2- Download the models
Have the acoustic and language model ready at AM_PATH and LM_PATH if you are using LinTO models. If you are using a Vosk model, have it ready at MODEL.
3- Fill the .env file
An example of .env file is provided in kaldi/.envdefault.
PARAMETER | DESCRIPTION | EXAMPLE |
---|---|---|
SERVICE_MODE | STT serving mode (see Serving modes) | http \| task \| websocket |
MODEL_TYPE | Type of STT model used | lin \| vosk |
ENABLE_STREAMING | In http serving mode, enables the /streaming websocket route | true \| false |
SERVICE_NAME | In task mode, the queue name for task processing | my-stt |
SERVICE_BROKER | In task mode, the URL of the message broker | redis://my-broker:6379 |
BROKER_PASS | In task mode, the broker password | my-password |
STREAMING_PORT | In websocket mode, the listening port for incoming WS connections | 80 |
CONCURRENCY | Maximum number of parallel requests | >1 |
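As an illustration, a minimal .env for the http mode with a Vosk model could look like this (the values are examples, not defaults; kaldi/.envdefault lists every option):

```
SERVICE_MODE=http
MODEL_TYPE=vosk
ENABLE_STREAMING=false
CONCURRENCY=2
```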
STT can be used in three ways:
The mode is specified using the SERVICE_MODE value in the .env file or as an environment variable:
SERVICE_MODE=http
The HTTP serving mode deploys an HTTP server and a Swagger UI to allow transcription requests on a dedicated route.
The SERVICE_MODE value in the .env should be set to http.
docker run --rm \
-p HOST_SERVING_PORT:80 \
-v AM_PATH:/opt/AM \
-v LM_PATH:/opt/LM \
--env-file .env \
linto-stt-kaldi:latest
This will run a container providing an HTTP API bound to the HOST_SERVING_PORT port on the host.
Parameters:
Variables | Description | Example |
---|---|---|
HOST_SERVING_PORT | Host serving port | 80 |
AM_PATH | Path to the acoustic model on the host machine mounted to /opt/AM | /my/path/to/models/AM_fr-FR_v2.2.0 |
LM_PATH | Path to the language model on the host machine mounted to /opt/LM | /my/path/to/models/fr-FR_big-v2.2.0 |
MODEL_PATH | Path to the model (using MODEL_TYPE=vosk) mounted to /opt/model | /my/path/to/models/vosk-model |
The task serving mode connects a Celery worker to a message broker.
The SERVICE_MODE value in the .env should be set to task.
You need a message broker up and running at MY_SERVICE_BROKER.
docker run --rm \
-v AM_PATH:/opt/AM \
-v LM_PATH:/opt/LM \
-v SHARED_AUDIO_FOLDER:/opt/audio \
--env-file .env \
linto-stt-kaldi:latest
Parameters:
Variables | Description | Example |
---|---|---|
AM_PATH | Path to the acoustic model on the host machine mounted to /opt/AM | /my/path/to/models/AM_fr-FR_v2.2.0 |
LM_PATH | Path to the language model on the host machine mounted to /opt/LM | /my/path/to/models/fr-FR_big-v2.2.0 |
MODEL_PATH | Path to the model (using MODEL_TYPE=vosk) mounted to /opt/model | /my/path/to/models/vosk-model |
SHARED_AUDIO_FOLDER | Shared audio folder on the host machine mounted to /opt/audio | /my/path/to/audio |
The websocket serving mode deploys a streaming transcription service only.
The SERVICE_MODE value in the .env should be set to websocket.
Usage is the same as the http mode's /streaming route.
/healthcheck
Returns the state of the API
Method: GET
Returns "1" if healthcheck passes.
/transcribe
Transcription API
Returns the transcribed text as "text/plain", or a JSON object when using "application/json", structured as follows:
{
"text" : "This is the transcription",
"words" : [
{"word":"This", "start": 0.123, "end": 0.453, "conf": 0.9},
...
],
"confidence-score": 0.879
}
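Using the structure above, the plain transcription can be pulled out of a saved response with standard tools. A minimal sketch, assuming the JSON response was saved to response.json (the file name and values are illustrative):

```shell
# Illustrative response matching the structure documented above.
cat > response.json <<'EOF'
{"text": "This is the transcription",
 "words": [{"word": "This", "start": 0.123, "end": 0.453, "conf": 0.9}],
 "confidence-score": 0.879}
EOF

# Extract the "text" field with Python's standard json module.
python3 -c 'import json; print(json.load(open("response.json"))["text"])'
# -> This is the transcription
```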
/streaming
The /streaming route is accessible if the ENABLE_STREAMING environment variable is set to true.
The route accepts websocket connections. The connection will be closed and the worker freed if no chunk is received for 10 seconds.
/docs
The /docs route offers an OpenAPI/Swagger interface.
STT-Worker accepts requests with the following arguments:
file_path: str, with_metadata: bool
Return format
On a successful transcription, the returned object is a JSON object structured as follows:
{
"text" : "this is the transcription as text",
"words": [
{
"word" : "this",
"start": 0.0,
"end": 0.124,
"conf": 1.0
},
...
],
"confidence-score": ""
}
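The word-level metadata in the result can be post-processed with standard tools. A sketch, assuming the result was saved to result.json (the file name and values are illustrative):

```shell
# Illustrative result matching the structure documented above.
cat > result.json <<'EOF'
{"text": "this is the transcription as text",
 "words": [
   {"word": "this", "start": 0.0, "end": 0.124, "conf": 1.0},
   {"word": "is", "start": 0.124, "end": 0.25, "conf": 0.8}
 ],
 "confidence-score": ""}
EOF

# Average the per-word confidence scores with Python's json module.
python3 -c 'import json; w = json.load(open("result.json"))["words"]; print(sum(x["conf"] for x in w) / len(w))'
# prints 0.9
```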
You can test your HTTP API using curl:
curl -X POST "http://YOUR_SERVICE:YOUR_PORT/transcribe" -H "accept: application/json" -H "Content-Type: multipart/form-data" -F "file=@YOUR_FILE;type=audio/x-wav"
This project is developed under the AGPLv3 license (see LICENSE).