GreenKey

Scribe Call Bot

Version 3.0 Documentation

A collection of microservices for handling voice streams and sending audio to Scribe's Transcription services.




Deploying the Scribe Call Bot

The Scribe Call Bot comprises three main services: Freeswitch, CallTranscriber, and WebsocketServer.

The Call Bot can be used with both real-time transcription (SQCServer) and delayed transcription (SVTServer) services. This documentation assumes you already have one of Scribe's transcription services in place.

All of Scribe's services require an authorized account. Contact us if you are interested in obtaining an account.




Step 1: Install Docker

Follow the official Docker installation instructions for your platform to install Docker on your machine.




Step 2: Download the Docker Images

$ docker login -u [repository-user] -p [repository-password] docker.greenkeytech.com
Login Succeeded
$ docker pull docker.greenkeytech.com/freeswitch-scribe
$ docker pull docker.greenkeytech.com/calltranscriber
$ docker pull docker.greenkeytech.com/websocketserver

The credentials provided by GreenKey should include your repository-user and repository-password.




Step 3: Launch WebsocketServer

The Scribe WebsocketServer allows any application to relay messages along a websocket. In the context of the Call Bot, the WebsocketServer is used to relay transcription messages to any downstream listener.

You can launch the Websocket server as follows:

docker run \
  -d \
  --name websocketserver \
  -p 80:9000 \
  docker.greenkeytech.com/websocketserver

where:

-p 80:9000 binds the host port 80 to the container port 9000 that the server internally exposes.

Optional configuration parameters

Variable  Default                 Description
TOKEN     "XC7WFVC3UW3HPQRV5YUR"  The authentication token that services will need to use to publish messages

Usage notes

The server accepts messages as stringified JSON objects of the following form:

{
  "token": "XC7WFVC3UW3HPQRV5YUR",
  "endpoint": "/12345",
  "message": {"some_key": "some_value"}
}

All endpoints listening to ws://web.socket.address/12345 will receive the message. Messages can be sent to any websocket endpoint address with a valid token.
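
As a minimal sketch of the message format above, the following Python builds the stringified payload. The helper name build_message is hypothetical and not part of the Call Bot; actually sending the string requires a websocket client, which is omitted here.

```python
import json


def build_message(token, endpoint, message):
    """Return the stringified JSON payload described above.

    build_message is a hypothetical helper, not part of the Call Bot.
    """
    if not endpoint.startswith("/"):
        # Endpoints are websocket paths such as "/12345".
        endpoint = "/" + endpoint
    return json.dumps({"token": token, "endpoint": endpoint, "message": message})


payload = build_message("XC7WFVC3UW3HPQRV5YUR", "/12345", {"some_key": "some_value"})
print(payload)
```

The resulting string is what a publishing service would send over its websocket connection to the server.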

You can test the functionality using our example script. Change wsUri to the address of your websocket server, then try sending the following message from the developer console:

doSend(
  {
    'token': 'XC7WFVC3UW3HPQRV5YUR',
    'endpoint': '/12345',
    'message': {'some_key': 'some_value'}
  }
)

To enable SSL, see Step 6: Add an nginx SSL proxy.




Step 4: Launch Freeswitch

The freeswitch container acts as a SIP endpoint and handles all SIP connections. The container requires open UDP ports for sending/receiving real-time media.

Launch it as follows:

docker run -d \
  --net host \
  --name freeswitch \
  -v $(pwd)/[tmp-recordings]:/scribe/recordings \
  -e DOMAIN="[freeswitch-host]" \
  -e EXTERNAL_RTP_IP="[freeswitch-ip]" \
  -e EXTERNAL_SIP_IP="[freeswitch-ip]" \
  -p 5060:5060/udp \
  -p 16384-16584:16384-16584/udp \
  -p 8021:8021 \
  -p 2222:2222 \
  docker.greenkeytech.com/freeswitch-scribe

where:

[tmp-recordings] is a storage directory where audio will be temporarily written while processing.

[freeswitch-host] is the domain or IP address to serve the SIP endpoint on. Do not use localhost here; instead, use the IP address of the server.

[freeswitch-ip] is the IP address of the endpoint.

Because --net host shares the host's network stack, the -p flags in the command above are not required for port binding; they are listed to document the ports the container uses:

-p 16384-16584:16384-16584/udp is the port range for RTP media. Two ports are necessary for every concurrent connection. For example, a port range of 16384-16393 would open 10 ports supporting 5 concurrent connections. The default configuration supports 100 concurrent connections.

-p 5060:5060/udp is the SIP signalling port.

-p 8021:8021 is the freeswitch event socket port (used later by the CallTranscriber).
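
The two-ports-per-connection rule can be sketched as a small calculation. This is an illustration only: max_concurrent_calls is a hypothetical helper, and it assumes the port range is inclusive at both ends, as Docker's -p range syntax treats it.

```python
def max_concurrent_calls(start_port, end_port):
    """Number of calls an inclusive UDP port range can carry (two ports per call)."""
    if end_port < start_port:
        raise ValueError("end_port must not be below start_port")
    port_count = end_port - start_port + 1  # assumption: the range is inclusive
    return port_count // 2


print(max_concurrent_calls(16384, 16584))  # default range -> 100
print(max_concurrent_calls(16384, 16393))  # ten ports -> 5
```

When sizing a deployment, the same arithmetic run in reverse gives the port range to open for a target number of concurrent calls.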

Optional configuration parameters

Variable                 Default     Description
ESL_PASSWORD             "greenkey"  Authentication string for event socket connections
CODEC_PREFS              "G722"      Global codec preference order for media, comma separated (e.g. "OPUS,G722,PCMU,PCMA")
RTP_START_PORT           "16384"     Begin bound for UDP port range purposed for RTP traffic
RTP_END_PORT             "16584"     End bound for UDP port range purposed for RTP traffic
MAX_SESSIONS             "100"       Maximum total ongoing connections (1/2 of number of RTP ports)
CONFERENCE_SAMPLE_RATE   "44100"     Live sample rate for all created conferences
RECORD_SAMPLE_RATE       "44100"     Static sample rate for all audio recorded to files
CONFERENCE_ENERGY_LEVEL  "300"       Energy level required for audio to be sent to other conferees (noise floor)
SIP_WS_PORT              "2222"      Port to which the userAgent Server should bind for insecure websocket connections
SIP_WSS_PORT             "3333"      Port to which the userAgent Server should bind for secure websocket connections

To enable TLS:

docker run -d \
  ...
  -v $(pwd)/freeswitch-tls:/fs/tls \
  -e SIP_TLS_ENABLE="true" \
  -p 5061:5061/tcp \
  -p 3333:3333 \
  ...

Adding a SIP Trunk

If you want to use a telephone number to call your deployment, you will need to use a SIP trunking service and point it at [freeswitch-host]:5060, or [freeswitch-host]:5061 for TLS. The CallTranscriber below monitors the freeswitch service so that proper authentication (via inbound number, endpoint number, or PIN) is required to successfully initiate a call.




Step 5: Launch CallTranscriber

The CallTranscriber service manages incoming SIP connections via the freeswitch event socket and begins real-time or delayed transcription.

Launch the container as follows:

Real-Time Usage

docker run \
  -d \
  --name calltranscriber \
  -v $(pwd)/[tmp-recordings]:/scribe/recordings \
  -v $(pwd)/[call-config]:/scribe/vmdata \
  -e ID_MODE="PIN" \
  -e TRANSCRIPTION_MODE="REALTIME" \
  -e SQC_SERVER="ws://[sqcserver-host]:8888" \
  -e ESL_STRING="[freeswitch-host]:8021" \
  -e WS_SERVER="ws://[websocketserver-host]" \
  -e WS_SERVER_TOKEN="[websocketserver-token]" \
  docker.greenkeytech.com/calltranscriber

Delayed Usage

docker run \
  -d \
  --name calltranscriber \
  -v $(pwd)/[tmp-recordings]:/scribe/recordings \
  -v $(pwd)/[call-config]:/scribe/vmdata \
  -e ID_MODE="PIN" \
  -e TRANSCRIPTION_MODE="DELAYED" \
  -e SVT_SERVER="http://[svtserver-host]:5000" \
  -e MAX_SEG_LEN="10000" \
  -e ESL_STRING="[freeswitch-host]:8021" \
  -e WS_SERVER="ws://[websocketserver-host]" \
  docker.greenkeytech.com/calltranscriber

where:

[tmp-recordings] is the same directory you set above for the freeswitch instance.

[call-config] is a directory with optional text files that you can use to specify restrictions on who can use the call bot. The directory can contain up to three files as specified below. If a file is not specified, no restrictions are imposed on that aspect.

ID_MODE is the identification mode for posting transcriptions to the websocket server. Because the websocket server relays messages to a route defined by a numeric endpoint, ID_MODE can be set to an entered PIN (PIN), the inbound caller's number (CALLER), or the number dialed to reach the Call Bot (ENDPOINT). For example, if ID_MODE="CALLER" is used with a callers.txt file, transcriptions for a call from 847-232-5000 would be relayed to ws://[websocketserver-host]/8472325000.

TRANSCRIPTION_MODE is either REALTIME with SQC_SERVER set or DELAYED with SVT_SERVER set.

ESL_STRING is the path to the freeswitch instance event socket. Here, [freeswitch-host]:8021 is the same host you set earlier for the freeswitch container. If you are running the two containers on the same host, specify the host IP, not localhost.

[sqcserver-host] is the address of your SQCServer instance, if REALTIME is the transcription mode. Replace ws with wss for secure websockets. Use the host's IP address, not localhost.

[svtserver-host] is the address of your SVTServer instance, if DELAYED is the transcription mode. Change the port number here if your service is on a non-default port. Replace http with https for secure connections. Use the host's IP address, not localhost.

[websocketserver-host] is the address of the websocket server set up above. Replace ws with wss for secure websockets. Use the host's IP address, not localhost.

MAX_SEG_LEN is the maximum segment length (in milliseconds) for delayed transcription. The default value is 30000 milliseconds. Reduce this for more segmentation in the transcript.
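
To illustrate how an identifier maps to a websocket route under ID_MODE, here is a minimal sketch. The function transcript_endpoint and the host name are hypothetical, and the normalization (dropping every non-digit character) is an assumption inferred from the 847-232-5000 example rather than a documented rule.

```python
import re


def transcript_endpoint(ws_server, identifier):
    """Build the websocket URL for a PIN, caller number, or dialed number.

    Assumption: the numeric endpoint is the identifier with non-digits removed.
    """
    digits = re.sub(r"\D", "", identifier)
    if not digits:
        raise ValueError("identifier contains no digits")
    return ws_server.rstrip("/") + "/" + digits


# Hypothetical host name, used only for illustration.
print(transcript_endpoint("ws://websocketserver.example", "847-232-5000"))
```

A downstream listener would connect to the printed address to receive transcriptions for that caller.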

Optional configuration parameters

Variable         Default                 Description
WS_SERVER_TOKEN  "XC7WFVC3UW3HPQRV5YUR"  The authentication token specified for the websocket server
ESL_PASSWORD     "greenkey"              The event socket password specified on the freeswitch container
SQC_SAMPLE_RATE  "44100"                 The sample rate of the freeswitch recording to be passed to SQCServer for real-time transcription
LOG_LEVEL        0                       Level of logging, with 2 being most verbose
RECORDINGS_DIR   "/scribe/recordings"    Mounted directory for shared temporary recording storage between freeswitch and calltranscriber
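
Before launching the container, it can help to check the environment against the rules described in this step. The sketch below is a hypothetical pre-flight check, not part of the CallTranscriber: it encodes only the constraints stated in this section (REALTIME needs SQC_SERVER, DELAYED needs SVT_SERVER, the three ID_MODE values, and host IPs instead of localhost), and the sample IP address is made up.

```python
def check_transcription_env(env):
    """Return a list of problems found in a CallTranscriber environment mapping."""
    required = {"REALTIME": "SQC_SERVER", "DELAYED": "SVT_SERVER"}
    problems = []
    mode = env.get("TRANSCRIPTION_MODE")
    if mode not in required:
        problems.append("TRANSCRIPTION_MODE must be REALTIME or DELAYED")
    elif not env.get(required[mode]):
        problems.append(f"{mode} mode requires {required[mode]}")
    id_mode = env.get("ID_MODE")
    if id_mode is not None and id_mode not in ("PIN", "CALLER", "ENDPOINT"):
        problems.append("ID_MODE must be PIN, CALLER, or ENDPOINT")
    for key in ("ESL_STRING", "SQC_SERVER", "SVT_SERVER", "WS_SERVER"):
        if "localhost" in env.get(key, ""):
            problems.append(f"{key} should use the host IP, not localhost")
    return problems


example_env = {
    "TRANSCRIPTION_MODE": "DELAYED",
    "ID_MODE": "PIN",
    "ESL_STRING": "localhost:8021",
    "WS_SERVER": "ws://10.0.0.5",  # made-up address
}
for problem in check_transcription_env(example_env):
    print(problem)
```

An empty list means the mapping satisfies the checks above; it does not guarantee that the services are reachable.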




Step 6: Add an nginx SSL proxy

You can add SSL to the websocket server via an nginx proxy. Launch the proxy as follows:

docker run -d \
  --name nginx \
  --net host \
  -v $(pwd)/[nginx-conf]:/etc/nginx/nginx.conf \
  -v $(pwd)/[certificates]:/ssl \
  nginx:alpine

where:

[certificates] contains your SSL certificate and key.

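[nginx-conf] is a valid nginx configuration file; an example is presented below. Set [server-name], [certificate-file], and [certificate-key] as needed. The location / block proxies to localhost:9000, so the websocket server must be reachable on host port 9000 (for example, published with -p 9000:9000). Publishing it on host port 80, as in Step 3, would conflict with the proxy's own listener on port 80.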

events {
  worker_connections  1024;
}
http {
  server {
    listen 80;
    return 301 https://$host$request_uri;
  }
  server {
    listen 443 ssl;

    server_name [server-name];
    ssl_certificate /ssl/[certificate-file];
    ssl_certificate_key /ssl/[certificate-key];

    client_max_body_size 0;

    ssl_session_cache builtin:1000 shared:SSL:10m;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
    ssl_prefer_server_ciphers on;

    location / {
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
      proxy_http_version 1.1;
      proxy_pass http://localhost:9000;
      proxy_connect_timeout 86400;
      proxy_send_timeout 86400;
      proxy_read_timeout 86400;
    }
  }
}




Step 7: Test your Deployment

After setting up your Call Bot, you can use our websocket test script to connect to it. Dial your SIP endpoint, enter the PIN (if necessary), then watch for results.