GreenKey

Scribe Voice Transcription

Version 3.5 Documentation

An overview and best practices for using GreenKey Scribe to transcribe audio files.




Getting Started

Scribe's Voice Transcription Server (SVTServer) can be launched, with valid credentials, on any Docker-compatible machine with at least 2 CPUs and 6 GB of RAM. Contact us if you are interested in obtaining an account.


(1) Install Docker

Follow the official Docker installation instructions for your operating system to install Docker on your machine.


(2) Download the SVTServer Docker image

$ docker login -u [repository-user] -p [repository-password] docker.greenkeytech.com
Login Succeeded
$ docker pull docker.greenkeytech.com/svtserver

The credentials provided by GreenKey should include your repository-user and repository-password.


Transferring the image to a machine without internet access

If you would like to install the image later on a machine without internet access, you can first save the image to a tar archive:

$ docker save -o greenkeyscribe.tar docker.greenkeytech.com/svtserver:latest

Then, on the machine without internet access, load the saved image (no repository login is required):

$ docker load -i greenkeyscribe.tar


(3) Launch a Scribe Transcription Service

Scribe can be launched with or without internet access. GreenKey will provide you with either a scribe-username and scribe-secretkey or a scribe-licensekey.

A scribe-licensekey can be used for deployments with or without internet access, whereas a scribe-username and scribe-secretkey require internet access.

With a username and secret key (internet required):

$ docker run --rm -d \
  -e GKT_API="https://scribeapi.greenkeytech.com/" \
  -e GKT_USERNAME="[scribe-username]" \
  -e GKT_SECRETKEY="[scribe-secretkey]" \
  -p [target-port]:5000 \
  -v $(pwd)/[upload-directory]:/uploads \
  -v $(pwd)/[storage-directory]:/files \
  -e PROCS=2 \
  -e DIARIZE="True" \
  docker.greenkeytech.com/svtserver

With a license key (no internet required):

$ docker run --rm -d \
  -e LICENSE_KEY="[scribe-licensekey]" \
  -p [target-port]:5000 \
  -v $(pwd)/[upload-directory]:/uploads \
  -v $(pwd)/[storage-directory]:/files \
  -v $(pwd)/[license-directory]:/scribe/gktlicense \
  -e PROCS=2 \
  -e DIARIZE="True" \
  docker.greenkeytech.com/svtserver

target-port is an open port on the machine that you will use to access the service.

upload-directory is a directory where you want final transcripts from uploaded files to appear.

storage-directory is a directory where you can place large audio files (greater than 20 MB in size) for transcription and also where their final transcripts will appear.

license-directory is a directory where your license files and the usagelog file are stored. Usage is automatically posted to our servers if your container is launched without a license key.

PROCS is the number of CPU cores on your machine to use. Specify more cores for faster transcription.


NOTE: Some versions of Docker require the following changes to the launch commands above:

1) Remove the --rm flag. This will result in containers remaining on disk after they stop. You can manually remove them by running docker ps -a to get the container ID numbers and docker rm [container-ID] to remove the containers.

2) Add the flag --net host to allow access to the service port.
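If you launch Scribe from a deployment script, the flags above can also be assembled in code. The following Python sketch is not part of Scribe: build_run_args is a hypothetical helper, and the host port 8080 and the /srv/scribe paths are placeholders. It only reproduces the --rm, -d, --net, -e, -p and -v flags documented above.

```python
import shlex


def build_run_args(image, env, port_map=None, volumes=None,
                   detach=True, remove=True, network=None):
    """Return a `docker run` argument list mirroring the launch commands above.

    env:      environment variable name -> value (e.g. PROCS, DIARIZE, LICENSE_KEY)
    port_map: host port -> container port (the service listens on 5000 inside)
    volumes:  host directory -> container directory (/uploads, /files, ...)
    """
    args = ["docker", "run"]
    if remove:
        args.append("--rm")  # set remove=False for Docker versions that need --rm dropped
    if detach:
        args.append("-d")
    if network:
        args += ["--net", network]  # e.g. "host", per the note above
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    for host_port, container_port in (port_map or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    for host_dir, container_dir in (volumes or {}).items():
        args += ["-v", f"{host_dir}:{container_dir}"]
    args.append(image)
    return args


if __name__ == "__main__":
    args = build_run_args(
        "docker.greenkeytech.com/svtserver",
        env={"LICENSE_KEY": "[scribe-licensekey]", "PROCS": "2", "DIARIZE": "True"},
        port_map={8080: 5000},  # placeholder target-port
        volumes={"/srv/scribe/uploads": "/uploads", "/srv/scribe/files": "/files"},
    )
    # Print the equivalent shell command; pass `args` to subprocess.run to launch.
    print(shlex.join(args))
```

Passing the list to subprocess.run(args, check=True) instead of a shell string avoids quoting problems with keys that contain shell metacharacters.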


Renewing License Keys

Your scribe-licensekey will be provided along with its expiration date. To receive a new key, please report your usage as described below.

If you can temporarily provide internet access to your container, you can manually trigger usage reporting by sending the following request to your container:

$ curl -X POST localhost:[target-port]/report \
  -H "Content-type: multipart/form-data" \
  -F 'data={"GKT_API":"https://scribeapi.greenkeytech.com/","GKT_USERNAME":"[scribe-username]","GKT_SECRETKEY":"[scribe-secretkey]"};type=application/json'

If your container cannot access the internet, please copy the license-directory to an internet-enabled machine and report your usage as follows:

$ docker run --rm -d \
  -e GKT_API="https://scribeapi.greenkeytech.com/" \
  -e GKT_USERNAME="[scribe-username]" \
  -e GKT_SECRETKEY="[scribe-secretkey]" \
  -v $(pwd)/[license-directory]:/scribe/gktlicense \
  -e REPORT="True" \
  docker.greenkeytech.com/svtserver


(4) Confirm the container is running

$ curl -X GET localhost:5000/status
{
  "message": "No transcription jobs currently in progress.",
  "status": 0
}

Replace 5000 with whatever target-port you previously specified.
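When a script starts the container and immediately submits work, it helps to wait until /status answers. A minimal Python sketch using only the standard library; parse_status and wait_for_service are hypothetical helpers, and treating a status of 0 as "idle" is an assumption based on the single example response above.

```python
import json
import time
import urllib.request


def parse_status(body):
    """Decode a /status response and return (is_idle, message).

    Assumption: "status": 0 means no jobs are in progress, as in the example
    response above; any other value is treated as busy.
    """
    payload = json.loads(body)
    return payload.get("status") == 0, payload.get("message", "")


def wait_for_service(base_url, timeout=120.0, interval=2.0):
    """Poll GET {base_url}/status until the container answers or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/status", timeout=10) as resp:
                return parse_status(resp.read())
        except OSError:
            # Connection refused, timeouts and HTTP errors (URLError and HTTPError
            # are OSError subclasses) usually mean the container is still starting.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{base_url}/status did not respond within {timeout}s")
            time.sleep(interval)
```

For example, wait_for_service("http://localhost:5000") returns (True, message) once an idle container responds.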


(5) Transcribe a File

Check if the container is working by transcribing a test file.

$ curl -X POST localhost:5000/sync/upload \
  -H "Content-type: multipart/form-data" \
  -F 'data={};type=application/json' \
  -F "file=@test.ogg"
{
  "duration": 3.2,
  "progress": 100.0,
  "segments": [
    {
      "boundary": "phrase",
      "endTimeSec": 1.21,
      "startTimeSec": 0.0,
      "transcript": "testing",
      "words": [
        {
          "confidence": 1.0,
          "length": 0.51,
          "start": 0.63,
          "word": "testing"
        }
      ]
    },
    {
      "boundary": "phrase",
      "endTimeSec": 1.77,
      "startTimeSec": 1.2,
      "transcript": "testing",
      "words": [
        {
          "confidence": 1.0,
          "length": 0.51,
          "start": 0.0,
          "word": "testing"
        }
      ]
    },
    {
      "boundary": "phrase",
      "endTimeSec": 3.2,
      "startTimeSec": 1.76,
      "transcript": "one two three",
      "words": [
        {
          "confidence": 1.0,
          "length": 0.24,
          "start": 0.0,
          "word": "one"
        },
        {
          "confidence": 1.0,
          "length": 0.21,
          "start": 0.24,
          "word": "two"
        },
        {
          "confidence": 1.0,
          "length": 0.33,
          "start": 0.45,
          "word": "three"
        }
      ]
    }
  ],
  "transcription_time": 22.6
}

The final transcript will also be saved in the upload-directory you specified at container launch, as a file named test.json.
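In the example output above, each word's start appears to be measured from the beginning of its segment (the second "testing" starts at 0.0 inside a segment that begins at 1.2 s). The following Python sketch converts words to file-level times under that assumption, which is inferred from this example rather than stated by the API. The helper names are hypothetical.

```python
def words_with_absolute_times(result):
    """Yield (word, start_sec, end_sec, confidence) measured from the start of the file.

    Assumption inferred from the example output: a word's "start" is an offset
    from its segment's "startTimeSec", and "length" is its duration in seconds.
    """
    for segment in result.get("segments", []):
        offset = segment["startTimeSec"]
        for w in segment.get("words", []):
            start = offset + w["start"]
            yield w["word"], round(start, 2), round(start + w["length"], 2), w["confidence"]


def full_transcript(result):
    """Join the non-empty segment transcripts into one string."""
    return " ".join(s["transcript"] for s in result.get("segments", []) if s.get("transcript"))
```

To use it on a saved transcript, load test.json with json.load and pass the resulting dict to either function.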


(6) Shut Down the Service

$ curl -X GET localhost:5000/shutdown




Supported File Types

Scribe supports a wide array of compressed file formats. Any format supported natively by ffmpeg is also supported by Scribe.

For .WAV files, certain restrictions apply: files ending in .WAV must use 16-bit PCM encoding. If a file uses a different compression or encoding (such as mp3, g722, or ulaw) but still has the .WAV extension, replace the extension with the actual encoding format (e.g. filename.wav -> filename.g722).
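Before submitting .WAV files, you can check locally whether they are 16-bit PCM. This Python sketch relies on the standard-library wave module, which only parses PCM WAV data (recent Python versions also accept the PCM variant of WAVE_FORMAT_EXTENSIBLE) and raises wave.Error for other encodings; is_pcm16_wav is a hypothetical helper, not a Scribe tool.

```python
import wave


def is_pcm16_wav(path):
    """Return True if `path` is a WAV file that the wave module reads as 16-bit PCM.

    Non-PCM data (e.g. mu-law or G.722 stored under a .wav name) and files that
    are not RIFF/WAVE at all make wave.open raise wave.Error, so they return False.
    """
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getsampwidth() == 2  # 2 bytes per sample = 16-bit
    except (wave.Error, EOFError):
        return False
```

A missing file raises FileNotFoundError instead of returning False, which helps separate a path problem from a genuine encoding problem.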

Scribe may also report a file encoding error when the file cannot be found or read. If this happens, check that the file is in a correctly mounted directory or was uploaded from local storage.




Accuracy

Scribe delivers industry-leading transcription for financial conversations. To see how we rate the accuracy of our engine against others, check out our latest benchmarks.

Obtaining the most accurate transcriptions requires high audio quality and a knowledgeable model. Learn more about audio quality and model customization as ways to improve accuracy.




Configuring Scribe

Scribe can be configured at the service level and at the job level. A Scribe Transcription Service is configured with parameters passed as environment variables in the docker run command:

$ docker run --rm -d \
  ...
  -e PARAM-NAME="PARAM-VALUE" \
  -e PARAM-NAME="PARAM-VALUE" \
  ...
  docker.greenkeytech.com/svtserver

These configuration parameters apply globally to a container, meaning that they will affect every job submitted to that service.


A single transcription job submitted to a service can also be configured via a data object.

For jobs where files are uploaded, specify the parameters as follows:

$ curl -X POST localhost:5000/upload \
  -H "Content-type: multipart/form-data" \
  -F 'data={"PARAM-NAME":"PARAM-VALUE","PARAM-NAME":"PARAM-VALUE"};type=application/json' \
  -F "file=@path/to/test.ogg"

Note that the file path path/to/test.ogg above is relative to the local (host) file system.
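For scripted submissions, the multipart request above can be reproduced with Python's standard library. In this minimal sketch, encode_multipart and upload are hypothetical helpers; the form field names data and file come from the curl example, and the /upload response is returned as raw text because its format is not documented in this section.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def encode_multipart(params, file_path):
    """Build a body equivalent to:
    curl -F 'data=<params as JSON>;type=application/json' -F "file=@<file_path>"

    Returns (body_bytes, content_type_header_value).
    Assumes the file name contains no double quotes.
    """
    boundary = uuid.uuid4().hex
    name = os.path.basename(file_path)
    file_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
    with open(file_path, "rb") as f:
        file_bytes = f.read()

    data_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="data"\r\n'
        "Content-Type: application/json\r\n\r\n"
    ).encode() + json.dumps(params).encode() + b"\r\n"
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode() + file_bytes + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()
    return data_part + file_part + closing, f"multipart/form-data; boundary={boundary}"


def upload(base_url, params, file_path, timeout=60):
    """POST a local file plus job parameters to {base_url}/upload; return the raw response."""
    body, content_type = encode_multipart(params, file_path)
    request = urllib.request.Request(
        f"{base_url}/upload", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()
```

For example, upload("http://localhost:5000", {"diarize": "True"}, "path/to/test.ogg") sends the same request as the curl command with a single job parameter.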

For jobs where files are read from mounted storage, specify the parameters as follows:

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"file":"/files/test.ogg","PARAM-NAME":"PARAM-VALUE","PARAM-NAME":"PARAM-VALUE"}'

Note that the file path /files/test.ogg above is a path inside the container's file system. If the container was started with the mount -v /home/user/files:/files, then a transcription job on /home/user/files/my-file.ogg would be submitted with the container file path /files/my-file.ogg.

Note the quoting in each case: the JSON is wrapped in single quotes so that the shell passes its inner double quotes through unchanged.
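Both the host-to-container path translation and the JSON quoting can be handled in code. The following Python sketch (hypothetical helpers to_container_path and stored_file_request) maps host paths through the same mounts passed with -v, and lets json.dumps produce the request body so no shell quoting is involved. pathlib compares whole path components, so /home/user/filesX is not mistaken for a directory under /home/user/files.

```python
import json
import urllib.request
from pathlib import PurePosixPath


def to_container_path(host_path, mounts):
    """Translate a host path into the path the container sees.

    `mounts` mirrors the -v flags used at launch, e.g. {"/home/user/files": "/files"}.
    The longest matching host directory wins if mounts are nested.
    """
    host = PurePosixPath(host_path)
    for host_dir in sorted(mounts, key=len, reverse=True):
        try:
            relative = host.relative_to(host_dir)
        except ValueError:
            continue  # host_path is not under this mount
        return str(PurePosixPath(mounts[host_dir]) / relative)
    raise ValueError(f"{host_path} is not inside any mounted directory")


def stored_file_request(base_url, container_path, params=None):
    """Build the POST request for a file that is already on mounted storage."""
    body = json.dumps({"file": container_path, **(params or {})}).encode()
    return urllib.request.Request(
        f"{base_url}/", data=body,
        headers={"Content-Type": "application/json"}, method="POST",
    )
```

Send the request with urllib.request.urlopen(req). Parameters are passed as a dict rather than keyword arguments because some keys, such as n-best, are not valid Python identifiers.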


Testing all parameters

Check out our ASR Toolkit for an example of how to expose all metadata and available features for Scribe.


Available Configuration Options

An overview of configuration options is shown below. Check out the other sections of the documentation to learn about the various ways to configure Scribe.

In many of the examples, [bracketed-terms] specifying configuration options will be present. When replacing these terms, you should not include the brackets in the final configurations.

| Feature | Global Configuration | Job Configuration | Default Value |
| --- | --- | --- | --- |
| Number of Processors | PROCS=2 | N/A | 2 |
| Sync Timeout | SYNC_TIMEOUT="7200" | N/A | 3600 |
| Target File | TARGET_FILE="/files/myfile.wav" | {"file":"/files/myfile.wav"} | None |
| Output Type | TYPE="transcript" | (see Output Types for details) | json |
| Speaker Diarization | MULTISPEAKER="True" | {"multiSpeaker":"True"} | False |
| Max Number of Clusters for Diarization | MAX_NUM_CLUST="5" | {"maxNumClust": "5"} | "5" |
| Speaker Diarization | DIARIZE="True" | {"diarize": "True"} | False |
| Min Number of Clusters for Diarization | MIN_NUM_CLUST="2" | {"minNumClust": "2"} | "2" |
| Speaker ID | SPKR_SAMPLE="/files/sample.wav" | {"speakerSample":"/files/sample.wav"} | None |
| Single Speaker Segmentation | SINGLE_SPKR_SEGMENT="True" | {"singleSpkrSegment": "True"} | False |
| Gender Classification | GENDER_CLASSIFY="True" | {"genderClassify": "True"} | False |
| Spoken Language Identification | LANG_ID="True" | {"langId": "True"} | False |
| Default Language for Transcription during Language Identification | DEFAULT_LANG="English" | N/A | English |
| Decode Mode (slow / test / latest / fast) | DECODE_MODE="slow" | N/A | latest |
| Force Single Model | FORCE_MODEL="svt-lstm" | {"forceModel":"greenkey_svt_lstm"} | N/A |
| N-Best Hypotheses | N-BEST="2" | {"n-best":"2"} | 1 |
| Word Confusion Matrix | WORD_CONFUSIONS="True" | {"wordConfusions":"True"} | False |
| Cloud Mode | ENABLE_CLOUD="True" | N/A | True |
| Custom Models | CUSTOM_MODEL="True" | N/A | False |
| Audio Quality | AUDIO_QUALITY="True" | {"audioQuality":"True"} | True |
| Transcript Formatting | TRANSCRIPT_FORMATTER="default" | N/A | None |
| Enable Insights | FIND_INSIGHTS="True" | {"findInsights":"True"} | False |
| Number of Key Terms | NUM_KEYTERMS="10" | N/A | 5 |
| Number of Key Phrases | NUM_KEYPHRASES="15" | N/A | proportional to audio file duration |
| Identify Quotes | IDENTIFY_QUOTES="True" | {"identifyQuotes":"True"} | False |
| Replace Quotes | REPLACE_QUOTES="True" | {"replaceQuotes":"True"} | False |
| Strict Quotes | STRICT_QUOTES="True" | {"strictQuotes":"True"} | True |
| Verbose Logging | VERBOSE_LOGGING="True" | N/A | False |
| Noise Filtering | NOISE_FILTERING="True" | N/A | False |
| Word Filter | WORD_FILTER="all" | {"wordFilter":"all"} | False |
| Max Segment Length | MAX_SEG_LEN="15" | N/A | 15 |
| Max Upload Size | MAX_UPLOAD_SIZE="100" | N/A | 20 |
| Email Address | EMAIL_ADDRESS="user@example.com" | {"emailAddress":"user@example.com"} | None |
| SendGrid Key | SENDGRID_KEY="[hash]" | {"sendgridKey":"[hash]"} | None |
| Callback URL | CALLBACK_URL="http://callback.url/route" | N/A | None |
| Callback Headers | CALLBACK_HEADERS="Content-type: application/json" | N/A | None |
| Discovery Engine | DISCOVERY="True" | {"discovery":"True"} | False |
| Discovery Domains | DISCOVERY_DOMAINS="safety" | {"discoveryDomains":["safety"]} | None |
| Discovery Intents | DISCOVERY_INTENTS="ten_code" | {"discoveryIntents":["ten_code"]} | None |
| Normalization Mode | NORMALIZATION="none" | N/A | "low" |
| Noise Threshold | NOISE_THRESHOLD="550" | N/A | "550" |
| Amplitude Cutoff | AMPLITUDE_CUTOFF="1.55" | N/A | "1.55" |
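Because most options have both a global and a per-job spelling, a lookup table helps when moving a setting from the docker run command into a job's data object. The following Python sketch copies the name pairs from the table above; job_data is a hypothetical helper, and splitting comma-separated discovery values into a list is an assumption. Values are passed through unchanged; the table's FORCE_MODEL examples show different model names in the two columns, so confirm values as well as keys.

```python
# Global (environment variable) name -> job-level key, copied from the table above.
# Options whose job-level column is N/A (PROCS, DECODE_MODE, ...) are omitted,
# as is TYPE, whose job-level form is described under Output Types.
ENV_TO_JOB_KEY = {
    "TARGET_FILE": "file",
    "MULTISPEAKER": "multiSpeaker",
    "MAX_NUM_CLUST": "maxNumClust",
    "DIARIZE": "diarize",
    "MIN_NUM_CLUST": "minNumClust",
    "SPKR_SAMPLE": "speakerSample",
    "SINGLE_SPKR_SEGMENT": "singleSpkrSegment",
    "GENDER_CLASSIFY": "genderClassify",
    "LANG_ID": "langId",
    "FORCE_MODEL": "forceModel",
    "N-BEST": "n-best",
    "WORD_CONFUSIONS": "wordConfusions",
    "AUDIO_QUALITY": "audioQuality",
    "FIND_INSIGHTS": "findInsights",
    "IDENTIFY_QUOTES": "identifyQuotes",
    "REPLACE_QUOTES": "replaceQuotes",
    "STRICT_QUOTES": "strictQuotes",
    "WORD_FILTER": "wordFilter",
    "EMAIL_ADDRESS": "emailAddress",
    "SENDGRID_KEY": "sendgridKey",
    "DISCOVERY": "discovery",
    "DISCOVERY_DOMAINS": "discoveryDomains",
    "DISCOVERY_INTENTS": "discoveryIntents",
}

# Job-level keys the table shows as JSON lists rather than strings.
LIST_KEYS = {"discoveryDomains", "discoveryIntents"}


def job_data(settings):
    """Convert {"DIARIZE": "True", ...} into a per-job data object."""
    data = {}
    for name, value in settings.items():
        if name not in ENV_TO_JOB_KEY:
            raise ValueError(f"{name} can only be set globally (or is unknown)")
        key = ENV_TO_JOB_KEY[name]
        if key in LIST_KEYS:
            # Assumption: multiple domains/intents are written comma-separated.
            data[key] = [item.strip() for item in value.split(",")]
        else:
            data[key] = value
    return data
```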




Scaling Scribe

Deploying a Scaled Transcription Service

Scribe Historical Transcription containers can be deployed in parallel as a service using a container orchestration engine such as Kubernetes. An example stateless Kubernetes service with persistent storage is provided as a template (kubernetes_example.yaml); the remainder of this section assumes you are using that example template.


(1) Set up a namespace and registry secret

You will need to set up a registry secret so that your cluster can access our private Docker registry. An example is provided below, where [bracketed terms] should be replaced with your provided credentials:

$ kubectl create namespace scribe-cloud
namespace "scribe-cloud" created
$ kubectl create secret docker-registry gktregsecret \
  --namespace=scribe-cloud \
  --docker-server=docker.greenkeytech.com \
  --docker-username=[repository-user] \
  --docker-password=[repository-password] \
  --docker-email=[administrative-email-address]
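If you manage cluster objects declaratively instead of running kubectl create, the same registry secret can be written as a Secret of type kubernetes.io/dockerconfigjson. A minimal Python sketch; registry_secret_manifest is a hypothetical helper, and the resulting manifest contains your registry password, so store it as carefully as the credentials themselves.

```python
import base64
import json


def registry_secret_manifest(name, namespace, server, username, password, email):
    """Build a kubernetes.io/dockerconfigjson Secret (as a dict) comparable to
    `kubectl create secret docker-registry`.
    """
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {
        "auths": {
            server: {"username": username, "password": password,
                     "email": email, "auth": auth}
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {
            # Secret data values must be base64-encoded.
            ".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode()).decode()
        },
    }
```

kubectl accepts JSON manifests, so the output of json.dumps(manifest) can be applied with kubectl apply -f -.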


(2) Configure container credentials

At minimum, you must configure the following parameters in the provided example YAML file:

...
- name: GKT_USERNAME
  value: "[scribe-username]"
- name: GKT_SECRETKEY
  value: "[scribe-secretkey]"
...


(3) Configure resource usage

The example configuration allocates 3 processors and roughly 8 GB of RAM:

...
env:
  - name: PROCS
    value: "3"
...
resources:
  limits:
    memory: "8000Mi"
  requests:
    memory: "8000Mi"
...
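When sizing replicas, note that Kubernetes memory units are binary: 8000Mi is 8000 × 2^20 bytes, about 7.8 GiB. The Python sketch below (hypothetical helpers) converts such quantities and gives a rough upper bound on replicas per node, assuming each replica needs PROCS CPU cores plus the memory request shown above.

```python
import re

# Binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes used in Kubernetes quantities.
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
             "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "": 1}


def memory_bytes(quantity):
    """Convert an integer memory quantity such as "8000Mi" into bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity.strip())
    if not match:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix or ""]


def replicas_per_node(node_cpus, node_memory, procs, memory_request):
    """Rough upper bound on Scribe replicas per node.

    Ignores CPU and memory reserved by the system and by other pods.
    """
    by_cpu = node_cpus // procs
    by_memory = memory_bytes(node_memory) // memory_bytes(memory_request)
    return min(by_cpu, by_memory)
```

For example, a node with 16 cores and 64Gi of memory fits at most five replicas of the example configuration: CPU, not memory, is the limit.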

You may want to change these options depending on factors such as the size of your cluster nodes and your typical audio file length. Adding processors to each replica reduces overall transcription time, but the savings are greater for longer files. The graph below shows how transcription time scales with additional processors for various file lengths:



(4) Configure persistent storage

This is the location where audio files and transcripts will be stored. Both /files and /uploads will contain final transcripts once generated.

If you have audio files greater than 20 MB in size, you should place them in /files and use the appropriate route below for transcription.

If you have files smaller than 20 MB, they can be uploaded instead; uploaded audio is stored in /uploads and deleted once transcription finishes.

...
volumes:
  - name: uploads
    nfs:
      path: /data/scribe-cloud/uploads
      server: 10.240.0.7
  - name: files
    nfs:
      path: /data/scribe-cloud/files
      server: 10.240.0.7
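The size rule above can be applied automatically on the client. In the hypothetical Python sketch below, plan_submission copies large files into files_dir (wherever the /files volume is visible to the client) and returns a JSON body for the mounted-storage route, while small files are left for /upload. Treating 20 MB as 20 × 10^6 bytes is an assumption; the server-side limit is set by MAX_UPLOAD_SIZE.

```python
import os
import shutil


def plan_submission(audio_path, files_dir, limit_mb=20):
    """Decide how to submit `audio_path`.

    Small files: returns ("/upload", audio_path) for a multipart upload.
    Large files: copies the file into files_dir (host side of /files) and
    returns ("/", {"file": "/files/<name>"}) for the mounted-storage route.
    """
    if os.path.getsize(audio_path) < limit_mb * 10**6:
        return "/upload", audio_path
    name = os.path.basename(audio_path)
    shutil.copy2(audio_path, os.path.join(files_dir, name))
    return "/", {"file": f"/files/{name}"}
```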


(5) Start your service

$ kubectl apply -f kubernetes_example.yaml

Your Kubernetes transcription service will now launch and be exposed internally. You can use the following command to find the internal IP of your service:

$ kubectl --namespace scribe-cloud describe service scribe-cloud

Your service will not be exposed outside the cluster unless you specify the service type as LoadBalancer or add an ingress controller. Need help setting these up? Contact us.


Using a Deployed Transcription Service

Batch transcription

Check out our ASR Toolkit for examples of how to submit batch jobs to your newly deployed transcription service.


Obtaining asynchronous transcripts (requires persistent shared storage)

Transfer your files in one of two ways:
  1. Manually transfer them to the /files persistent storage directory.
  2. Use the /upload route to upload your file for transcription.
Obtain transcripts in one of two ways:
  1. Wait for the corresponding .json file to be created in /files or /uploads.
  2. Use the async GET command below, where [filename] is the name of the file without the extension:

$ curl -X GET my.service.ip:5000/async/[filename]
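Either retrieval method can be scripted. In this Python sketch, wait_for_transcript polls a directory where the /files or /uploads volume is visible to the client, and async_url builds the async route by dropping the file extension as described above. Both helpers are hypothetical, and retrying a .json file that does not yet parse assumes the transcript may be observed while it is still being written.

```python
import json
import time
from pathlib import Path, PurePosixPath


def async_url(base_url, audio_filename):
    """URL for the async GET route: the audio file name without its extension."""
    return f"{base_url}/async/{PurePosixPath(audio_filename).stem}"


def wait_for_transcript(directory, audio_filename, timeout=3600.0, interval=5.0):
    """Poll `directory` for <name>.json and return the parsed transcript."""
    target = Path(directory) / (Path(audio_filename).stem + ".json")
    deadline = time.monotonic() + timeout
    while True:
        if target.exists():
            try:
                return json.loads(target.read_text())
            except json.JSONDecodeError:
                pass  # possibly still being written; try again after the interval
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{target} did not appear within {timeout}s")
        time.sleep(interval)
```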


Obtaining synchronous transcripts

Use the /sync/upload route as follows to block until the file has been uploaded and its transcript returned:

$ curl -X POST my.service.ip:5000/sync/upload/[filetype] \
    -H "Content-type: multipart/form-data" \
    -F "file=@targetfile.ogg" \
    -F 'data={};type=application/json'

The optional [filetype] (see Output Types) can be set to json, transcript, text, or stm. If the transcription does not complete within the SYNC_TIMEOUT, a short error message will be returned.

Note: This route does not require persistent shared storage, and thus can be used in cases where there are no mounted volumes to the containers.

For larger files, it is sometimes convenient to upload first and retrieve the transcript later. Files can be posted to the /upload route, and the transcript retrieved from the /file route once transcription has completed. Configuring a callback (CALLBACK_URL) allows you to notify downstream applications that the transcript is ready for retrieval.

$ curl -X POST my.service.ip:5000/upload \
    -H "Content-type: multipart/form-data" \
    -F "file=@targetfile.ogg" \
    -F 'data={};type=application/json'

$ curl -X GET my.service.ip:5000/file/targetfile.json
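The URLs for both retrieval styles can be built and validated in code. This Python sketch uses hypothetical helpers; fetch_when_ready assumes that /file returns an HTTP error status until the transcript exists, which this section does not state.

```python
import time
import urllib.error
import urllib.request
from pathlib import PurePosixPath

# Output types accepted by the optional [filetype] suffix (see Output Types).
SYNC_FILETYPES = {"json", "transcript", "text", "stm"}


def sync_upload_url(base_url, filetype=None):
    """URL for the blocking /sync/upload route, optionally with an output type."""
    if filetype is None:
        return f"{base_url}/sync/upload"
    if filetype not in SYNC_FILETYPES:
        raise ValueError(f"unsupported filetype {filetype!r}; expected one of {sorted(SYNC_FILETYPES)}")
    return f"{base_url}/sync/upload/{filetype}"


def transcript_file_url(base_url, audio_filename):
    """URL for /file/<name>.json, the transcript of an uploaded audio file."""
    return f"{base_url}/file/{PurePosixPath(audio_filename).stem}.json"


def fetch_when_ready(url, attempts=60, interval=10.0):
    """GET `url`, retrying on HTTP error statuses (assumed to mean "not ready yet")."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError:
            if attempt == attempts:
                raise
            time.sleep(interval)
```

For the blocking /sync/upload route, set the client's timeout at least as long as the server's SYNC_TIMEOUT so the client does not give up before the server does.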




SVTServer Versioning

We maintain tagged versions of all Scribe services with major, minor, and incremental version numbers x.y.z. The latest tag should always point to the most recent version, which is currently 3.5.3. Additional images also maintained include ``.

You can check the version of a running SVTServer instance through the version route:

$ curl -X GET localhost:5000/version
{
    "version": "3.2.3"
}

Presently, the following SVTServer versions are available on our docker repo: 3.5.2, 3.5.1, 3.5.0, 3.4.9, 3.4.8, 3.4.7, 3.4.6, 3.4.5, 3.4.4, 3.4.3, 3.4.2, 3.4.1, 3.4.0, 3.3.1, 3.2.8, 3.2.7, 3.1.4, 3.1.2, 3.1.1, 3.1.0, 3.0.9, 3.0.8, 3.0.7, 3.0.6, 3.0.5, 3.0.4, 3.0.3, 3.0.2, 3.0.1, 3.0.0, 2.1.4, 2.1.3, 2.1.2, 2.1.1, 2.1, 2.0.23
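Version tags compare numerically, not as strings (2.0.23 is newer than 2.0.9), and some tags such as 2.1 have only two parts. The following Python sketch, with hypothetical helper names, pads tags to three parts and checks a running container against a minimum version read from the version route.

```python
import json
import urllib.request


def parse_version(tag):
    """Turn "3.4.9" or "2.1" into a comparable tuple, padding missing parts with 0."""
    parts = [int(p) for p in tag.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def running_version(base_url):
    """Read the version string from GET {base_url}/version."""
    with urllib.request.urlopen(f"{base_url}/version", timeout=10) as resp:
        return json.loads(resp.read())["version"]


def meets_minimum(running, minimum):
    """True if `running` is the same as or newer than `minimum`."""
    return parse_version(running) >= parse_version(minimum)
```

For example, meets_minimum(running_version("http://localhost:5000"), "3.4.0") tells a script whether the container is recent enough before it submits jobs.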