GreenKey

GreenKey ASRToolkit

The GreenKey ASRToolkit provides basic examples and tools used to benchmark Scribe on your data. All of the commands below can be run with Python 3.

To make evaluation easier, GreenKey provides an 'ASRToolkit' Docker image that contains all of the packages necessary to evaluate Scribe or any other transcription engine. You can run the commands below inside the toolkit by pulling and launching the image as follows:

docker run --rm -it -v $(pwd):/files -w /files --net host \
   greenkeytech/asrtoolkit:latest

The asrtoolkit container automatically mounts your present working directory and starts within that directory. All commands below reference directories that are present within the container. If you do not want to use the container, you can download the ASRToolkit and the sample audio files and adjust the paths below to local paths.




Scoring audio quality of a file

Save the following file as score.py. Set the url to be the HTTP path to your transcription service with no trailing slash:

import requests
import sys

url = 'http://localhost:5000'
route = '/audioscore/upload'
file = sys.argv[1]

# Post the audio file and print the audio quality score returned by the service
with open(file, 'rb') as f:
    r = requests.post(url + route, files={'file': f})
print(r.text)


Launch SVTServer, then run the following command in the asrtoolkit (or in your shell). You should see the following output:

$ python3 score.py /audio/quotes.wav
{
  "audioScore": "3.46"
}



Transcribing an audio file (synchronously)

Synchronous transcription is a great way to use Scribe if you have small files (i.e. a couple of minutes in length).

Save the following file as transcribe.py. Set the url to be the HTTP path to your transcription service with no trailing slash:

import requests
import sys

url = 'http://localhost:5000'
route = '/sync/upload'
file = sys.argv[1]

# Post the audio file and print the returned JSON lattice
with open(file, 'rb') as f:
    r = requests.post(
        url + route,
        files={'file': f},
        data={'type': 'application/json', 'data': ''}
    )
print(r.text)


Launch SVTServer, then run the following command in the asrtoolkit (or in your shell). You should see the following output:

$ python3 transcribe.py /audio/quotes.wav
{
  "duration": 19.13, 
  "progress": 100.0, 
  "segments": [
    {
      "boundary": "phrase", 
      "confidence": 0.86, 
      "endTimeSec": 6.13, 
      "startTimeSec": 0.0, 
      "transcript": "october december twenty a quarter to twenty one", 
      "words": [
...



Transcribing an audio file (asynchronously)

Asynchronous transcription is a great way to use Scribe if you have large files.

Save the following file as transcribe.py. Set the url to be the HTTP path to your transcription service with no trailing slash:

import requests
import sys
import time

url = 'http://localhost:5000'
route = '/upload'
file = sys.argv[1]

# Submit the file for asynchronous transcription
with open(file, 'rb') as f:
    requests.post(
        url + route,
        files={'file': f},
        data={'type': 'application/json', 'data': ''}
    )

# Block until the transcript is ready (a status of 0 means finished)
while requests.get(url + "/status").json()['status'] != 0:
    time.sleep(1)

# Get the final transcript
r = requests.get(url + "/")
print(r.text)


Launch SVTServer, then run the following command in the asrtoolkit (or in your shell). You should see the following output:

$ python3 transcribe.py /audio/quotes.wav
{
  "duration": 19.13, 
  "progress": 100.0, 
  "segments": [
    {
      "boundary": "phrase", 
      "confidence": 0.86, 
      "endTimeSec": 6.13, 
      "startTimeSec": 0.0, 
      "transcript": "october december twenty a quarter to twenty one", 
      "words": [
...



Configuring your transcription request

In either synchronous or asynchronous transcription, you can configure the transcription request data object as shown below. Ensure that the data object is passed as a string and not as a dictionary:

with open(file, 'rb') as f:
    r = requests.post(
        url + route,
        files={'file': f},
        data={
            'type': 'application/json',
            'data': '{"multiSpeaker":"True","findInsights":"True"}'
        }
    )
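If you are passing several options, it can be less error-prone to build them as a Python dictionary and serialize it with json.dumps instead of hand-writing the string. A small sketch using the same option names as above:

```python
import json

# Build the request options as a dictionary ...
options = {"multiSpeaker": "True", "findInsights": "True"}

# ... then serialize them into the string form the data object expects
payload = {"type": "application/json", "data": json.dumps(options)}

print(payload["data"])  # {"multiSpeaker": "True", "findInsights": "True"}
```

Passing the result of json.dumps guarantees the data field is a valid JSON string rather than a Python dictionary.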



Converting a lattice to text

By default, Scribe outputs JSON lattices. These lattices contain all of the data about the transcript. If you want to convert this JSON into a block of text, you can do so as shown below.

First, save your transcription as a JSON lattice:

$ python3 transcribe.py /audio/quotes.wav > quotes.json

From here, you can convert it into a text file. After running this command, you'll have a new file called quotes.txt:

$ convert_transcript quotes.json quotes.txt
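If you would rather do the conversion in Python, the segment structure shown in the sample output above is enough to join the transcripts yourself. A minimal sketch, assuming only the top-level "segments" list with "transcript" strings as in the example output:

```python
import json
import sys

def lattice_to_text(path):
    # Load the JSON lattice and join each segment's transcript into one block of text
    with open(path) as f:
        lattice = json.load(f)
    return " ".join(seg["transcript"] for seg in lattice.get("segments", []))

if __name__ == "__main__" and len(sys.argv) > 1:
    print(lattice_to_text(sys.argv[1]))
```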



Cleaning transcripts and calculating word error rate

The asrtoolkit comes with the asr-evaluation package that allows for easy calculation of Word Error Rate (WER).

The WER is the sum of word insertions, deletions, and substitutions divided by the number of words in the reference. If many insertions are present, the WER can exceed 100%. Accuracy is often reported as 100% minus the WER, so a lower WER is better.
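To make the formula concrete, WER can be computed as a word-level Levenshtein (edit) distance divided by the reference word count. The asr-evaluation package does this for you; the sketch below only illustrates the definition:

```python
def wer(reference, hypothesis):
    """(substitutions + insertions + deletions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

print(wer("two year money fifty seven", "two year money fifty"))  # 0.2
```

Note that a one-word reference scored against a three-word hypothesis gives a WER of 200%, which is why accuracy figures derived from WER can go negative.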

If you are not using the asrtoolkit, you can install the asrtoolkit package as follows:

$ pip install asrtoolkit

As an example, let's calculate the word error rate of the quotes.wav file. After obtaining the text transcript as shown above, save the following file as quotes-truth.txt:

October December twenty and a quarter to twenty one. Two year money fifty seven fifty seven and a half. Brent jan seventeen fifty five call.

It's important for the WER to be calculated on unformatted text. Our wer tool performs this cleaning automatically, but many terms are ambiguous and can be said in different ways. To see how we clean formatted text, you can test this out on your file as follows:

$ clean_formatting quotes-truth.txt
File output: quotes-truth_cleaned.txt
$ cat quotes-truth_cleaned.txt
october december twenty and a quarter to twenty one two year money fifty seven fifty seven and a half brent jan seventeen fifty five call

Now calculate the word error rate with wer.

$ wer quotes-truth_cleaned.txt quotes.txt
WER: 0.000%

Here, the reported WER of 0.000% indicates a perfect match between the Scribe transcript and the reference.



Experimenting with Scribe

The sample audio files include two files to try out: one with quotes and trades, and a sample from a Netflix earnings call. The earnings call, in particular, is a great way to test some of Scribe's features that require longer conversations, such as insights and summarization.


Test decoding mode

Scribe also has a 'test' decoding mode that can be used to test the software without actually conducting transcription. By setting DECODE_MODE="test" on container launch, you can test Scribe's pipeline with resulting transcriptions that only say "test". This may be useful for conducting experiments around scalability, connectivity, file permissions, and so forth.
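For example, following the launch-command style shown in the next section, test mode is enabled with a single environment variable (the other flags are unchanged):

```shell
docker run -d \
    ...
    -e DECODE_MODE="test" \
    docker.greenkeytech.com/svtserver
```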


All metadata

To expose all metadata in your transcriptions, launch your container with the following set of environment variables:

docker run -d \
    ...
    -e DECODE_MODE="slow" \
    -e DIARIZE="True" \
    -e GENDER_CLASSIFY="True" \
    -e N-BEST="2" \
    -e AUDIO_QUALITY="True" \
    -e FIND_INSIGHTS="True" \
    -e IDENTIFY_QUOTES="True" \
    docker.greenkeytech.com/svtserver



Batch Transcription

You can submit batch transcription jobs to a single Scribe instance or a multi-instance Scribe service. Download our batch process script and follow the instructions below:

$ export SCRIBE_ENDPOINT="[scribe-endpoint]"
$ export MAX_TIMEOUT="[max-timeout]"
$ ls [target-dir]/*.[file-extension] | parallel -j [num-jobs] ./batch_process.sh

[scribe-endpoint] is the path to your Scribe service (e.g. http://localhost:5000) with no trailing slash.

[max-timeout] is the maximum time allotted for a transcription job to execute. If a POST takes longer than this, the job is assumed to have hung, and the script re-attempts the transcription.

[target-dir] is the directory containing the files you wish to transcribe.

[file-extension] is the extension shared by all of your files. If your folder only contains audio files of various extensions, you can just use * instead of *.[file-extension].

[num-jobs] is the number of parallel jobs you wish to execute to process your files. Typically, this should be at most the number of containers behind your service, or one to two above that value. For example, if you have a Kubernetes service with 5 replicas of SVTServer, more than 7 batch jobs will not gain you any efficiency.

Note that parallel is a GNU utility which can be readily installed using any package manager.
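If you prefer Python to GNU parallel, the same fan-out can be sketched with a thread pool posting to the synchronous upload route shown earlier. The endpoint, timeout, and job count below are placeholders to adjust for your deployment, and this sketch omits the retry-on-timeout behavior of batch_process.sh:

```python
import glob
import sys
from concurrent.futures import ThreadPoolExecutor

import requests

SCRIBE_ENDPOINT = "http://localhost:5000"  # no trailing slash
MAX_TIMEOUT = 300                          # seconds allotted per file
NUM_JOBS = 4                               # roughly the number of containers behind the service

def transcribe(path):
    # Post one audio file to the synchronous upload route and return its lattice
    with open(path, "rb") as f:
        r = requests.post(
            SCRIBE_ENDPOINT + "/sync/upload",
            files={"file": f},
            data={"type": "application/json", "data": ""},
            timeout=MAX_TIMEOUT,
        )
    return path, r.text

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 batch.py "target-dir/*.wav"
    files = glob.glob(sys.argv[1])
    with ThreadPoolExecutor(max_workers=NUM_JOBS) as pool:
        for path, lattice in pool.map(transcribe, files):
            print(path, lattice)
```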



Generating a Language Model Corpus

You can generate language model corpora from any text. Here are some examples for generating a corpus from a set of text files.

Both examples assume you are inside the asrtoolkit container or have it installed.

Generating text from a set of Excel workbooks

From within the container shell in the working directory, launch this command and point it to a folder containing your Excel workbooks (folder_containing_files in the example):

$ extract_excel_spreadsheets --input-folder folder_containing_files

You should now have a corpus directory with unformatted text in your working directory. Mount this as shown in the language model customization section under "Transcribing".

Generating text from a set of text files

From within the container shell in the working directory, navigate to the folder containing your text files:

$ clean_formatting *.txt
$ for text_file in *_cleaned.txt; do mv "$text_file" "${text_file//_cleaned}"; done

You should now have a corpus directory with unformatted text in your working directory. Mount this as shown in the language model customization section under "Transcribing".