Improving Accuracy with ScribeTrain

Obtaining the most accurate transcriptions requires high-quality audio and a model that knows your speakers and vocabulary. The sections below describe the main factors that affect transcription accuracy and ways to improve each one.

Audio Quality

The quality of the input audio file has a direct effect on transcription accuracy.

Methods for improving audio quality:

Voice Onboarding and training data generation

GreenKey offers a voice onboarding process that you can participate in. Visit this page, press start, and read each phrase shown, pressing next after each one. The page also lets you upload your own text files; each line of a file is treated as a single prompt. At the end of onboarding, download both the .wav and .stm files.

Add these two files to your training data to improve accuracy for this voice and accent. The files are completely anonymized. Remember to convert all wav files to sph format for training.
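The conversion step can be scripted. The following is a minimal sketch that assumes SoX is installed and built with NIST SPHERE (.sph) support; SoX is an assumption here, since this guide does not name a conversion tool or state a required sample rate or encoding. The function names sph_name and convert_wavs are hypothetical.

```shell
# Derive the .sph path for a given .wav path (same directory, same base name).
sph_name() {
  printf '%s\n' "${1%.wav}.sph"
}

# Convert every .wav file in a directory (default: current directory).
# SoX is an assumed tool, not one named by the ScribeTrain documentation.
convert_wavs() {
  dir=${1:-.}
  if ! command -v sox >/dev/null 2>&1; then
    echo "sox not found; install SoX or use another converter" >&2
    return 1
  fi
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue   # the glob matched nothing
    sox "$wav" "$(sph_name "$wav")"
  done
}
```

Calling convert_wavs with the folder that holds your onboarding files writes one .sph file next to each .wav file; the matching .stm files are left untouched.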

New Data Collection

A browser tool for cleaning poorly transcribed audio is provided in the scribetrainSDK under sdk/correction. Add your files to the data folder, then open index.html in your browser and follow the instructions inside. Once you correct your transcript, the stm and wav files can be added to your training data to systematically improve accuracy. Remember to convert all wav files to sph format for training.

Language Model

The language model that Scribe uses will have an impact on the words and phrases recognized from an audio sample. If your audio contains words or phrases that Scribe's language model does not know (such as proper names), these will not transcribe accurately.

If you are using the full training process, the language model will automatically be adapted to your training data.

If you are using the extension process, you may want to adapt the language model after you have packaged the acoustic model. Follow the documentation on language model customization to add words or improve the language model.

Contact us for more information about extending the language model.

Tailoring model parameters using a target file

Several model parameters used in decoding may need to be tailored to your target domain and model type. GreenKey offers a small tool called scan_params, which repeatedly invokes SVTServer on a target file that has a reference transcript. To use this tool, do the following:

Step 1 - Download and

Please contact your GreenKey Tech representative if this step is unclear.

Step 2 - Load local files

This tool needs two inputs: an audio file [audio_file] and a reference file containing its corrected transcript [transcript]. Store both files in your working directory. Once this data is prepared, copy your customized model into the custom directory inside the working directory.
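Before launching anything, it can help to confirm that the working directory is laid out as described. The sketch below is a hypothetical pre-flight check, not part of scan_params; the function name check_workdir is made up for illustration, and it only assumes the layout stated above (both files plus a custom directory in the working directory).

```shell
# Hypothetical helper: verify the working directory holds the audio file,
# the reference transcript, and a custom/ directory for the model.
check_workdir() {
  dir=$1
  audio=$2
  transcript=$3
  if [ ! -f "$dir/$audio" ]; then
    echo "missing audio file: $audio" >&2
    return 1
  fi
  if [ ! -f "$dir/$transcript" ]; then
    echo "missing transcript: $transcript" >&2
    return 1
  fi
  if [ ! -d "$dir/custom" ]; then
    echo "missing custom/ model directory" >&2
    return 1
  fi
  echo "ok"
}
```

For example, run check_workdir with your working directory, [audio_file], and [transcript] as its three arguments; it prints ok only when all three items are present.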

Step 3 - Launch the container

Acceptable modes are e2e and gmm; consult the ScribeTrain documentation for details. Per the SVT documentation, NPROCS should be set to half the number of logical cores.
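The NPROCS guideline can be computed on the host before launching. This is a minimal sketch that assumes a Linux or macOS host where nproc or getconf is available; the floor of one process is an added safeguard for single-core machines and does not come from the SVT documentation. How the value is handed to the container is not shown in this section.

```shell
# Count logical cores: nproc on Linux, getconf as a fallback (e.g. macOS).
logical_cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Halve the count, but never go below one process (assumed safeguard).
NPROCS=$(( logical_cores / 2 ))
if [ "$NPROCS" -lt 1 ]; then
  NPROCS=1
fi

echo "NPROCS=$NPROCS"
```

With NPROCS computed, the container is launched as follows: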

docker run \
  --rm \
  -it \
  -e MODE=e2e \
  -e AUDIO_FILE=[audio_file] \
  -e REFERENCE_TRANSCRIPT=[transcript] \
  -v $(pwd)/custom:/custom \
  -v $(pwd):/files \
  -v /var/run/docker.sock:/var/run/docker.sock \