Scribe Discovery Engine
Version 0.1 Documentation
An overview and best practices for using Scribe Discovery to find and interpret patterns in voice transcripts
Check out the accompanying Discovery SDK
What Does Discovery Do?
If you're building a voice command or searching transcriptions for a particular pattern, you might assume you can work from a clean transcript like:
Turn right and stop at nine west monroe street
In reality, noisy environments and dense language can reduce transcription accuracy. Our real-time and delayed transcription engines produce a large number of phonetic possibilities. Scribe Discovery searches this matrix to find your target phrases or voice commands even when the transcription is inaccurate. In the example above, Discovery could handle transcription errors like these:
Turn right and stop at none west monroe street
Turn rite and stop at none best monroe sweet
How Discovery Works
Discovery takes a transcript as input, determines the "intent" of that transcript, and then returns to the user all the "entities" in the transcript that correspond to that intent.
Discovery works by first classifying strings into "intents." An "intent" is a command that gets triggered by a particular set of words. For example, imagine a user says, "Open Google Chrome." The intent of this sentence is for the computer to open an application.
The next step is to extract "entities" from the sentence. An "entity" is an individual piece of a command statement that provides details about the command. In the previous example, there is one entity, "Google Chrome". The command to "open" a program is meaningless without the detail of which program to open.
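The two steps above can be sketched in a few lines of Python. This is only an illustration of the intent/entity idea, not how Discovery is implemented: the INTENT_PATTERNS table, the "open_application" intent name, and the interpret function are all hypothetical, and the table is not the intents.json format.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Interpretation:
    intent: Optional[str]
    entities: Dict[str, str] = field(default_factory=dict)


# Hypothetical intent table for illustration only. Each intent name maps to a
# pattern whose named groups are treated as the entities of that intent.
INTENT_PATTERNS = {
    "open_application": re.compile(r"^open (?P<application>.+)$", re.IGNORECASE),
}


def interpret(transcript: str) -> Interpretation:
    """Step 1: classify the transcript into an intent.
    Step 2: return the entities that belong to that intent."""
    text = transcript.strip().rstrip(".")
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.match(text)
        if match:
            return Interpretation(intent, match.groupdict())
    return Interpretation(None)


result = interpret("Open Google Chrome.")
print(result.intent, result.entities)
# prints: open_application {'application': 'Google Chrome'}
```

Exact pattern matching like this only works on a clean transcript, which is the limitation the rest of this page addresses.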
Example: Giving Directions
Sometimes entities are easy to determine.
Consider the command, "Turn right."
We can deduce that for the command "Turn" there will be only one entity, which will be the direction one is meant to turn.
For this entity, there are only two possible options, "right" and "left."
Interpreting a command like "Turn right" would therefore be relatively easy to code with a handful of
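As a rough sketch of how little is needed for this simple case, the following Python function handles "Turn left" and "Turn right" by exact word comparison. The function name parse_turn and the returned dictionary shape are assumptions made for this example and are not part of the Discovery SDK.

```python
# The only two values the direction entity can take.
DIRECTIONS = {"left", "right"}


def parse_turn(transcript):
    """Return the intent and direction for a two-word "Turn <direction>"
    command, or None if the transcript does not match exactly."""
    words = transcript.lower().strip(" .!").split()
    if len(words) == 2 and words[0] == "turn" and words[1] in DIRECTIONS:
        return {"intent": "turn", "direction": words[1]}
    return None


print(parse_turn("Turn right."))  # {'intent': 'turn', 'direction': 'right'}
print(parse_turn("Turn rite"))    # None
```

The second call shows where exact matching breaks down: a single mis-transcribed word is enough to lose the command.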
Now consider the command, "Turn right and stop at nine west monroe street." This command contains an address as an entity, and an address is a much more difficult entity to detect. It would be possible to do so reliably if the address were always presented correctly and in a predetermined format. With text transcribed from audio, however, this is not always the case.
There is a chance that the prior command could have been transcribed as, "Turn right and stop at none west monroe street," because of the similarities in pronunciation of the words "nine" and "none". This is where the Discovery Engine becomes essential in parsing out entities of commands.
Scribe's Discovery Engine uses fuzzy logic to detect patterns in text, even on audio that has been incorrectly transcribed, so long as the audio was transcribed with the Scribe Voice-to-Text Service.
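To give a feel for what tolerant matching buys you, here is a minimal sketch that uses character-level similarity from Python's standard difflib module as a stand-in. It is not Discovery's algorithm: Discovery works on the phonetic output of the Scribe Voice-to-Text Service, while this example only compares plain text. The function name best_fuzzy_match is hypothetical.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(target, transcript):
    """Slide a window the length of the target phrase (in words) across the
    transcript and return the most similar window with its similarity score
    (0.0 to 1.0)."""
    target_text = " ".join(target.lower().split())
    words = transcript.lower().split()
    n = len(target_text.split())
    best_span, best_score = "", 0.0
    for start in range(max(len(words) - n, 0) + 1):
        window = " ".join(words[start:start + n])
        score = SequenceMatcher(None, target_text, window).ratio()
        if score > best_score:
            best_span, best_score = window, score
    return best_span, best_score


target = "nine west monroe street"
for noisy in (
    "Turn right and stop at none west monroe street",
    "Turn rite and stop at none best monroe sweet",
):
    span, score = best_fuzzy_match(target, noisy)
    print(f"{span!r} scored {score:.2f}")
```

In practice you would accept a match only above some similarity threshold, and choosing that threshold is a trade-off between missed commands and false triggers.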
To get the most out of Discovery, you will need to know how to define your own intents, which is done with an intents.json file. You will also need to know how to define the entities associated with your intents before Discovery can properly find and label them.
To learn how to make your own intents.json file, click here.
To learn about how to make your own entities, click here.
If you want to try it out, read about how to quickly Deploy Discovery on your own machine to develop against it.
Our Discovery SDK gives you complete end-to-end examples to get you started building your own interpreter.
The Scribe Discovery engine is packaged within our real-time dictation service. If you would like to test against real audio, check out the docs for SQCServer.