
SVTServer Features

Scribe's SVTServer has a wealth of features that provide additional metadata along with the text transcript.

Some of these features require additional provisioning in your license. Contact us if you wish to enable the Scribe Insight Engine.


Audio-Related Features



Audio Quality Score

The quality of the input audio file has a direct effect on transcription accuracy. Scribe features an "audio quality score" that rates an audio file from 0.1 to 5, with 5 being the best. The audio quality score of a file can be obtained by enabling the feature through the Scribe Insight Engine.

A score of 2 or above indicates fair to good audio quality, but a high audio quality score does not guarantee high transcription accuracy. A score below 2, however, indicates that the poor audio quality of the file will negatively affect transcription accuracy.

Below is a breakdown of the audio quality score and some of the factors that affect it. Recording audio directly from a microphone will result in the highest quality. Listening to a telephony stream instead of the microphone directly will reduce audio quality. The lowest audio quality is typically found in archival compressed audio files of telephony recordings.

Audio Quality Score   Description          Typical Sample Rate   Typical Noise Level
5 (★★★★★)             DVD Quality          48 kHz                Low
4 (★★★★)              CD Quality           44.1 kHz              Low
3 (★★★)               FM Radio             32 kHz                Moderate
2 (★★)                Telephony            16 kHz                Moderate
1 (★)                 Recorded Telephony   13.2 kHz              High
<1                    Walkie-Talkie        8 kHz or less         High

Poor audio quality will have a drastic impact on accuracy. To get a feel for this, the table below demonstrates how transcription accuracy degrades along with audio quality. As the table shows, a perfect transcription devolves into one with only 80% accuracy as quality drops:

Audio Quality Score   WER   Transcription
> 2                   0%    this is a test of green keys transcription engine
                            testing one two three four five
                            this is a test one two three
1                     10%   this is a test to screen keys transcription engine
                            testing one two three four five
                            this is a test one two three
< 1                   20%   this is a test to screen keys transcription engine
                            testing a one two three four five
                            this is attached to one two three


Methods for improving audio quality (see the example command after this list):

  1. Use a higher sample rate when recording audio (16 kHz or higher recommended)
  2. Minimize the amount of noise in the file
  3. Normalize the volume / gain of the file
  4. Reduce volume such that clipping of the signal does not occur
  5. Keep separate speakers on separate channels of an audio file
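
As an illustrative sketch only (ffmpeg, the filter choice, and the file names are assumptions, not part of SVTServer), a recording can be resampled and loudness-normalized before submission:

# Resample to 16 kHz and apply single-pass EBU R128 loudness normalization
# (hypothetical file names; requires a local ffmpeg install)
$ ffmpeg -i input.mp3 -ar 16000 -af loudnorm output.wav

Keeping the output as uncompressed WAV avoids introducing further compression artifacts.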


Scoring an Audio File Without Transcription

You can obtain the audio quality score of a file before attempting transcription:

Via stored files

$ curl -X POST localhost:5000/audioscore \
    -H "Content-Type: application/json" \
    -d '{"file":"/files/test.wav"}'
{
  "audioScore": "2.68"
}

Via an uploaded file

$ curl -X POST localhost:5000/audioscore/upload \
    -H "Content-type: multipart/form-data" \
    -F "file=@test.wav"
{
  "audioScore": "2.68"
}


Scoring an Audio File During Transcription

You can also obtain the audio quality score of a file while transcribing it.

Global configuration

$ docker run \
  ...
  -e AUDIO_QUALITY="True" \
  ...

Configure a single job

Via stored files

$ curl \
  ...
  -d '{"file":"/files/myfile.wav","audioQuality":"True"}'

Via an uploaded file

$ curl \
  ...
  -F 'data={"audioQuality":"True"};type=application/json' \
  ...
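
For reference, a complete stored-file request combining the pieces above might look like the following sketch (it assumes the default localhost:5000 endpoint used elsewhere in this guide):

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"file":"/files/test.wav","audioQuality":"True"}'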




Audio Normalization

Audio normalization can help improve accuracy for audio files where multiple channels with varying volumes have been compressed into a single stream.

SVTServer offers several configuration parameters to control audio normalization at the service level:

Normalization level

$ docker run \
  ...
  -e NORMALIZATION="none" \
  ...

Options are: none, low, and high. low employs a mild normalization based on the NOISE_THRESHOLD setting. high employs two-pass EBU R128 normalization.

Default value is low.

Noise Threshold

$ docker run \
  ...
  -e NOISE_THRESHOLD="500" \
  ...

The NOISE_THRESHOLD is the RMS signal level below which the engine regards a segment of audio as noise. This level is also used to determine the normalization degree for low normalization mode above. Reduce this value if you believe large segments of audio are not being transcribed at all.

Default value is 550.

Amplitude Cutoff

$ docker run \
  ...
  -e AMPLITUDE_CUTOFF="1.55" \
  ...

The AMPLITUDE_CUTOFF is a silence threshold used to determine whether a signal should be regarded as silence, expressed as a percentage of the maximum signal. This value is used when calculating the audio score as well as when conducting speaker identification.

Default value is 1.55.
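
These three parameters can be combined in a single launch. A sketch using the documented default values:

$ docker run \
  ...
  -e NORMALIZATION="low" \
  -e NOISE_THRESHOLD="550" \
  -e AMPLITUDE_CUTOFF="1.55" \
  ...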




Speaker Separation

By default, Scribe assumes each channel of audio in a file is a separate speaker, and transcriptions are segmented based on a pre-defined segment length and natural conversation pauses.

When there are multiple speakers on a single channel of audio, speaker diarization can be used to segment the transcript at speaker changes.


Non-Speaker Boundaries

Each segment of a transcription lattice has a boundary field:

{
  ...
  "segments": [
    {
      "boundary": "phrase", 
  ...
}

By default, boundary will be either a phrase boundary (denoting the end of a phrase) or a pause boundary (denoting a short pause in speech). The duration of silence at a pause boundary is shorter than at a phrase boundary.
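
For illustration, adjacent segments in a lattice might look like the following sketch (the transcripts are hypothetical and all other segment fields are elided):

{
  ...
  "segments": [
    {
      "boundary": "pause",
      "transcript": "so as i was saying",
      ...
    },
    {
      "boundary": "phrase",
      "transcript": "the meeting is at noon",
      ...
    },
  ...
}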


Speaker Diarization

Note: Versions of SVTServer at or before 3.4.3 have a MULTISPEAKER mode which only conducts speaker boundary separation and does not cluster the resulting segments. This mode has been deprecated, and newer versions treat MULTISPEAKER the same as DIARIZE.

With diarization mode enabled, Scribe can separate and cluster speakers. Segments in the transcript will be given cluster labels: cluster_0, cluster_1, cluster_2, ..., cluster_MAX.


Controlling the number of clusters

A maximum number of speakers/clusters can be set with MAX_NUM_CLUST (or "maxNumClust" for submitted jobs) to limit the number of clusters that will be output. This will force an output of a specified number of clusters if the original output was greater than that specified number (for example 8 clusters will be reduced to 5 with MAX_NUM_CLUST="5").

A minimum number of speakers/clusters can be set with MIN_NUM_CLUST (or "minNumClust"). Scribe will cluster speaker segments and try to avoid a final cluster number less than the specified number (for example if 2 clusters are found, further consolidation into 1 will not occur when MIN_NUM_CLUST is set to "2"). Note that this parameter does not always force a specific number to be output: for some audio files two speakers can be clustered into one even with MIN_NUM_CLUST="2".

If the max and min configuration parameters are not specified, Scribe will use the default values of MAX_NUM_CLUST="5" and MIN_NUM_CLUST="2".
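
Both bounds can also be set per job with the "maxNumClust" and "minNumClust" keys mentioned above. A sketch via stored files (the default localhost:5000 endpoint is assumed):

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"file":"/files/test.wav","diarize":"True","maxNumClust":"5","minNumClust":"2"}'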


Speaker / cluster labels

If a speaker sample is provided (see Identifying Speakers below), Scribe will identify the cluster among the speaker-separated segments that most closely matches the sample. If "speakerSample" is absent from the submission's data field (or -F "sample=..." is absent for the /upload route), diarization will occur without speaker identification, and the "ID" key of speakerInfo in the JSON will contain the resulting cluster number (where cluster_0 is the first cluster).

"speakerInfo": {
  "confidence": "",
  "ID": "cluster_2"
}
"speakerInfo": {
  "confidence": "",
  "ID": "cluster_3"
}

Note that speaker ID confidence scores will only be displayed when the number of clusters is 2 and when a sample file is provided. Otherwise, confidence values in the speakerInfo key will be empty.


Enabling Diarization Mode

Speaker diarization via a single-use container:

$ docker run \
  ...
  -e DIARIZE="True" \
  -e MAX_NUM_CLUST="4" \
  -e MIN_NUM_CLUST="2" \
  -e TARGET_FILE="/files/test.wav" \
  ...

test.wav is the target file in the storage-directory relative to the container.

When a container is launched and kept alive with DIARIZE="True", speaker diarization can be disabled for individual transcription jobs with "diarize":"False", as in the sketch below. Note that when diarization is enabled (through DIARIZE="True" or "diarize":"True"), multi-speaker separation will be automatically disabled.
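
A sketch of such a per-job override (default localhost:5000 endpoint assumed):

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"file":"/files/test.wav","diarize":"False"}'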


Speaker diarization via the /upload route:

$ curl -X POST localhost:5000/upload \
  -H "Content-type: multipart/form-data" \
  -F 'data={"diarize":"True", "maxNumClust":"4"};type=application/json' \
  -F "file=@test.wav"

Speaker diarization via stored files:

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"file":"/files/test.wav","diarize":"True", "maxNumClust":"4"}'


Too many speakers

If the output has too many speakers after diarization, the number of speakers can be reduced by setting a limit for the maximum number of clusters with MAX_NUM_CLUST.


Setting the Maximum Segment Length

Speaker-separated segments can often be quite long. To set a maximum length (in seconds) for any segment, and thus allow for more pause boundaries, you can set this value globally:

$ docker run \
  ...
  -e MAX_SEGMENT_LENGTH="12" \
  ...

The default value is 15 seconds.


Identifying Speakers

If you have an audio file with two or more speakers and a speaker sample from one of them, you can run speaker identification. This process takes a 5 to 7 second speaker sample and uses it to identify the cluster, among the speaker-separated diarization segments, that matches the sample. The sample file should have only one audio channel.

The resulting segments are identified as "sample" or "other" (a confidence score is output only if there are 2 clusters):

{
  ...
  "segments": [
    {
      "speakerInfo": {
        "confidence": 0.29, 
        "ID": "other"
      },
      ...
      "boundary": "speaker",
      ...
    },
    {
      "speakerInfo": {
        "confidence": 0.29, 
        "ID": "sample"
      },
      ...
      "boundary": "speaker",
      ...
    }, 
  ...
}


Speaker Identification for Single-Use Containers

$ docker run \
  ...
  -e DIARIZE="True" \
  -e TARGET_FILE="/files/test.wav" \
  -e SPKR_SAMPLE="/files/speaker_sample.wav" \
  ...

SPKR_SAMPLE is the full path of the speaker sample in the storage-directory relative to the container.


Speaker Identification for Submitted Jobs

Via the /upload route:

$ curl -X POST localhost:5000/upload \
  -H "Content-type: multipart/form-data" \
  -F 'data={"diarize":"True"};type=application/json' \
  -F "file=@test.wav" \
  -F "sample=@sample.wav"

sample.wav is the local path of the speaker sample file.


Via stored files:

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"diarize":"True","file":"/files/test.wav","speakerSample":"/files/speaker_sample.wav"}'

speakerSample is the full path of the speaker sample in the storage-directory relative to the container.




Decoding Speed

In some cases, slower decoding yields higher accuracy, reducing word error rate by about 10%. To enable this globally, add the following parameter:

$ docker run \
  ...
  -e DECODE_MODE="slow" \
  ...

If transcription speed is a concern, you can increase it by up to 2x, at the cost of roughly a 10% increase in word error rate, using the fast decode mode:

$ docker run \
  ...
  -e DECODE_MODE="fast" \
  ...




N-best Hypotheses

Enabling n-best hypotheses exposes multiple hypotheses that the engine found, ranked in order of overall confidence. If you are using Scribe to locate particular keywords with high accuracy, n-best hypotheses can expose these and help increase your keyword-spotting accuracy.

The 1-best transcript will always be the primary transcript and word lattice of the segment. Once enabled, a new key called alternatives will be present in each segment with the n-best hypotheses:

...
"alternatives": [
  {
    "n-best": 2, 
    "transcript": "yeah okay", 
    "words": [
  ...


Enabling n-best hypotheses

You can globally enable n-best hypotheses for all transcription jobs:

$ docker run \
  ...
  -e N-BEST="2" \
  ...


You can also enable n-best hypotheses when you submit a job:

Via the /upload route:

$ curl -X POST localhost:5000/upload \
  -H "Content-type: multipart/form-data" \
  -F 'data={"n-best":"2"};type=application/json' \
  -F "file=@test.wav"

Via stored files:

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"file":"/files/test.wav","n-best":"2"}'


Word Confusions

In some cases, you may want the entire word confusion matrix rather than only the n-best hypotheses. You can obtain the word confusion matrix (regardless of your n-best setting) by using the following parameters:

Global Configuration

$ docker run \
  ...
  -e WORD_CONFUSIONS="True" \
  ...

Via the /upload route:

$ curl -X POST localhost:5000/upload \
  -H "Content-type: multipart/form-data" \
  -F 'data={"wordConfusions":"True"};type=application/json' \
  -F "file=@test.wav"

Via stored files:

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"file":"/files/test.wav","wordConfusions":"True"}'

The resulting transcript will contain a new key in each segment as shown below. The result is a list that represents the sequence of words in a "sausage statistics" format. Each dictionary in the list maps a candidate word to its confidence level. The <eps> entry indicates the hypothesis that no word occurs at that position.

Note that word confusions will not be produced for segments that leverage cloud transcription.

Example Output:

...
    "transcript": "testing testing one two three", 
   "word_confusions": [
        {
          "okay": "0.0004070607", 
          "oh": "0.0009883753", 
          "ah": "0.0003431078", 
          "yeah": "0.0009430423", 
          "<eps>": "0.9964434", 
          "hm": "0.0003944826", 
          "he": "0.0004805192"
        }, 
        {
          "interesting": "0.001100707", 
          "trusting": "0.001839634", 
          "toasting": "0.0003385631", 
          "testing": "0.9956662", 
          "dusting": "0.001054899"
        }, 
...
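
To pull the most likely word out of each position in the sausage, a jq one-liner like the following can be used (this is a sketch: jq is not part of SVTServer, out.json is a hypothetical saved lattice, and the // [] guard skips segments that carry no word_confusions key):

$ jq '.segments[] | (.word_confusions // [])[]
      | to_entries | max_by(.value | tonumber) | .key' out.json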


Cloud Mode

If you are using Scribe in an environment with access to the internet, you can enable Cloud Mode to have low-confidence segments transcribed in the cloud for higher accuracy.

Our cloud engine leverages other providers such as Microsoft Bing, IBM Watson, and Google's Speech API along with our own models to obtain the highest possible transcription accuracy.

$ docker run \
  ...
  -e ENABLE_CLOUD="True" \
  ...


When Cloud Mode is enabled, a new key will appear in certain segments called align_uncertainty. The Alignment Uncertainty is a measurement of how close the initial transcript and cloud transcript were. A high alignment uncertainty indicates that the cloud significantly influenced the output transcription. If the key is not present, then the cloud was not leveraged for that segment:

...
"segments": [
  {
    "align_uncertainty": 0.39, 
    ...


Two new keys called cloud_influence and cloud_time are also added:

{
  "cloud_influence": 0.14, 
  "cloud_time": 40.0, 
  "duration": 43.0, 
  ...

cloud_influence aggregates the alignment uncertainties of relevant segments to calculate the total influence of cloud transcription on the result. Typically, cloud influence increases for low-quality audio as confidence scores drop.

cloud_time aggregates the total duration of audio sent to the cloud. Note that a file may be sent to the cloud, but the resulting transcript may not be chosen as the best.




Gender Classification

Scribe can predict a speaker's gender based on their voice characteristics. Note: when clustering two speakers, all segments of a cluster will be used to predict gender for that cluster. Longer duration segments (> 10 seconds) tend to give better gender classification accuracy.

To enable gender classification, use either of the following configurations:

Global configuration

$ docker run \
  ...
  -e GENDER_CLASSIFY="True" \
  ...

Configure a single job

Via stored files

$ curl \
  ...
  -d '{"file":"/files/myfile.wav","genderClassify":"True"}'

Via an uploaded file

$ curl \
  ...
  -F 'data={"genderClassify":"True"};type=application/json' \
  ...
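
A complete upload request following the pattern used throughout this guide might look like this sketch (default localhost:5000 endpoint assumed):

$ curl -X POST localhost:5000/upload \
  -H "Content-type: multipart/form-data" \
  -F 'data={"genderClassify":"True"};type=application/json' \
  -F "file=@test.wav"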

Resulting lattices will now have a genderInfo field for each segment:

{
  ...
  "segments": [
    {
      "genderInfo": {
            "gender": "male",
            "confidence": 0.97
        }, 
      ...
      "transcript": "one two three four five", 
      ...
    }, 
  ...




Spoken Language Identification

Scribe can identify spoken languages using the audio signal alone. Two languages are currently supported: English and Spanish (other languages will be classified as "Other" or "Default"). Segments labeled "English" or "Spanish" will be transcribed by the appropriate engines.

To force transcription of a segment labeled "Other", use the -e DEFAULT_LANG global configuration parameter (e.g. -e DEFAULT_LANG="Spanish" or -e DEFAULT_LANG="English"). When no configuration is provided, the default language parameter is set to "English". When DEFAULT_LANG is provided, segments that would otherwise be labeled "Other" will be labeled "Default" in the lattice. To avoid transcription of segments labeled "Other", set -e DEFAULT_LANG="None".

If an audio segment is shorter than 2.5 s in duration, language identification will not occur and the label will automatically be set to "Unknown". Segments under 2.5 s will still be transcribed when DEFAULT_LANG is set.

To enable spoken language identification, use the following configuration:

Global configuration

$ docker run \
  ...
  -e LANG_ID="True" \
  -e DEFAULT_LANG="English" \
  ...

Resulting lattices will now have a langInfo field for each segment:

{
  ...
  "segments": [
    {
      "langInfo": {
            "ID": "English",
            "confidence": 0.97
        }, 
      ...
      "transcript": "one two three four five", 
      ...
    }, 
  ...

The global configuration keeps language identification on for as long as SVTServer is running. To disable language identification for a single job, use {"langId":"False"}.
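
A sketch of such a stored-file job with language identification disabled (default localhost:5000 endpoint assumed):

$ curl -X POST localhost:5000/ \
  -H "Content-type: application/json" \
  -d '{"file":"/files/test.wav","langId":"False"}'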

Note that segments identified as "Spanish" will have an approximate word alignment, in which each word's duration is proportional to its length in characters. Each word in the "Spanish" word-alignment lattice will have a confidence value equal to the segment transcript confidence. "English" segments will have a complete word alignment, and "Unknown" segments will have a blank word alignment. "Spanish" segments will also have a formatted_transcript field in the lattice, in which the first character of the transcript is capitalized and a period is added after the final word.




Text-Related Features



Transcript Formatting

Scribe can add punctuation and capitalization to transcribed segments for a more human-readable transcript.

To enable the default transcript formatter, add the following configuration to the container launch:

$ docker run \
  ...
  -e TRANSCRIPT_FORMATTER="default" \
  ...

Resulting lattices will now have a formatted_transcript field for each segment:

{
  ...
  "segments": [
    {
      "formatted_transcript": "1 2 3 4 5.", 
      ...
      "transcript": "one two three four five", 
      ...
    }, 
  ...


Using a Custom Formatter

In some cases, GreenKey may have provided you with a custom formatter (for example, my_custom_formatter.so).

You can use a custom formatter by adding it to a local directory called custom and mounting that directory to the container:

$ docker run \
  ...
  -v $(pwd)/custom:/custom \
  -e TRANSCRIPT_FORMATTER="my_custom_formatter" \
  ...

Filtering words from transcripts

Highly charged and sensitive words and phrases can be removed from GreenKey transcripts by adding a list of words to wordfilter.txt in a local directory called custom and mounting that directory to the container. These words can be removed from the formatted transcript only (option formatted) or from all GreenKey output (option all).

$ docker run \
  ...
  -v $(pwd)/custom:/custom \
  -e WORD_FILTER="formatted" \
  ...

Alternatively, this option can be applied at runtime if wordfilter.txt was present at container launch:

$ curl \
  ...
  -F 'data={"wordFilter":"all"};type=application/json' \
  ...

An example wordfilter.txt list (containing offensive words and phrases) is available from GreenKey, or you can create your own formatted as follows, with one term per line:

competing company
password
bullfeathers
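
Putting it together, the filter list can be created locally and mounted at launch, as in this sketch (the listed terms are just the example above):

$ mkdir -p custom
$ printf 'competing company\npassword\nbullfeathers\n' > custom/wordfilter.txt
$ docker run \
  ...
  -v $(pwd)/custom:/custom \
  -e WORD_FILTER="formatted" \
  ...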




Insight Engine

Sentiment, Key Terms and Key Phrases

Scribe's Insight Engine scores segments on sentiment and subjectivity and extracts key terms that describe the overall topic of the conversation. It also identifies key phrases from segments of the transcribed conversation, providing more information and context about the topic. To obtain insights, contact us to make sure you have Insight Engine permissions enabled.

Each segment in the transcription lattice will now contain sentiment scores, as in the example below. A polarity of 1.0 represents very positive sentiment, while -1.0 represents very negative sentiment. Subjectivity represents the "uncertainty" in the sentiment on a scale of 0 to 1:

{
...
  "segments": [
    {
      "startTimeSec": 0.0,
      "sentiment": {
        "polarity": 1.0,
        "subjectivity": 1.0
      },
      "words": [
        {
          "start": 0.0,
          "length": 0.31,
          "word": "lunch",
          "confidence": 0.99
        },
        {
          "start": 0.31,
          "length": 0.12,
          "word": "was",
          "confidence": 1.0
        },
        {
          "start": 0.43,
          "length": 0.25,
          "word": "delicious",
          "confidence": 1.0
        }
      ],
      "endTimeSec": 2.54,
      "boundary": "phrase",
      "transcript": "lunch was delicious"
    },
    ...


The transcription lattice (a different example than the one above) will also contain a new key called "insights" at the end of the lattice, with average sentiment, key terms, and key phrases:

"insights": {
    "keyPhrases": [
      "an already strong first quarter last year we had very balanced",
      "had more than ten percent growth from product managers",
      "we had a very strong first quarter",
      ...
    ],
    "averageSentiment": {
      "polarity": 0.07,
      "subjectivity": 0.17
    },
    "keyTerms": [
      "percent",
      "year",
      "business",
      "question",
      "quarter"
    ]
  }


Global configuration

To enable the Insight Engine for all files:

$ docker run \
  ...
  -e FIND_INSIGHTS="True" \
  ...

You can also configure the number of key terms the Insight Engine should search for:

$ docker run \
  ...
  -e NUM_KEYTERMS="10" \
  ...

The default value is 5.

You can also configure the number of key phrases the Insight Engine should keep:

$ docker run \
  ...
  -e NUM_KEYPHRASES="15" \
  ...

The default value is proportional to the duration of the audio file.


Configure a single job

Via stored files

$ curl \
  ...
  -d '{"file":"/files/myfile.wav","findInsights":"True"}'

Via an uploaded file

$ curl \
  ...
  -F 'data={"findInsights":"True"};type=application/json' \
  ...


Insight Processing Without Transcription

You can obtain insights from an existing transcript JSON file using the /insights route.

First, prepare a segments list from the JSON:

segs=$(python -c "import json; j=json.load(open('exampleFile.json','r')); out={'segments': j['segments']}; print(json.dumps(out))")

Next, POST the segments to Scribe using the /insights route:

curl -X POST http://localhost:5000/insights \
  -H "Content-type: application/json" \
  -d "$segs"

{
  "keyPhrases": [
    "an already strong first quarter last year we had very balanced",
    "had more than ten percent growth from product managers",
    "we had a very strong first quarter",
    ...
  ],
  "averageSentiment": {
    "polarity": 0.07,
    "subjectivity": 0.17
  },
  "keyTerms": [
    "percent",
    "year",
    "business",
    "question",
    "quarter"
  ]
} 

Segments from multiple JSON files can be combined and sent to Scribe:

segs=$(python -c "exec(\"import json\\nsegsTot=[]\\nfor file in ['exampleFile1.json','exampleFile2.json']:j = json.load(open(file,'r')); segsTot = segsTot + j['segments']\\nprint(json.dumps({'segments': segsTot}))\")")
curl -X POST http://localhost:5000/insights \
  -H "Content-type: application/json" \
  -d "$segs"

Search terms/words can also be specified. Scribe will look for an occurrence of a term in all segment transcripts (and alternate-hypothesis transcripts, if present) and output a list of transcript lines where each word was found.

Search terms should be specified in this format:

segs=$(python -c "import json; j=json.load(open('exampleFile.json','r')); out={'segments': j['segments'], 'searchTerms': ['wordone','balanced']}; print(json.dumps(out))")

A POST result will output a list of relevant transcript lines:

curl -X POST http://localhost:5000/insights \
  -H "Content-type: application/json" \
  -d "$segs"

{
  "keyPhrases": [
    "an already strong first quarter last year we had very balanced",
    "had more than ten percent growth from product managers",
    "we had a very strong first quarter",
    ...
  ],
  "averageSentiment": {
    "polarity": 0.07,
    "subjectivity": 0.17
  },
  "keyTerms": [
    "percent",
    "year",
    "business",
    "question",
    "quarter"
  ],
  "searchTermResults": [
    [
      "an already strong first quarter last year we had very balanced", 
      0.0, 
      3.2, 
      "1-best"
    ], 
    [
      "an already strong first quart last year we had very balanced", 
      0.0, 
      3.2, 
      "2-best"
    ]
  ]
} 

where each entry in the searchTermResults list contains the transcript line, start time (seconds, if present), end time, and n-best hypothesis number (if present in the input transcript segments). Note that start times (and end times) of JSON segments from the second and subsequent files will be shifted so that they continue chronologically after the previous JSON segments. The segments are treated as if they came from one audio file with a start time of 0.0 s.

Any text can be provided to the /insights route; it does not need to originate from an existing JSON transcript file.

The proper format for POSTing text to the Scribe Insight Engine is:

curl -X POST http://localhost:5000/insights \
  -H "Content-type: application/json" \
  -d '{"segments": [{"transcript": "testing one two three three three trees and one apple"}]}'

{
  "averageSentiment": {
    "polarity": 0.0, 
    "subjectivity": 0.0
  }, 
  "keyPhrases": [
    "one two three three three trees and one apple"
  ], 
  "keyTerms": [
    "tree", 
    "apple"
  ], 
  "searchTermResults": []
}

Note that a transcript's audio duration (in seconds) can be specified in the POST. This enables a limit on the number of keyPhrases output by Scribe, which can be useful for transcripts of long audio files.

curl -X POST http://localhost:5000/insights \
  -H "Content-type: application/json" \
  -d '{"segments": [{"transcript": "testing one two three three three trees and one apple",}], "duration":3600}'



Financial Interpreters

Scribe's Financial Interpreters automatically identify quotes and trades from various product classes as they appear in a transcript.

Each quote / trade is then parsed into an Instrument String that contains all of the information pertaining to it in a universally recognized, condensed form.

Scribe currently has Interpreters for the following product classes:

- European Government Bonds   
- Dollar Swaps  
- Euro Swaps  
- Energy Options  
- FX Options  
- FX Forwards  
- Swaptions  
- Energy Middle Distillates  
- Softs  
- Corporate Bonds  




Interpreter Output

When quotes and trades are detected, they will appear at the end of a lattice with the following format:

{
  ...
  "identified_quotes": [
    {
      "imString": "BKO 12/18 -- / .645", 
      "startTimeSec": 8.81, 
      "transcript": "if i said quote like dec eighteen schatz sixty four and a half offered i should see it at the end of this object", 
      "product_class": "egbs"
    }
  ]
  ...

If the replace quotes function is also enabled, the formatted_transcript item of any segment with an identified quote will be replaced with the imString of the identified quote.
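
For illustration, with quote replacement enabled, the segment from the example above would carry the imString in its formatted_transcript (a sketch; all other segment fields are elided):

{
  ...
  "segments": [
    {
      "formatted_transcript": "BKO 12/18 -- / .645",
      "transcript": "if i said quote like dec eighteen schatz sixty four and a half offered i should see it at the end of this object",
      ...
    },
  ...
}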




Enabling Interpreters

Global configuration

$ docker run \
  ...
  -e IDENTIFY_QUOTES="True" \
  ...

To also replace quotes in formatted_transcript:

$ docker run \
  ...
  -e REPLACE_QUOTES="True" \
  ...


Configure a single job

Via stored files

$ curl \
  ...
  -d '{"file":"/files/myfile.wav","identifyQuotes":"True","replaceQuotes":"True"}'

Via an uploaded file

$ curl \
  ...
  -F 'data={"identifyQuotes":"True","replaceQuotes":"True"};type=application/json' \
  ...




Suppressing identified quote fragments

Occasionally, the imString will be very incomplete, possibly due to an incorrectly identified quote or a transcription error. By default, these quotes are not replaced. If you want to force them to be replaced with partial imStrings, set the strict quotes flag to false.


Global configuration

$ docker run \
  ...
  -e STRICT_QUOTES="False" \
  ...


Configure a single job

Via stored files

$ curl \
  ...
  -d '{"file":"/files/myfile.wav","identifyQuotes":"True","replaceQuotes":"True","strictQuotes":"False"}'

Via an uploaded file

$ curl \
  ...
  -F 'data={"identifyQuotes":"True","replaceQuotes":"True","strictQuotes":"False"};type=application/json' \
  ...




Discovery Engine

Custom NLP for Key Phrase Extraction

You can configure SVTServer to launch the Scribe Discovery Engine automatically by setting -e DISCOVERY="True" in the docker run command. Custom intents and entities for use in Discovery can be mounted using -v $(pwd)/[custom-directory]:/custom, as illustrated below.

$ docker run --rm -d \
  --name="svtserver" \
  ...
  -e DISCOVERY="True" \
  -v $(pwd)/[custom-directory]:/custom \
  ...
  docker.greenkeytech.com/svtserver

Note that this must be active at docker run time to take effect. However, Discovery can be turned off for individual jobs as follows:

$ curl -X POST localhost:5000/upload \
  -H "Content-type: multipart/form-data" \
  -F 'data={"discovery": "False"}; type=application/json' \
  -F "file=@test.wav"