GreenKey

Configuring Scribe




The Scribe quote capture server (SQCServer) can be configured only at the service level, using parameters passed as environment variables in the docker run command:

$ docker run --rm -d \
  ...
  -e PARAM-NAME="PARAM-VALUE" \
  -e PARAM-NAME="PARAM-VALUE" \
  ...
  docker.greenkeytech.com/sqcserver

These configuration parameters apply globally to a container, meaning that they will affect every job submitted to that service.
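
As a concrete illustration of the template above, the sketch below assembles a command that sets two options from the configuration table (MODEL_TYPE and SILENCE_TIMEOUT) and prints it rather than running it. The function name scribe_run_cmd is hypothetical, and any other arguments your deployment needs (the parts elided as "..." above) are intentionally left out.

```shell
#!/bin/sh
# Dry-run sketch: build a docker run command with two Scribe options and print it.
# scribe_run_cmd is a hypothetical helper name, not part of SQCServer.
# Arguments elided as "..." in the documentation are not filled in here.
scribe_run_cmd() {
  # Each option is passed as its own -e NAME=VALUE pair.
  set -- docker run --rm -d \
    -e MODEL_TYPE="general" \
    -e SILENCE_TIMEOUT=5 \
    docker.greenkeytech.com/sqcserver
  # Print the assembled command instead of executing it.
  echo "$*"
}

scribe_run_cmd
```

Because both options are set when the container starts, every job submitted to that container would use the "general" model and a 5-second silence timeout.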




An overview of configuration options is shown below.

Available Configuration Options

| Feature | Global Configuration | Default Value |
|---|---|---|
| Product Class | PRODUCT_CLASS="egbs" | None |
| Product Classes | PRODUCT_CLASSES="egbs,energy,distillates,fxoptions" | All allowed by license |
| Users | Users="5" | "1" |
| Price database | PRICE_DATABASE="True" | "False" |
| Data directory | DATA_OUT="data-directory" | None |
| Quote activation keyword | QUOTE_ACTIVATION_KEYWORD="confirm" | "confirm" |
| Quote deactivation keyword | QUOTE_DEACTIVATION_KEYWORD="done" | "done" |
| Silence timeout | SILENCE_TIMEOUT=5 | 5 |
| Chunk length | CHUNK_LENGTH="0.125" | "0.125" |
| Beam | BEAM=15 | 12 |
| Endpointing | ENDPOINTING="True" | "True" |
| Custom cleaner | CUSTOM_CLEANER="cleaner-name" | "default" |
| Clean transcript | CLEAN_TRANSCRIPT="True" | "True" |
| Model type | MODEL_TYPE="general" | "tradervoice" |
| Interpreters | INTERPRETERS="True" | Model-dependent |
| Discovery | DISCOVERY="True" | "False" |
| Word confusions | WORD_CONFUSIONS="False" | "False" |
| Word alignments | WORD_ALIGNMENTS="False" | "True" |
| N-best hypotheses | N_BEST=2 | 1 |
| Minimum segment length | MIN_SEG_LEN=5 | 5 |
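
When several options are overridden at once, one possible way to keep the command readable is Docker's --env-file flag, which still supplies the values as environment variables at docker run time. The sketch below is an assumption about workflow rather than a documented Scribe procedure: the file location is hypothetical, and the three values are the example values from the table above. It only writes the file and prints the command.

```shell
#!/bin/sh
# Sketch: keep Scribe overrides in an env file instead of repeating -e flags.
# The file path is a hypothetical choice; names and values come from the options table.
env_file="${TMPDIR:-/tmp}/scribe.env"

# Values are left unquoted: Docker reads env-file lines literally,
# so surrounding quotes would become part of the value.
cat > "$env_file" <<'EOF'
MODEL_TYPE=general
BEAM=15
N_BEST=2
EOF

# Dry run: print the command rather than starting a container.
echo "docker run --rm -d --env-file $env_file docker.greenkeytech.com/sqcserver"
```

Options that are not listed in the file keep the defaults shown in the table.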




Explanation of Configuration Options

| Feature | Details |
|---|---|
| Product Class | Accounts are enabled to access specific product classes based on license permissions. |
| Product Classes | Defining a list of product classes here ensures that our product classifier predicts only one of these product classes. This is useful if you do not trade all the product classes we support and wish to avoid misclassified quotes. |
| Users | The maximum number of WebSocket connections permitted for the container. |
| Price database | Referring to a local price database allows homonyms and order-of-magnitude mistakes in transcription to be fixed. |
| Data directory | SQCServer saves raw audio and JSON files to this directory. Use docker run -v local-directory:data-directory to map this folder to a local directory. |
| Quote activation keyword | This keyword forces interpretation of the text that follows it as a quote or trade. |
| Quote deactivation keyword | This keyword forcibly stops interpretation of text after an activation keyword. |
| Silence timeout | Sets the silence timeout (in seconds) for the decoder. |
| Chunk length | Controls how often audio is sent to the decoder. A lower value typically results in faster response times. Larger values may be necessary when complex postprocessing is used. |
| Beam | Higher values increase both the quality and the computational cost of transcription. Typical values are 8-40. |
| Endpointing | A value of True means that extended silence is used to aggressively segment the transcription. |
| Custom cleaner | Controls which formatter is used to create the clean transcript in the JSON object. |
| Clean transcript | Performs simple text substitution of numbers and returns a clean transcript in the JSON object. |
| Model type | Determines whether general ("general") or trader voice ("tradervoice") transcription models are used. |
| Interpreters | Activates quote interpreters. By default, this is True for our "tradervoice" model and False for our "general" English model. |
| Discovery | Activates the Scribe Discovery Engine for finding and interpreting intents in the transcript in real time. |
| Word confusions | Adds the word confusion lattice to the JSON object returned by transcription. |
| Word alignments | Computes start/stop times for individual words in the best transcript. |
| N-best hypotheses | Controls how many candidate hypotheses are processed by SQCServer. |
| Minimum segment length | Controls the minimum segment length (in seconds) output when endpointing is turned on. |
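
The data directory entry pairs an environment variable with a volume mapping, which can be sketched as follows. Both paths and the function name data_dir_cmd are hypothetical placeholders: /data stands in for data-directory and ./scribe-data for local-directory. The command is printed rather than executed, and arguments elided elsewhere in this document are not filled in.

```shell
#!/bin/sh
# Sketch: map the container's data directory to a folder on the host.
# Both paths are hypothetical placeholders; substitute your own.
host_dir="$PWD/scribe-data"   # local-directory (assumed)
container_dir="/data"         # data-directory, also passed as DATA_OUT (assumed)

# data_dir_cmd is a hypothetical helper name, not part of SQCServer.
data_dir_cmd() {
  # -v mounts the host folder at the container path that DATA_OUT points to.
  set -- docker run --rm -d \
    -v "$host_dir:$container_dir" \
    -e DATA_OUT="$container_dir" \
    docker.greenkeytech.com/sqcserver
  # Print the assembled command instead of executing it.
  echo "$*"
}

data_dir_cmd
```

Using the same container path in both -v and DATA_OUT is what makes the raw audio and JSON files saved by SQCServer appear in the host folder.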