GreenKey

Extending Scribe's Discovery Engine

Scribe's Discovery Engine interpreters are designed to be extended. If the tokens you need already exist, then making a new interpreter is as easy as defining what patterns you want to discover.

Once you define your interpreter, Scribe Discovery Engine handles the rest of the fuzzy logic to find your patterns in spite of transcription errors, provided that you use Discovery in conjunction with Scribe Transcription Engine.

If you are ready to start coding, try our quick-start example page to get experience writing intent and entity definitions that can be used immediately by Discovery.

Uploading your interpreter to our service

You will need to create a directory that you can mount into Discovery at runtime.

The folder structure for the mounted directory should look like this:

custom
│   intents.json
│   schemas.json (optional)
│
└───entities
    │   some_entity.py
    │   some_other_entity.py
    │   a_third_entity.py
    │   cleaning_functions.py (optional)

In the intents.json file, you will put all of your custom intent definitions. You can read about creating an intents.json here.

Put all of your entity definitions into separate files in the entities directory. For more information on creating your individual entity definitions, see Entities.
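Before mounting the directory, it can help to confirm that it matches the layout above. The following is a minimal sketch of such a check, not part of Discovery itself; the function name `check_custom_dir` is hypothetical, and treating an entities directory with no `.py` files as a problem is an assumption made for illustration.

```python
from pathlib import Path


def check_custom_dir(root):
    """Return a list of problems found in a custom interpreter directory."""
    root = Path(root)
    problems = []

    # intents.json sits at the top level of the custom directory.
    if not (root / "intents.json").is_file():
        problems.append("missing intents.json")

    # Entity definitions live in separate .py files under entities/.
    entities = root / "entities"
    if not entities.is_dir():
        problems.append("missing entities directory")
    elif not any(entities.glob("*.py")):
        # Assumption: an interpreter with no entity files is probably a mistake.
        problems.append("entities directory contains no .py files")

    # schemas.json and cleaning_functions.py are optional, so they are not checked.
    return problems
```

Calling `check_custom_dir("custom")` returns an empty list when the required pieces are in place.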

Once you've designed your interpreter, you can pass the interpreter to Scribe Discovery Engine at runtime.

$ docker run --rm -d \
  ...
  -v "$PWD/[custom-directory]":/custom \
  ...
  docker.greenkeytech.com/discovery

where custom-directory is a local directory containing your intents.json and your collection of custom entities.

Posting To Discovery

Manually Requesting An Intent

You have two options for sending requests to Discovery. If you specify "intents" in your POST request, Discovery will search the transcript in your request for entities associated with the specified intents.

For example, if you define a custom intent called directions in your intents.json, you can request it from our container as follows:

curl -X POST http://localhost:1234/discover \
     -H "Content-Type: application/json" \
     -d '{"intents": ["directions"], "transcript": "some transcript words here"}'

where the value of "transcript" is the text of a sample transcript from GreenKey Scribe.

Automatically Determining Intent

If you omit the key "intents", Discovery will automatically detect the best intent and return the entities it discovers for that intent.

You can also control how many intents it detects by providing the key "num_of_intents". If we wanted to automatically detect the top two intent candidates for a transcript, a POST request would look like the following.

curl -X POST http://localhost:1234/discover \
     -H "Content-Type: application/json" \
     -d '{"num_of_intents":"2", "transcript": "some transcript words here"}'
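The same two request styles can be built from Python. This is a minimal sketch using only the standard library; the helper names `build_discover_request` and `discover` are hypothetical, the URL is copied from the curl examples, and "num_of_intents" is sent as a string only because the curl example does so.

```python
import json
import urllib.request

# Host and port are taken from the curl examples; adjust them for your deployment.
DISCOVER_URL = "http://localhost:1234/discover"


def build_discover_request(transcript, intents=None, num_of_intents=None, url=DISCOVER_URL):
    """Build a POST request for Discovery without sending it."""
    payload = {"transcript": transcript}
    if intents is not None:
        # Manually request specific intents.
        payload["intents"] = list(intents)
    if num_of_intents is not None:
        # Let Discovery pick the top candidates; sent as a string, as in the curl example.
        payload["num_of_intents"] = str(num_of_intents)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def discover(request):
    """Send the request and decode the JSON reply (needs a running Discovery container)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


manual = build_discover_request("some transcript words here", intents=["directions"])
automatic = build_discover_request("some transcript words here", num_of_intents=2)
```

Passing either request to `discover` sends it. Building and sending are kept separate so the payload can be inspected without a running container.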

Customizing Return JSON From Discovery

You may want Discovery to return a custom JSON object. This can be accomplished by creating a schemas.json file at the top level of your custom directory, alongside your intents.json file.

The schemas.json file has one key, "schemas", which contains an array of objects. Each object should correspond to an intent you created in your intents.json file. The custom JSON is only applied to your output if Discovery finds the matching intent.

Here is an example:

schemas.json

{
  "schemas": [
    {
      "label": "graph_request",
      "return_json": {
        "graph_type": "{graph_type}",
        "time_duration": "{time_duration}"
      }
    }
  ]
}

The above file contains one schema object.

The "label" property identifies which intent will receive the custom formatting. This property should be the same string used in your intents.json for whichever intent you are modifying. Here, the schema object will be applied to any graph_request intents that are found.

The "return_json" property specifies what will be returned from Discovery. It uses the same string interpolation syntax as the intents.json file. Any {entity} placeholder in a string will be replaced with the most likely found entity of the same name.

The custom JSON properties that you provide will be returned as top-level keys from Discovery. An example response for the above would look like the following.

{
  "graph_type": "t-spread",
  "time_duration": "10 weeks"
}
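Since each schema object needs a "label" and a "return_json", a quick structural check can catch mistakes before the file is mounted. The sketch below is illustrative only and is not Discovery's own validation; `validate_schemas` is a hypothetical helper, and the optional `intent_labels` argument is supplied by the caller, so no assumption is made about the layout of intents.json.

```python
def validate_schemas(document, intent_labels=None):
    """Return a list of problems found in a parsed schemas.json document."""
    schemas = document.get("schemas") if isinstance(document, dict) else None
    if not isinstance(schemas, list):
        return ['top-level "schemas" key must contain an array']

    problems = []
    for index, schema in enumerate(schemas):
        if not isinstance(schema, dict):
            problems.append(f"schemas[{index}] is not an object")
            continue
        label = schema.get("label")
        if not isinstance(label, str):
            problems.append(f'schemas[{index}] needs a string "label"')
        elif intent_labels is not None and label not in intent_labels:
            # The label should be the same string used for an intent in intents.json.
            problems.append(f'schemas[{index}] label "{label}" does not match any intent')
        if "return_json" not in schema:
            problems.append(f'schemas[{index}] needs a "return_json"')
    return problems


example = {
    "schemas": [
        {
            "label": "graph_request",
            "return_json": {
                "graph_type": "{graph_type}",
                "time_duration": "{time_duration}",
            },
        }
    ]
}
problems = validate_schemas(example, intent_labels={"graph_request"})
```

An empty `problems` list means the document has the expected shape.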

Array of Entities Returned

If you are looking for multiple entities that show up throughout a dictation, you can also specify an array to be returned. This is done by nesting an interpolated entity in an array, like so: ["{corporate_bond}"]. If we add this to the example above, it would look like the following.

schemas.json

{
  "schemas": [
    {
      "label": "graph_request",
      "return_json": {
        "corporate_bonds": ["{corporate_bond}"],
        "graph_type": "{graph_type}",
        "time_duration": "{time_duration}"
      }
    }
  ]
}

A response for the above schema would look like the following.

{
  "corporate_bonds": ["Microsoft 24", "Apple 24"],
  "graph_type": "t-spread",
  "time_duration": "10 weeks"
}

Feel free to nest the custom JSON as deeply as you need to.

schemas.json

{
  "schemas": [
    {
      "label": "graph_request",
      "return_json": {
        "top_nesting_key": {
          "second_nested_key": {
            "corporate_bonds": ["{corporate_bond}"]
          },
          "graph_type": "{graph_type}",
          "time_duration": "{time_duration}"
        }
      }
    }
  ]
}
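To make the interpolation rules concrete, here is a minimal sketch of how a "return_json" template could be filled in. It is not Discovery's implementation; `fill_template` and the shape of the `entities` mapping (entity name to a list of found values, most likely first) are assumptions for illustration, as is leaving a placeholder untouched when no entity was found.

```python
import re

# Matches an interpolated entity such as {graph_type}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_template(template, entities):
    """Fill a return_json template from a mapping of entity name -> found values."""
    if isinstance(template, dict):
        # Recurse into nested objects, keeping the keys as written.
        return {key: fill_template(value, entities) for key, value in template.items()}
    if isinstance(template, list):
        filled = []
        for item in template:
            match = PLACEHOLDER.fullmatch(item) if isinstance(item, str) else None
            if match:
                # ["{entity}"] expands to every value found for that entity.
                filled.extend(entities.get(match.group(1), []))
            else:
                filled.append(fill_template(item, entities))
        return filled
    if isinstance(template, str):
        def most_likely(match):
            values = entities.get(match.group(1), [])
            # Assumption: keep the placeholder text when nothing was found.
            return values[0] if values else match.group(0)
        return PLACEHOLDER.sub(most_likely, template)
    return template


template = {
    "top_nesting_key": {
        "second_nested_key": {"corporate_bonds": ["{corporate_bond}"]},
        "graph_type": "{graph_type}",
        "time_duration": "{time_duration}",
    }
}
entities = {
    "corporate_bond": ["Microsoft 24", "Apple 24"],
    "graph_type": ["t-spread"],
    "time_duration": ["10 weeks"],
}
filled = fill_template(template, entities)
```

With these sample values, `filled` keeps the nesting of the template: "corporate_bonds" holds both bonds, while "graph_type" and "time_duration" each hold a single string.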

Further assistance

For any other assistance on specialized interpreter design, please contact us.