GreenKey

Intent Classification

Before you can use Discovery to extract "entities", you must know the "intent" of the commands that you are attempting to analyze.

Discovery requires a definition file to be provided when the container is first built so that it can accurately classify intents.

Creating an intents.json file

When you launch Discovery, you will need to mount a shared volume to the container. This volume is how you will provide Discovery with your own custom intents and entities.

The first thing you will need in this folder is a definition file called intents.json. This file is expected to be valid JSON. It should contain an object with a single key, "intents", which in turn contains an array of intent objects.

intents.json

{
  "intents": [
    // intent objects go here
  ]
}
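Before you build the container, it can help to check the file's shape. The following is a minimal sketch, not part of Discovery; `load_intents` is a hypothetical helper name. It uses Python's standard strict JSON parser, which rejects comments and trailing commas, so the placeholder comment in the skeleton above would have to be replaced with real intent objects first.

```python
import json

def load_intents(text):
    """Parse the contents of intents.json and return the list of intent objects."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict) or list(data.keys()) != ["intents"]:
        raise ValueError('expected an object with a single key, "intents"')
    if not isinstance(data["intents"], list):
        raise ValueError('"intents" must be an array')
    for intent in data["intents"]:
        if not isinstance(intent, dict) or "label" not in intent:
            raise ValueError('every intent object needs a "label"')
    return data["intents"]

# Hypothetical file contents used only for illustration.
sample = '{"intents": [{"label": "trade_history", "examples": []}]}'
print([intent["label"] for intent in load_intents(sample)])
```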

Usually, an intent object consists of two keys, "label" and "examples". It can also carry optional keys such as "domain", described under Extra Parameters. Here is an example of an intent object for the intent trade_history.

{
  "label": "trade_history",
  "domain": "skills",
  "examples": [
    "pull up my {number_of_trades} {trade_value_filter} for {corporate_bond}",
    "show me my {number_of_trades} for {corporate_bond}",
    "show the {number_of_trades} that were {trade_value_filter} for {corporate_bond}",
    "get our {number_of_trades} for {corporate_bond} that are {trade_value_filter}",
    "display {number_of_trades} of {corporate_bond} and are {trade_value_filter}"
  ]
}

Main Parameters

label

Each intent object has a "label" property. Discovery returns this label for any matching content it finds in the submitted lattice file.

examples

Each intent object also has a set of "examples". These give Discovery an idea of what the sentence structure looks like for a given intent, so that Discovery can automatically detect the intent from a voice file.

The examples can also optionally be used to generate data for the Scribe transcription engine. Scribe can learn to prefer certain specialized phrases and words that occur in a particular use case, which further increases transcription accuracy.

The examples also tell Discovery which entities are associated with each intent.

Write each entity to be searched for in the format {entity_name}, where "entity_name" corresponds to the name of an entity file, entity_name.py. Discovery will automatically look up the definition for this entity and train its model on the entity definition found there.

In the case of {corporate_bond}, in the example above, Discovery will expect there to be a corresponding file, entities/corporate_bond.py, located in your mounted volume that contains the entity definition for corporate_bond.
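To see which entity files a set of examples implies, you can extract the placeholders yourself. This is an illustrative sketch only: `entity_files` is a hypothetical helper, and the regular expression assumes entity names consist of letters, digits, and underscores.

```python
import re
from pathlib import PurePosixPath

# Assumption: entity names are made of word characters only.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def entity_files(intent, entities_dir="entities"):
    """Map each entity named in an intent's examples to its expected definition file."""
    names = []
    for example in intent.get("examples", []):
        for name in PLACEHOLDER.findall(example):
            if name not in names:  # keep first-seen order, drop repeats
                names.append(name)
    return {name: str(PurePosixPath(entities_dir) / f"{name}.py") for name in names}

intent = {
    "label": "trade_history",
    "examples": [
        "pull up my {number_of_trades} {trade_value_filter} for {corporate_bond}",
        "show me my {number_of_trades} for {corporate_bond}",
    ],
}
for name, path in entity_files(intent).items():
    print(name, "->", path)
```

Each path printed is a file that would need to exist in the mounted volume for the intent to be usable.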

For more information on how to properly structure your mounted volume, click here.

To learn how to structure an entity definition file, click here.

Extra Parameters

entities

You may decide that you do not need Discovery to classify your intents for you. In this case, you can skip the intent classification training that happens at container startup.

To do this, simply omit the "examples" property. Instead, provide an "entities" property that contains a list of the entities that you would like Discovery to detect in a particular intent.

{
  "label": "trade_history",
  "entities": ["trade_history", "trade_value_filter", "corporate_bond"]
}

domains

Intents can also be grouped into "domains." This can be useful for aiding Discovery in intent classification for users that have many different intents. You can include the domain in a request to Discovery, which will limit Discovery to only search for intents that belong to that particular domain.

The following shows an intent configuration file with two different intents that belong to two different domains. In actual practice, you would not need to specify domain unless you had many more intents.

intents.json

{
  "intents": [
    {
      "label": "address",
      "domain": "directions",
      "examples": [
        "I live at {address}",
        "Turn right at {address}",
        "Please take me to {address}"
      ]
    },
    {
      "label": "license_plate",
      "domain": "safety",
      "examples": [
        "in pursuit of vehicle, license {license_plate}",
        "the plates are {license_plate}",
        "he is driving a red car {license_plate}"
      ]
    }
  ]
}

For example, the following request limits Discovery to intents in the "directions" domain:

curl -X POST http://localhost:1234/discover \
     -H "Content-Type: application/json" \
     -d '{"retainLattice":"false", "domains":["directions"], "transcript": "drive me to fifty five west monroe"}'
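
The same request can be built from Python. The sketch below uses only the standard library and reuses the URL and payload fields from the curl command above; `build_discover_request` is a hypothetical helper name. The call that actually sends the request is left commented out because it needs a running Discovery container.

```python
import json
import urllib.request

def build_discover_request(transcript, domains, url="http://localhost:1234/discover"):
    """Build a POST request that restricts Discovery to the given domains."""
    payload = {"retainLattice": "false", "domains": domains, "transcript": transcript}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_discover_request("drive me to fifty five west monroe", ["directions"])
print(request.get_method(), request.full_url)

# To send it against a running container:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```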

Structure Enforcement

By default, Discovery will look for entities for a given intent regardless of the order in which they appear in the dictation file. If Discovery were set up to look for a person's name and a vehicle description, it would perform equally well on the dictations "Bob drives a gray Honda" and "the gray Honda is driven by Bob".

There are some circumstances where the order that entities show up in a dictation does matter, however.

Coordinates Example

Imagine if someone was reading latitude and longitude coordinates. The coordinates for Chicago, Illinois are "41.88, -87.63". Someone might read this aloud as "forty one eighty eight negative eighty seven sixty three."

To detect latitude and longitude, we would create very similar entity definitions for both. Given how our example reader prefers to read the coordinates, the entity definition for either latitude or longitude would basically be a series of numbers, optionally preceded by a plus or minus sign. In this example, how would Discovery know which number patterns are latitude and which are longitude?

We solve this problem with "structure enforcement."

If a given intent uses structure enforcement, then an entity is only valid if it adheres to a given pattern in the intents configuration file. Let's construct an intent definition for the example above.

intents.json

{
  "intents": [
    {
      "label": "coordinates",
      "entity_patterns": [
         ["latitude", "longitude"]
      ],
      "structure_enforcement": "True",
      "skip_training":"True"
    }
  ]
}

By setting the "structure_enforcement" property to "True", Discovery now knows that the entities it detects must show up in a particular order. For structure enforcement to work, you must specify the valid arrangements of entities. In the example above, we specify an "entity_patterns" property indicating that latitude has to occur before longitude.

In the above dictation, "forty one eighty eight" will be classified as both a latitude and longitude, as will the second half of the transcript "negative eighty seven sixty three." The only classification for them that follows a valid entity pattern, however, is for "forty one eighty eight" to be classified as a latitude, and "negative eighty seven sixty three" to be classified as a longitude.
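The selection step described above can be illustrated with a small sketch. This is not Discovery's internal algorithm. It is a simplified model that assumes each spoken span lines up one-to-one with an entry in a pattern, and `resolve` is a hypothetical function name.

```python
def resolve(spans, entity_patterns):
    """Pick the labelling of spans that follows a valid entity pattern.

    spans: list of (text, candidate_labels) tuples in spoken order.
    entity_patterns: list of label lists, as in the "entity_patterns" property.
    Returns a list of (label, text) pairs, or None if no pattern fits.
    """
    for pattern in entity_patterns:
        if len(pattern) != len(spans):
            continue
        if all(label in candidates for label, (_, candidates) in zip(pattern, spans)):
            return [(label, text) for label, (text, _) in zip(pattern, spans)]
    return None

# Both spans look like either entity, as in the coordinates example.
spans = [
    ("forty one eighty eight", {"latitude", "longitude"}),
    ("negative eighty seven sixty three", {"latitude", "longitude"}),
]
print(resolve(spans, [["latitude", "longitude"]]))
```

With the single pattern latitude then longitude, the only labelling that survives assigns the first span to latitude and the second to longitude.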

Structure Confidence

Discovery gives each entity it finds a score that indicates how well it adheres to the patterns provided in the intent definition. If the previous example were submitted to Discovery, the following JSON would be returned.

{
  "intents": [
    {
      "label": "coordinates",
      "entities": [
        {
          "label": "latitude",
          "matches": [
            [
              {
                "value": "forty one eighty eight",
                "probability": 1.0,
                "start_time": 1.65,
                "structure_confidence": 1.0,
                "end_time": 4.32
              }
            ],
            [
              {
                "value": "negative eighty seven sixty three",
                "probability": 1.0,
                "start_time": 4.63,
                "structure_confidence": 0,
                "end_time": 7.14
              }
            ]
          ]
        },
        {
          "label": "longitude",
          "matches": [
            [
              {
                "value": "forty one eighty eight",
                "probability": 1.0,
                "start_time": 1.65,
                "structure_confidence": 0,
                "end_time": 4.32
              }
            ],
            [
              {
                "value": "negative eighty seven sixty three",
                "probability": 1.0,
                "start_time": 4.63,
                "structure_confidence": 1.0,
                "end_time": 7.14
              }
            ]
          ]
        }
      ],
      "probability": 1
    }
  ]
}

As you can see, the return structures for "latitude" and "longitude" are almost identical. The only difference is that the structure confidence is 100% for the values that match the given entity pattern, and zero for those that do not.

If you do not want values with low structure confidence to be returned, you can set the threshold with an environment variable when you start your Discovery container. Set STRUCTURE_CONFIDENCE_THRESHOLD to a float between 0 and 1 to determine at what point entities are not relevant to your result set. This value is set to 0.01 by default.
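If you would rather keep every match in the response and filter on the client side, the same idea can be sketched in a few lines. `filter_matches` is a hypothetical helper. The default of 0.01 mirrors the documented default, and treating the threshold as inclusive (`>=`) is an assumption, not confirmed behavior.

```python
def filter_matches(response, threshold=0.01):
    """Return {entity label: [values]} keeping only matches at or above the threshold."""
    kept = {}
    for intent in response["intents"]:
        for entity in intent["entities"]:
            kept[entity["label"]] = [
                token["value"]
                for match in entity["matches"]
                for token in match
                # Assumption: a score equal to the threshold is kept.
                if token["structure_confidence"] >= threshold
            ]
    return kept

# Trimmed-down version of the response shown above.
response = {
    "intents": [{
        "label": "coordinates",
        "entities": [
            {"label": "latitude", "matches": [
                [{"value": "forty one eighty eight", "structure_confidence": 1.0}],
                [{"value": "negative eighty seven sixty three", "structure_confidence": 0}],
            ]},
            {"label": "longitude", "matches": [
                [{"value": "forty one eighty eight", "structure_confidence": 0}],
                [{"value": "negative eighty seven sixty three", "structure_confidence": 1.0}],
            ]},
        ],
    }]
}
print(filter_matches(response))
```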

Specifying Valid Entity Patterns

There are multiple ways that you can specify which entity patterns are valid.

If you are using the "examples" property in your intent definition, this can double as your definition for valid intent structure. Simply make sure that all the possible valid arrangements of entities are represented in your example sentences, and specify the "structure_enforcement" property as "True".
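To check which arrangements your example sentences actually cover, you can list the entity order of each example. This sketch only inspects your own examples. How Discovery derives patterns internally is not described here, and `patterns_from_examples` is a hypothetical helper that assumes entity names contain only word characters.

```python
import re

def patterns_from_examples(examples):
    """Return the distinct entity orderings that appear in the example sentences."""
    patterns = []
    for example in examples:
        pattern = re.findall(r"\{(\w+)\}", example)  # entities in spoken order
        if pattern and pattern not in patterns:
            patterns.append(pattern)
    return patterns

examples = [
    "pull up my {number_of_trades} {trade_value_filter} for {corporate_bond}",
    "show me my {number_of_trades} for {corporate_bond}",
    "get our {number_of_trades} for {corporate_bond} that are {trade_value_filter}",
]
for pattern in patterns_from_examples(examples):
    print(pattern)
```

Any arrangement you expect users to say that is missing from the printed list would need another example sentence.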

If you do not have an "examples" property, because you are not using Discovery to classify your intent, you can specify an "entity_patterns" property. This property is an array of arrays of entity strings. Each entity string must have a corresponding entity definition file of the same name. If you skip the "examples" property, remember to include "skip_training":"True" in your intent definition.