Custom Entity

Overview

Jibber AI currently supports 20 different entity types, depending on the language, but you may want to add your own.

You can supply training data in a single, easy-to-read file. On startup, Jibber AI automatically trains a new entity model from that data, ready to extract entities from text.

NOTE: This feature is only available with the Docker solution and is not available with Rapid API.

Example Training File

In the custom_entity.json file below, you will find an example of a training file:

custom_entity.json
{
  "entities": [
    "DRUG",
    "CONDITION"
  ],
  "nerPosition": "before",
  "training": [
    "<DRUG>Abilify</DRUG> is a medication that is used to treat <CONDITION>schizophrenia</CONDITION> and <CONDITION>bipolar disorder</CONDITION>.",
    "I take <DRUG>Bupropion</DRUG> to manage my <CONDITION>depression</CONDITION>.",
    "<DRUG>Carvedilol</DRUG> is a medication that is used to treat <CONDITION>heart failure</CONDITION> and <CONDITION>high blood pressure</CONDITION>.",
    "I use <DRUG>Dexedrine</DRUG> to manage my <CONDITION>attention deficit hyperactivity disorder (ADHD)</CONDITION>.",
    "<DRUG>Eliquis</DRUG> is a medication that is used to prevent <CONDITION>blood clots</CONDITION>.",
    "The doctor prescribed <DRUG>Fluoxetine</DRUG> to manage my <CONDITION>depression</CONDITION> and <CONDITION>anxiety</CONDITION>.",
    "<DRUG>Gabapentin</DRUG> is a medication that is used to treat <CONDITION>nerve pain</CONDITION> and <CONDITION>seizures</CONDITION>.",
    "I take <DRUG>Hydrocodone</DRUG> to manage my <CONDITION>pain</CONDITION>.",
    "<DRUG>Insulin glargine</DRUG> is a long-acting insulin that is used to manage <CONDITION>diabetes</CONDITION>.",
    "<DRUG>Jardiamet</DRUG> is a medication that is used to manage <CONDITION>type 2 diabetes</CONDITION>.",
    "The doctor advised some time off to manage the <CONDITION>stress</CONDITION>."
  ],
  "validation": [
    "The doctor prescribed <DRUG>Effexor</DRUG> to manage my <CONDITION>depression</CONDITION> and <CONDITION>anxiety</CONDITION>.",
    "<DRUG>Famotidine</DRUG> is a medication that is used to manage <CONDITION>acid reflux</CONDITION> and <CONDITION>stomach ulcers</CONDITION>.",
    "I take <DRUG>Gilenya</DRUG> to manage <CONDITION>multiple sclerosis</CONDITION>.",
    "<DRUG>Hydrochlorothiazide</DRUG> is a medication that is used to manage <CONDITION>high blood pressure</CONDITION>.",
    "The doctor prescribed <DRUG>Invega</DRUG> to manage my <CONDITION>schizophrenia</CONDITION>."
  ]
}

JSON File Properties

The JSON file must conform to the following rules:

  • It must be a valid JSON file
  • The top level must be an object - enclosed in {} braces
  • The nerPosition setting must be one of before, after or replace (explained below)
  • The entities section should contain a list of entities you are extracting from the examples
  • Each entity in the examples should be enclosed in HTML-style start/end tags containing the entity name
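
The rules above can be checked before the container starts. The following is a minimal sketch in Python and is not part of Jibber AI: check_custom_entity is a hypothetical helper name, and the check that tags are not nested is an assumption rather than a documented rule.

```python
import json
import re

ALLOWED_NER_POSITIONS = {"before", "after", "replace"}
# Matches <NAME> and </NAME>; the first group is "/" for a closing tag.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z_][A-Za-z0-9_]*)>")


def check_custom_entity(text):
    """Return a list of problems found in the text of a custom_entity.json file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be an object"]

    problems = []
    if data.get("nerPosition") not in ALLOWED_NER_POSITIONS:
        problems.append("nerPosition must be one of before, after or replace")

    entities = data.get("entities")
    if not isinstance(entities, list):
        problems.append("entities must be a list")
        entities = []

    for section in ("training", "validation"):
        for index, example in enumerate(data.get(section, [])):
            where = f"{section}[{index}]"
            if not isinstance(example, str):
                problems.append(f"{where}: example must be a string")
                continue
            open_tag = None
            for closing, name in TAG_PATTERN.findall(example):
                if name not in entities:
                    problems.append(f"{where}: tag {name} is not listed in entities")
                if closing:
                    if open_tag != name:
                        problems.append(f"{where}: unexpected </{name}>")
                    open_tag = None
                else:
                    # Assumption: tags are not nested inside each other.
                    if open_tag is not None:
                        problems.append(f"{where}: <{name}> opened inside <{open_tag}>")
                    open_tag = name
            if open_tag is not None:
                problems.append(f"{where}: <{open_tag}> is never closed")
    return problems
```

An empty list means no problems were found. For example, check_custom_entity(open("custom_entity.json", encoding="utf-8").read()) reports a trailing comma after the last array item as invalid JSON.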

The settings are described below:

entities

This is a list of entity names that Jibber AI will look for in the training and validation data examples.

nerPosition

Jibber AI can extract both standard entities and custom entities. The nerPosition setting controls how the two interact:

  • before - Extract the custom entities before extracting the standard entities. For example, you train a model that extracts INVENTOR, but some of those names would also match PERSON entities. If the custom entity analyzer runs first and finds 'Thomas Edison', then the standard entity analyzer will not return 'Thomas Edison' as a PERSON entity.
  • after - Extract the custom entities after extracting the standard entities. For example, you train a model that extracts INVENTOR, but some of those names would also match PERSON entities. If the standard entity analyzer runs before the custom entity analyzer and finds 'Thomas Edison' as PERSON, then the custom entity analyzer will not return 'Thomas Edison' as an INVENTOR entity.
  • replace - Extract the custom entities only. The standard entity analyzer will not run against the text data. CAUTION: If you use the replace option and also include PII data in the response, you will not receive PII data for GENERAL_PERSON, because that relies on the PERSON/PER entities.
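
The effect of the three nerPosition values can be illustrated with a toy model. This is only a sketch of the behaviour described above, not Jibber AI's implementation: combine_entities is a hypothetical function, entities are represented as (start, end, label) tuples, and treating any character overlap as a conflict is an assumption.

```python
def combine_entities(custom, standard, ner_position):
    """Toy model of nerPosition; spans are (start, end, label) tuples."""

    def overlaps(a, b):
        # Two spans conflict if they share at least one character (assumption).
        return a[0] < b[1] and b[0] < a[1]

    if ner_position == "replace":
        # The standard analyzer does not run at all.
        return sorted(custom)
    if ner_position == "before":
        first, second = custom, standard
    elif ner_position == "after":
        first, second = standard, custom
    else:
        raise ValueError(f"unknown nerPosition: {ner_position!r}")

    # Whichever analyzer runs first keeps its spans; the second analyzer
    # only contributes spans that do not collide with them.
    kept = list(first)
    kept += [s for s in second if not any(overlaps(s, f) for f in first)]
    return sorted(kept)


custom = [(0, 13, "INVENTOR")]   # 'Thomas Edison' found by the custom model
standard = [(0, 13, "PERSON")]   # the same text found by the standard model
for mode in ("before", "after", "replace"):
    print(mode, combine_entities(custom, standard, mode))
```

In this model, before and replace report 'Thomas Edison' as INVENTOR, while after reports it as PERSON.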

training

Contains a list of examples. Each entity should be enclosed in HTML-style tags, with the tag name being the name of the entity. For example:

"<INVENTOR>Thomas Edison</INVENTOR> invented the light bulb."

NOTES:

  • You can tag several entities in the same sentence, whether they are of the same type or of different types. For example:

    "<DRUG>Carvedilol</DRUG> is a medication that is used to treat <CONDITION>heart failure</CONDITION> and <CONDITION>high blood pressure</CONDITION>."

  • A good rule of thumb is to make the validation set approximately 20% of the size of the training set. If you had 1000 training examples, then you would add 200 validation examples.
  • The training and validation examples should be different. Do not reuse a training example as a validation example.
  • The more examples you have (100s or 1000s vs 10s), the better your results are likely to be.
  • You should include examples where there are no matching entities. E.g. "The heart was in great condition and did not require any medication."
  • Be prepared for a few iterations of training and evaluation to improve the model.
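
To make the tag format concrete, the sketch below strips the tags from a training example and reports where each entity sits in the plain text. How Jibber AI parses the examples internally is not documented here; to_spans is a hypothetical helper, and the pattern assumes entity names consist of letters, digits and underscores.

```python
import re

# <NAME>value</NAME>, where the closing tag must repeat the opening name.
TAGGED = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>(.*?)</\1>")


def to_spans(example):
    """Strip the tags and return (plain_text, [(start, end, label), ...])."""
    plain = []
    spans = []
    cursor = 0   # position in the tagged input
    length = 0   # length of the plain text built so far
    for match in TAGGED.finditer(example):
        before = example[cursor:match.start()]
        plain.append(before)
        length += len(before)
        label, value = match.group(1), match.group(2)
        spans.append((length, length + len(value), label))
        plain.append(value)
        length += len(value)
        cursor = match.end()
    plain.append(example[cursor:])
    return "".join(plain), spans


text, spans = to_spans("I take <DRUG>Bupropion</DRUG> to manage my <CONDITION>depression</CONDITION>.")
for start, end, label in spans:
    print(label, repr(text[start:end]))
```

An example with no tags, such as the "no matching entities" sentence above, simply yields the original text and an empty list of spans.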

validation

The validation examples are used by the machine learning component to assess how well the model is performing, and the results are fed back into the next iteration of training.
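
If all of your tagged examples start out in one list, the 20% rule of thumb can be applied with a small helper. This is a sketch under stated assumptions: split_examples is a hypothetical name, the split is random with a fixed seed, and exact duplicates are dropped so that the training and validation sets cannot share an example.

```python
import random


def split_examples(examples, validation_ratio=0.2, seed=0):
    """Split so that len(validation) is about validation_ratio * len(training)."""
    unique = list(dict.fromkeys(examples))   # drop exact duplicates, keep order
    random.Random(seed).shuffle(unique)
    # If v = ratio * t and t + v = n, then v = n * ratio / (1 + ratio).
    n_validation = round(len(unique) * validation_ratio / (1 + validation_ratio))
    return unique[n_validation:], unique[:n_validation]


# Usage: the two lists go into the "training" and "validation" sections.
# training, validation = split_examples(all_tagged_examples)
```

With 1200 distinct examples, this produces 1000 training examples and 200 validation examples, matching the ratio in the notes above.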

Setup

Jibber AI looks for a file in the location /jibber_data/[LANGUAGE_CODE]/custom_entity.json

The language code should match the Docker image you intend to run. For English, that would be /jibber_data/en/custom_entity.json

A typical approach to get the custom_entity.json file into the container is:

  • Create a directory on the host machine that includes the custom_entity.json file under the correct language code
  • Map that directory to /jibber_data inside the container
  • Example: if you start the Docker container using docker run, and the custom_entity.json file is on the host at /jibber_host_folder/en/custom_entity.json

    docker run command
    docker run -v /jibber_host_folder:/jibber_data -d -p 5000:8000 jibberhub/jibber_extractor_en:1.0

    Replace `1.0` with the version of Jibber AI you want to run.

    If using Docker Compose, then you can map the folder in the Docker Compose file.

    docker-compose.yml
    version: "3"
    
    services:
      jibber-service:
        image: jibberhub/jibber_extractor_en:1.0
        environment:
          - TOKEN=my-license-token-from-jibber-ai
        ports:
          - "8000:8000"
        volumes:
          - /jibber_host_folder:/jibber_data
    

    Replace `1.0` with the required version of Jibber AI.

    There are various ways to mount volumes. Refer to the Docker, Docker Compose or Kubernetes documentation on mounting volumes in a container for more information.
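
The host-side layout described above can also be prepared from a script. The following is a minimal sketch; write_training_file is a hypothetical helper, and /jibber_host_folder is simply the example host folder used in this section.

```python
import json
from pathlib import Path


def write_training_file(host_folder, language_code, data):
    """Write data to <host_folder>/<language_code>/custom_entity.json."""
    target = Path(host_folder) / language_code / "custom_entity.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target


# Usage: mounting /jibber_host_folder as /jibber_data makes this file
# visible in the container as /jibber_data/en/custom_entity.json.
# write_training_file("/jibber_host_folder", "en", {
#     "entities": ["DRUG", "CONDITION"],
#     "nerPosition": "before",
#     "training": [...],     # your tagged training examples
#     "validation": [...],   # your tagged validation examples
# })
```

Writing the file with json.dumps also guarantees that it is valid JSON.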

Training

Jibber AI automatically looks for the custom_entity.json file on startup and checks if there's a model available matching the data in the file.

If a model is not found, it will automatically start training a model based on the contents of custom_entity.json. The model will be saved to /jibber_data/[LANGUAGE_CODE]/models

The training can take a while to run. If multiple Jibber AI containers start up at the same time (if you are load balancing, for example), run the training on one container first by starting a single container in train only mode.

When run in train only mode, the container will start up, check for a training file and train a model if required. The output model is written to disk, and the service then shuts down. You can use that model when starting the containers in normal mode.

By doing it this way, you can train and test in isolation and then copy the model to the production system when you are ready.

To start the container in train only mode, you can use the following Docker command:

docker train command
docker run --rm -v /jibber_host_folder:/jibber_data jibberhub/jibber_extractor_en:1.0 python -m train

Replace `1.0` with the version of Jibber AI you require.
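
The train-then-deploy workflow can be scripted around the command above. This is a sketch only: the function names are hypothetical, the script assumes Docker is installed on the host, and it assumes that a failed training run makes the container exit with a non-zero status, which this page does not confirm.

```python
import subprocess
from pathlib import Path


def train_only_command(host_folder, image):
    """Build the docker run arguments for train only mode."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_folder}:/jibber_data",
        image,
        "python", "-m", "train",
    ]


def train(host_folder, image):
    """Run the training container and wait for it to shut down."""
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(train_only_command(host_folder, image), check=True)


def trained_models(host_folder, language_code):
    """List what was written to <host_folder>/<language_code>/models."""
    models = Path(host_folder) / language_code / "models"
    return sorted(p.name for p in models.iterdir()) if models.is_dir() else []


# Usage:
# train("/jibber_host_folder", "jibberhub/jibber_extractor_en:1.0")
# print(trained_models("/jibber_host_folder", "en"))
```

Once the models folder is populated, the same host folder can be mounted by the containers started in normal mode.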