Custom PII

Overview

Jibber AI currently supports 42 types of PII, but you may want to add your own.

You add new PII types by defining them in a JSON configuration file.

NOTE: This feature is only available with the Docker solution and is not available with Rapid API.

Example Custom PII File

In the custom_pii.json file below, you will find 3 examples of extracting time from text:

  • GENERAL_TIME_CONF_ON_PROXIMITY looks for a time pattern and sets the confidence to medium if the match is near the keyword `time` or `clock`. Otherwise, the confidence is set to low.
  • GENERAL_TIME_MUST_BE_NEAR_KEYWORD looks for a time pattern and returns a match ONLY if it is near the keyword `time` or `clock`. Otherwise, no match is returned.
  • GENERAL_TIME_NO_KEYWORD looks for a time pattern and does not require a keyword match.
custom_pii.json
[
  {
    "name": "GENERAL_TIME_CONF_ON_PROXIMITY",
    "pattern": "\\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\\b",
    "pattern_case_sensitive": false,
    "keywords_pattern": "\\b(time|clock)\\b",
    "keywords_case_sensitive": true,
    "keywords_distance": 100,
    "confidence": "medium",
    "confidence_not_near_keyword": "low"
  },
  {
    "name": "GENERAL_TIME_MUST_BE_NEAR_KEYWORD",
    "pattern": "\\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\\b",
    "pattern_case_sensitive": false,
    "pattern_must_be_near_keyword": true,
    "keywords_pattern": "\\b(time|clock)\\b",
    "keywords_case_sensitive": true,
    "keywords_distance": 100,
    "confidence": "high"
  },
  {
    "name": "GENERAL_TIME_NO_KEYWORD",
    "pattern": "\\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\\b",
    "pattern_case_sensitive": false,
    "confidence": "high"
  }
]
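
To make the difference between the three rules concrete, the following Python sketch approximates how they could be evaluated. It is not Jibber AI's implementation: the function names `gap` and `apply_rule` are hypothetical, the distance is assumed to be the number of characters between the two matches, and the defaults used for absent settings are assumptions.

```python
import re

# The three example rules as Python dicts (raw strings keep \b as a word boundary).
TIME = r"\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\b"
KEYWORDS = r"\b(time|clock)\b"
RULES = {
    "GENERAL_TIME_CONF_ON_PROXIMITY": {
        "pattern": TIME, "keywords_pattern": KEYWORDS,
        "keywords_case_sensitive": True, "keywords_distance": 100,
        "confidence": "medium", "confidence_not_near_keyword": "low",
    },
    "GENERAL_TIME_MUST_BE_NEAR_KEYWORD": {
        "pattern": TIME, "pattern_must_be_near_keyword": True,
        "keywords_pattern": KEYWORDS, "keywords_case_sensitive": True,
        "keywords_distance": 100, "confidence": "high",
    },
    "GENERAL_TIME_NO_KEYWORD": {"pattern": TIME, "confidence": "high"},
}


def gap(a, b):
    """Characters between two match spans, 0 if they touch or overlap (assumed metric)."""
    return max(a.start() - b.end(), b.start() - a.end(), 0)


def apply_rule(rule, text):
    """Return (matched_text, confidence) pairs for one rule."""
    # Assumption: an absent *_case_sensitive setting means case-insensitive.
    flags = 0 if rule.get("pattern_case_sensitive", False) else re.IGNORECASE
    kw_pattern = rule.get("keywords_pattern")
    keywords = []
    if kw_pattern:
        kw_flags = 0 if rule.get("keywords_case_sensitive", False) else re.IGNORECASE
        keywords = list(re.finditer(kw_pattern, text, kw_flags))
    results = []
    for m in re.finditer(rule["pattern"], text, flags):
        # Return the outermost group if the pattern has one, else the whole match.
        value = m.group(1) if m.re.groups else m.group(0)
        if not kw_pattern:
            results.append((value, rule["confidence"]))
        elif any(gap(m, k) <= rule["keywords_distance"] for k in keywords):
            results.append((value, rule["confidence"]))
        elif not rule.get("pattern_must_be_near_keyword", False):
            # Assumption: fall back to the default confidence if no
            # confidence_not_near_keyword is configured.
            fallback = rule.get("confidence_not_near_keyword", rule["confidence"])
            results.append((value, fallback))
    return results


for name, rule in RULES.items():
    print(name, apply_rule(rule, "The time is 14:30."), apply_rule(rule, "Meet at 09:15."))
```

With the text "Meet at 09:15.", which contains no keyword, the first rule still reports the time at the lower confidence, the second reports nothing, and the third reports it at its default confidence.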

JSON File Properties

The JSON file must conform to the following rules:

  • It must be a valid JSON file
  • The top level must be an array (the entries must be inside array brackets []), even if there is only one custom PII entry
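
Both rules can be checked before the file is deployed. The sketch below uses only Python's standard `json` module. The helper name `check_custom_pii` is hypothetical, and the check that each entry is a JSON object is an assumption based on the example file rather than a documented rule.

```python
import json


def check_custom_pii(raw):
    """Return a list of problems found in the text of a custom PII file."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top level must be an array"]
    problems = []
    for index, entry in enumerate(data):
        # Assumption: every entry is an object, as in the example file.
        if not isinstance(entry, dict):
            problems.append(f"entry {index} is not an object")
    return problems


print(check_custom_pii('{"name": "MY_RULE"}'))    # a single entry without brackets
print(check_custom_pii('[{"name": "MY_RULE"}]'))  # the same entry wrapped in an array
```

An empty list means that neither rule was violated. It does not mean that Jibber AI will accept the individual settings.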

The available settings are:

  • name: The name of the custom rule. It should not include spaces.
  • pattern: The regular expression that matches the PII data. Only the top-level regular expression groups are returned. You can use a tool such as https://regex101.com/ to verify the pattern before using it here.
  • pattern_case_sensitive: Boolean value indicating whether the pattern is case sensitive.
  • pattern_must_be_near_keyword: Boolean value indicating whether the pattern must be near a keyword for the match to be returned.
  • keywords_pattern: The regular expression that matches the PII keywords. You can match multiple keywords using the or operator |. You can use a tool such as https://regex101.com/ to verify the keyword pattern before using it here.
  • keywords_case_sensitive: Boolean value indicating whether the keyword pattern is case sensitive.
  • keywords_distance: Integer value indicating the maximum distance, in characters, between a pattern match and a keyword.
  • confidence: The default confidence level if a match is found.
  • confidence_not_near_keyword: The confidence level to return if a match is found but is not near a keyword, and the pattern_must_be_near_keyword setting is absent or set to false.
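
Because the patterns live inside JSON strings, backslashes need care: in JSON, `\b` is the escape for a backspace character, so a regular expression word boundary has to be written as `\\b`. The sketch below shows this with Python's standard `json` and `re` modules. It assumes Jibber AI reads the file with a standard JSON parser, and Python's regular expression flavour may differ in detail from the one Jibber AI uses, so treat it as a quick local check only.

```python
import json
import re

# One entry as it would appear in custom_pii.json (raw string: no Python escaping).
entry_text = r'''
{
  "name": "GENERAL_TIME_NO_KEYWORD",
  "pattern": "\\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\\b",
  "pattern_case_sensitive": false,
  "confidence": "high"
}
'''
entry = json.loads(entry_text)

# After decoding, "\\b" is the two characters \ and b: a regex word boundary.
flags = 0 if entry["pattern_case_sensitive"] else re.IGNORECASE
sample = "Lunch at 12:45, back by 13:30."
matches = [m.group(1) for m in re.finditer(entry["pattern"], sample, flags)]
print(matches)

# A single backslash would decode to a backspace character (U+0008) instead.
single = json.loads(r'"\b"')
print(repr(single))
```

The list holds the outermost group of each match, which mirrors the statement that only top-level groups are returned.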

Setup

Jibber AI looks for a file in the location /jibber_data/[LANGUAGE_CODE]/custom_pii.json

The language code should match the Docker image you intend to run. For English, the path is /jibber_data/en/custom_pii.json

A typical approach to get the custom_pii.json file into the container is:

  • Create a directory on the host machine that contains the custom_pii.json file in a subdirectory named after the language code
  • Map that directory into the container as /jibber_data
  • Example: if the custom_pii.json file is on the host at /jibber_host_folder/en/custom_pii.json, start the container with docker run as follows:

    docker run command
    docker run -v /jibber_host_folder:/jibber_data -d -p 5000:8000 jibberhub/jibber_extractor_en:1.0

    Replace `1.0` with the version of Jibber AI you want to run.

    If using Docker Compose, you can map the folder in the Docker Compose file.

    docker-compose.yml
    version: "3"
    
    services:
      jibber-service:
        image: jibberhub/jibber_extractor_en:1.0
        environment:
          - TOKEN=my-license-token-from-jibber-ai
        ports:
          - "8000:8000"
        volumes:
          - /jibber_host_folder:/jibber_data
    

    Replace `1.0` with the version of Jibber AI you want to run.

    There are various ways to mount volumes. Refer to the Docker, Docker Compose, or Kubernetes documentation for more information on mounting volumes in a container.
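
The host directory layout described in this section can also be generated with a short script. The following Python sketch is one possible way to do it. The helper name `write_custom_pii` is hypothetical, and a temporary directory stands in for /jibber_host_folder so that the example can be run anywhere.

```python
import json
import tempfile
from pathlib import Path


def write_custom_pii(host_root, language_code, rules):
    """Write rules to <host_root>/<language_code>/custom_pii.json and return the path."""
    target = Path(host_root) / language_code / "custom_pii.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # json.dumps escapes backslashes, so \b in the pattern is written as \\b.
    target.write_text(json.dumps(rules, indent=2), encoding="utf-8")
    return target


rules = [
    {
        "name": "GENERAL_TIME_NO_KEYWORD",
        "pattern": r"\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\b",
        "pattern_case_sensitive": False,
        "confidence": "high",
    }
]

# Stand-in for /jibber_host_folder; use the real host directory in practice.
with tempfile.TemporaryDirectory() as host_root:
    path = write_custom_pii(host_root, "en", rules)
    print(path.relative_to(host_root).as_posix())
```

Mapping the host directory to /jibber_data then places the file at /jibber_data/en/custom_pii.json inside the container.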