
Using optical character recognition

Kili Technology provides an interface for automatic transcription, or optical character recognition (OCR), to extract text information from:

  • an image document (for example, a scanned PDF document)
  • a PDF document

To annotate an OCR job:

  1. Select the class.
  2. Draw a bounding box on the area to be transcribed.

🚧

You need to pre-process your image with OCR so that when you draw a bounding box, the text is automatically extracted. To do so, upload OCR metadata to the asset using the jsonMetadata field.

The metadata structure is similar to the one produced by Google APIs.
Refer to the example tutorial on creating OCR annotations using the Google Vision API.
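
For reference, here is a minimal sketch of how such metadata could be produced with the google-cloud-vision client library. The helper name build_kili_ocr_metadata and the local-file input are illustrative assumptions, and Google Cloud credentials are assumed to be configured in your environment; the tutorial linked above remains the authoritative reference.

from google.cloud import vision

def build_kili_ocr_metadata(image_path):
    # Run Google Cloud Vision text detection on a local image file.
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)

    # Page dimensions come from the full text annotation.
    pages = [
        {"height": page.height, "width": page.width}
        for page in response.full_text_annotation.pages
    ]

    # Each text annotation carries the detected string and its bounding polygon.
    text_annotations = [
        {
            "description": annotation.description,
            "boundingPoly": {
                "vertices": [
                    {"x": vertex.x, "y": vertex.y}
                    for vertex in annotation.bounding_poly.vertices
                ]
            },
        }
        for annotation in response.text_annotations
    ]

    return {
        "fullTextAnnotation": {"pages": pages},
        "textAnnotations": text_annotations,
    }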

For examples of how to import metadata, refer to:

Importing OCR metadata through Kili SDK when creating image assets

from kili.client import Kili

# Instantiate the Kili client; replace the placeholder with your own API key.
kili = Kili(api_key='YOUR_API_KEY')

json_metadata = {
  "fullTextAnnotation": { "pages": [{ "height": 914, "width": 813 }] },
  "textAnnotations": [
    {
      "description": "7SB75",
      "boundingPoly": {
        "vertices": [
          { "x": 536, "y": 259 },
          { "x": 529, "y": 514 },
          { "x": 449, "y": 512 },
          { "x": 456, "y": 257 }
        ]
      }
    },
    ...
  ]
}
kili.append_many_to_dataset(
  project_id='xxx',
  content_array=['url'],
  external_id_array=['A document'],
  json_metadata_array=[json_metadata]
)
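
As an optional sanity check (not part of the example above), you can query the assets back and confirm that the metadata was attached. The asset fields 'externalId' and 'jsonMetadata' below are assumptions based on the Kili asset schema.

# Fetch the assets back and confirm the OCR metadata is present.
assets = kili.assets(project_id='xxx', fields=['externalId', 'jsonMetadata'])
for asset in assets:
    print(asset['externalId'], 'has OCR metadata:', bool(asset['jsonMetadata']))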

Importing OCR metadata through Kili SDK when updating image assets

json_metadata = {
  "fullTextAnnotation": { "pages": [{ "height": 914, "width": 813 }] },
  "textAnnotations": [
    {
      "description": "7SB75",
      "boundingPoly": {
        "vertices": [
          { "x": 536, "y": 259 },
          { "x": 529, "y": 514 },
          { "x": 449, "y": 512 },
          { "x": 456, "y": 257 }
        ]
      }
    },
    ...
  ]
}
# OR
json_metadata = {
  "ocrMetadata": "url_to_json_metadata_object_with_keys_fullTextAnnotation_and_textAnnotations",
  "key": "value", # Other metadata fields
  "key2": "value3" # Other metadata fields
}
# OR
json_metadata = "url_to_json_metadata_object_with_keys_fullTextAnnotation_and_textAnnotations"

kili.update_properties_in_assets(
  asset_ids=['asset_id'],
  json_metadatas=[json_metadata]
)

Importing OCR metadata for PDF documents

The metadata format for uploading OCR metadata to PDF documents is similar to the one used for images. Here is an example:

json_metadata = {
  "fullTextAnnotation": {
    "0": { "pages": [{ "height": 914, "width": 813 }] },
    "1": { "pages": [{ "height": 914, "width": 813 }] }
  },
  "textAnnotations": {
    "0": [
      {
        "description": "7SB75",
        "boundingPoly": {
          "vertices": [
            { "x": 536, "y": 259 },
            { "x": 529, "y": 514 },
            { "x": 449, "y": 512 },
            { "x": 456, "y": 257 }
          ]
        }
      }
    ],
    "1": [
      {
        "description": "XHE",
        "boundingPoly": {
          "vertices": [
            { "x": 536, "y": 259 },
            { "x": 529, "y": 514 },
            { "x": 449, "y": 512 },
            { "x": 456, "y": 257 }
          ]
        }
      }
    ]
  }
}

kili.update_properties_in_assets(
  asset_ids=['asset_id'],
  json_metadatas=[json_metadata]
)

As you can see, fullTextAnnotation and textAnnotations are now objects whose keys are the indices of the pages to which the metadata applies ("0" for the first page, "1" for the second page, and so on).
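
If your OCR engine returns results page by page, this per-page structure can be assembled programmatically. The sketch below is a hypothetical helper: it assumes page_results is a list of (page dimensions, annotations) pairs already expressed in the image format shown earlier.

def build_pdf_ocr_metadata(page_results):
    # page_results: list of (page_dict, text_annotations) tuples, one per PDF page,
    # where page_dict looks like {"height": 914, "width": 813} and text_annotations
    # follows the image format shown earlier.
    metadata = {"fullTextAnnotation": {}, "textAnnotations": {}}
    for index, (page, annotations) in enumerate(page_results):
        key = str(index)  # page indices are string keys: "0", "1", ...
        metadata["fullTextAnnotation"][key] = {"pages": [page]}
        metadata["textAnnotations"][key] = annotations
    return metadata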