Using optical character recognition (OCR)

Kili Technology provides an interface for automatic transcription, also known as optical character recognition (OCR), to extract text from:

  • an image document (for example, a scanned PDF document)
  • a PDF document

Interface requirement

OCR results are captured and stored in bounding box transcription subjobs.

Make sure your project ontology / JSON interface is properly configured:

  • Add a transcription subjob to your bounding box tool
  • Ensure the autofill option is set to true (enabled by default)

This is required for OCR to automatically populate and refresh transcriptions based on the underlying content.
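
As a minimal sketch of the configuration described above, the following Python snippet builds an interface dictionary with a bounding box job and a child transcription job, then checks that every bounding box category is linked to a transcription subjob. The job names (`OCR_BOXES`, `OCR_TEXT`), the category key `TEXT`, the helper name and the exact field layout are assumptions for illustration; compare them with the JSON interface of your own project. The autofill option is left out because its JSON key is not given in this page.

```python
# Illustrative interface only: job names and field layout are assumptions.
interface = {
    "jobs": {
        "OCR_BOXES": {
            "mlTask": "OBJECT_DETECTION",
            "tools": ["rectangle"],
            "content": {
                "input": "radio",
                "categories": {
                    # The category points to its transcription subjob.
                    "TEXT": {"name": "Text", "children": ["OCR_TEXT"]},
                },
            },
        },
        "OCR_TEXT": {
            "mlTask": "TRANSCRIPTION",
            "isChild": True,
        },
    }
}


def boxes_without_transcription(interface):
    """Return (job, category) pairs for bounding box categories lacking a transcription subjob."""
    jobs = interface["jobs"]
    missing = []
    for job_name, job in jobs.items():
        if "rectangle" not in job.get("tools", []):
            continue  # not a bounding box job
        for cat_name, cat in job["content"]["categories"].items():
            children = cat.get("children", [])
            has_transcription = any(
                jobs.get(child, {}).get("mlTask") == "TRANSCRIPTION"
                for child in children
            )
            if not has_transcription:
                missing.append((job_name, cat_name))
    return missing


print(boxes_without_transcription(interface))  # prints []
```

An empty list means every bounding box category in this hypothetical interface has a transcription subjob attached.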

How OCR works

Kili natively extracts text from image and PDF assets. When a user draws a bounding box over a text area, the transcription is automatically populated from the underlying content, with no pre-processing required.

For PDF assets, native text is used directly when available. For scanned PDFs and image assets, text is extracted from the visual content.

Using your own OCR metadata

If native extraction does not meet your needs (for example, if you have higher-quality OCR results from a specialized model), you can import your own pre-computed OCR metadata instead.

To do so, include your OCR results in the jsonMetadata field when importing your assets. The expected format follows the structure produced by the Google Vision API. Each detected text element must be associated with coordinates in the image so Kili can match bounding boxes with the correct transcription.

For more details, see Importing asset metadata and this tutorial.
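
As a rough sketch of preparing such metadata, the snippet below converts a list of OCR detections into a Google Vision-style structure, where each text element carries a `description` and a `boundingPoly` with four `vertices`. The function name `to_vision_style_metadata` and the input tuple layout are hypothetical. Which keys Kili actually reads, and whether it expects pixel or normalized coordinates, are not stated on this page, so treat the output shape as an assumption and check it against the tutorial.

```python
import json


def to_vision_style_metadata(detections):
    """Convert (text, left, top, right, bottom) tuples, in pixels, to Vision-style annotations."""
    annotations = []
    for text, left, top, right, bottom in detections:
        annotations.append({
            "description": text,
            "boundingPoly": {
                # Vertices run clockwise, starting at the top-left corner.
                "vertices": [
                    {"x": left, "y": top},
                    {"x": right, "y": top},
                    {"x": right, "y": bottom},
                    {"x": left, "y": bottom},
                ]
            },
        })
    return {"textAnnotations": annotations}


# Hypothetical output of a specialized OCR model.
detections = [("Invoice", 40, 30, 160, 60), ("2024-001", 180, 30, 300, 60)]
json_metadata = to_vision_style_metadata(detections)
print(json.dumps(json_metadata, indent=2))
```

In real Vision API responses, the first `textAnnotations` entry usually holds the full text of the image; this sketch omits it for brevity. The resulting dictionary is what you would pass as the asset's jsonMetadata at import time.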

Creating OCR annotations

OCR annotations allow you to extract and edit text from images using bounding boxes with transcription.

To create an OCR annotation:

  1. Select the bounding box tool configured with OCR transcription
  2. Draw a bounding box around the text you want to capture
  3. If autofill is enabled, the transcription is populated automatically
  4. Edit the transcription manually if needed

Refreshing OCR transcriptions

When working with OCR annotations, you may need to adjust bounding boxes after their initial creation (e.g. moving or resizing them). In such cases, the existing transcription may no longer match the updated area.

You can refresh the OCR transcription to reflect the updated bounding box content.

How to refresh OCR

  1. Select one or multiple bounding boxes with OCR transcription
  2. Right-click on the selection
  3. Click "Refresh OCR"

The transcription will be recalculated based on the current position and size of each bounding box.
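
To illustrate why a refresh is needed after moving or resizing a box, here is a simplified sketch of how a transcription could be derived from a box and a set of positioned words. This is not Kili's actual implementation: the function `transcribe_box`, the centre-point rule and the word tuples are assumptions chosen to keep the example short.

```python
def transcribe_box(box, words):
    """Join the words whose centre falls inside box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    inside = []
    for text, w_left, w_top, w_right, w_bottom in words:
        cx = (w_left + w_right) / 2
        cy = (w_top + w_bottom) / 2
        if left <= cx <= right and top <= cy <= bottom:
            inside.append((cy, cx, text))
    # Naive reading order: top to bottom, then left to right.
    inside.sort()
    return " ".join(text for _, _, text in inside)


# Hypothetical words with pixel coordinates (text, left, top, right, bottom).
words = [("Invoice", 40, 30, 160, 60), ("2024-001", 180, 30, 300, 60)]

box = (30, 20, 170, 70)
print(transcribe_box(box, words))      # prints Invoice

resized = (30, 20, 310, 70)            # box widened to cover both words
print(transcribe_box(resized, words))  # prints Invoice 2024-001
```

After the resize, the stored transcription ("Invoice") no longer matches the area under the box until it is recomputed, which is the situation "Refresh OCR" addresses.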