Using optical character recognition
Kili Technology provides an interface for automatic transcription, also known as optical character recognition (OCR), to extract text from:
- an image document (for example, a scanned PDF document)
- a PDF document
Interface requirement
OCR results are captured and stored in bounding box transcription subjobs.
Make sure your project ontology / JSON interface is properly configured:
- Add a transcription subjob to your bounding box tool
- Ensure the autofill option is set to true (enabled by default)
This is required for OCR to automatically populate and refresh transcriptions based on the underlying metadata.
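The two requirements above can be checked programmatically before labeling starts. The following is a minimal sketch, not the official schema: the key names (jobs, mlTask, tools, children) follow the general shape of a Kili JSON interface, while the job names, the placement of the autofill key on the transcription subjob, and the helper ocr_config_problems are assumptions made for illustration. Check your own project's JSON interface for the exact structure.

```python
from typing import Any, Dict, List

# Hypothetical interface: job and category names are made up, and the
# location of the "autofill" key is an assumption, not a documented fact.
interface: Dict[str, Any] = {
    "jobs": {
        "BBOX_JOB": {
            "mlTask": "OBJECT_DETECTION",
            "tools": ["rectangle"],
            "content": {
                "categories": {
                    "TEXT_ZONE": {
                        "name": "Text zone",
                        "children": ["TRANSCRIPTION_JOB"],
                    },
                },
                "input": "radio",
            },
        },
        "TRANSCRIPTION_JOB": {
            "mlTask": "TRANSCRIPTION",
            "isChild": True,
            # "autofill" is omitted here, which is treated as enabled (the default).
        },
    }
}


def ocr_config_problems(interface: Dict[str, Any]) -> List[str]:
    """Return human-readable problems that would prevent OCR autofill."""
    problems: List[str] = []
    jobs = interface.get("jobs", {})
    for job_name, job in jobs.items():
        # Only bounding box (rectangle) jobs are relevant for OCR.
        if "rectangle" not in job.get("tools", []):
            continue
        categories = job.get("content", {}).get("categories", {})
        child_names = [
            child
            for category in categories.values()
            for child in category.get("children", [])
        ]
        transcription_children = [
            name
            for name in child_names
            if jobs.get(name, {}).get("mlTask") == "TRANSCRIPTION"
        ]
        if not transcription_children:
            problems.append(f"{job_name}: no transcription subjob")
        for name in transcription_children:
            # A missing key counts as enabled, matching the documented default.
            if jobs[name].get("autofill", True) is not True:
                problems.append(f"{name}: autofill is disabled")
    return problems
```

An empty list means the sketch found nothing that would block OCR autofill; each string otherwise names the job to fix.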
Prerequisite: Provide OCR Metadata
To enable OCR in Kili, your images must be pre-processed and enriched with OCR metadata.
When OCR metadata is available, drawing a bounding box will automatically extract the corresponding text.
How it works
- OCR is not performed directly in the interface
- Instead, it relies on pre-computed OCR results attached to each asset
- These results must be uploaded as metadata using the jsonMetadata field
Metadata format
The expected metadata structure is similar to the one produced by Google Vision API.
- Each detected text element is associated with coordinates in the image
- This allows Kili to match bounding boxes with the correct transcription
You can refer to this tutorial for an example
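To make the coordinate matching concrete, here is a minimal sketch of how text elements could be selected for a drawn bounding box. It is not Kili's actual matching logic. It assumes word entries shaped like Google Vision textAnnotations items (a description plus boundingPoly.vertices in pixels), and the helpers vertices_to_box and text_in_box are hypothetical names introduced for this example.

```python
from typing import Any, Dict, List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def vertices_to_box(vertices: List[Dict[str, float]]) -> Box:
    """Convert a list of {x, y} vertices to an axis-aligned box.

    A missing coordinate is treated as 0.
    """
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def text_in_box(words: List[Dict[str, Any]], drawn: Box) -> str:
    """Join the words whose center lies inside the drawn bounding box.

    Words are ordered by their top edge, then their left edge. This is a
    naive reading order that only works for cleanly aligned text.
    """
    x0, y0, x1, y1 = drawn
    hits = []
    for word in words:
        wx0, wy0, wx1, wy1 = vertices_to_box(word["boundingPoly"]["vertices"])
        cx, cy = (wx0 + wx1) / 2, (wy0 + wy1) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            hits.append((wy0, wx0, word["description"]))
    hits.sort()
    return " ".join(text for _, _, text in hits)


def make_word(text: str, x0: int, y0: int, x1: int, y1: int) -> Dict[str, Any]:
    """Build a sample word entry (illustrative data only)."""
    return {
        "description": text,
        "boundingPoly": {
            "vertices": [
                {"x": x0, "y": y0},
                {"x": x1, "y": y0},
                {"x": x1, "y": y1},
                {"x": x0, "y": y1},
            ]
        },
    }


words = [
    make_word("Invoice", 10, 10, 80, 30),
    make_word("No.", 90, 10, 120, 30),
    make_word("Total", 10, 200, 60, 220),
]

print(text_in_box(words, (0, 0, 130, 40)))  # prints "Invoice No."
```

Using the word center rather than full containment keeps a word selected when the drawn box clips its edges slightly, which is a design choice of this sketch only.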
Importing OCR metadata
When importing your assets:
- Include OCR results in the jsonMetadata field
- Ensure the format matches the expected structure
For more details, see Importing asset metadata
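As an illustration, the import arguments can be assembled in Python before any upload. This is a sketch under stated assumptions: the helper build_import_args is hypothetical, the argument names mirror those we believe the Kili Python SDK's append_many_to_dataset method accepts (project_id, content_array, external_id_array, json_metadata_array), and the metadata keys borrow Google Vision names. Verify all of these against the SDK reference and the tutorial before relying on them.

```python
import json
from typing import Any, Dict, List


def build_import_args(project_id: str, assets: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble keyword arguments for an asset import with OCR metadata.

    Each asset is a dict with "url", "external_id" and "ocr" keys
    (names chosen for this example only).
    """
    for asset in assets:
        # Fail early if the OCR metadata cannot be serialized to JSON.
        json.dumps(asset["ocr"])
    return {
        "project_id": project_id,
        "content_array": [asset["url"] for asset in assets],
        "external_id_array": [asset["external_id"] for asset in assets],
        "json_metadata_array": [asset["ocr"] for asset in assets],
    }


# Illustrative data: the project ID, URL and metadata content are placeholders.
args = build_import_args(
    "my-project-id",
    [
        {
            "url": "https://example.com/scan-001.png",
            "external_id": "scan-001",
            "ocr": {"fullTextAnnotation": {"pages": []}, "textAnnotations": []},
        }
    ],
)

# With an authenticated SDK client, the call would then look something like:
# kili.append_many_to_dataset(**args)
```

Building the arguments separately makes it easy to inspect or validate the metadata for every asset before anything is sent.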
Creating OCR Annotations
OCR annotations allow you to extract and edit text from images using bounding boxes with transcription.
To create an OCR annotation:
- Select the bounding box tool configured with OCR transcription
- Draw a bounding box around the text you want to capture
- If autofill is enabled, the transcription will be automatically populated
- You can manually edit the transcription if needed
Refreshing OCR Transcriptions
When working with OCR annotations, you may need to adjust bounding boxes after their initial creation (e.g. moving or resizing them). In such cases, the existing transcription may no longer match the updated area.
You can now refresh the OCR transcription to reflect the updated bounding box content.
How to refresh OCR
- Select one or multiple bounding boxes with OCR transcription
- Right-click on the selection
- Click "Refresh OCR"
The transcription will be recalculated based on the current position and size of each bounding box.
Refresh OCR is currently available for Image projects only. Support for PDF documents will be added soon.