Optical character recognition
Optical character recognition overview
Kili Technology provides an interface for automatic transcription, also known as optical character recognition (OCR), to extract text from:
- an image document (for example, a scanned PDF document)
- a PDF document
To annotate an OCR job:
- Select the class.
- Draw a bounding box on the area to be transcribed.
You need to pre-process your image with OCR so that when you draw a bounding box, the text is automatically extracted. To do so, upload OCR metadata to the asset, using the jsonMetadata field. The metadata structure is similar to the one produced by Google APIs.
For a worked example, refer to the tutorial on creating OCR annotations using the Google Vision API.
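To make the expected structure concrete, the sketch below assembles a metadata dictionary from a list of recognized words and their axis-aligned boxes. The helper name `build_image_ocr_metadata` and the `(text, left, top, right, bottom)` input format are assumptions made for this illustration and are not part of the Kili API. The output keys mirror the ones used in the examples on this page, and the vertex order (clockwise from the top-left corner) is also an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (text, left, top, right, bottom), in pixels.
Word = Tuple[str, int, int, int, int]


def build_image_ocr_metadata(width: int, height: int, words: List[Word]) -> Dict:
    """Assemble OCR metadata with the keys used in the examples on this page."""
    text_annotations = []
    for text, left, top, right, bottom in words:
        text_annotations.append(
            {
                "description": text,
                "boundingPoly": {
                    # Four corners, clockwise from the top-left (assumed order).
                    "vertices": [
                        {"x": left, "y": top},
                        {"x": right, "y": top},
                        {"x": right, "y": bottom},
                        {"x": left, "y": bottom},
                    ]
                },
            }
        )
    return {
        "fullTextAnnotation": {"pages": [{"height": height, "width": width}]},
        "textAnnotations": text_annotations,
    }


json_metadata = build_image_ocr_metadata(813, 914, [("7SB75", 449, 257, 536, 514)])
```

The resulting `json_metadata` dictionary has the same shape as the hand-written examples in the sections that follow.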
For examples of how to import metadata, refer to:
- Importing OCR metadata through API when creating image assets
- Importing OCR metadata through API when updating image assets
- Importing OCR metadata for PDF documents
Importing OCR metadata through API when creating image assets
Here we detail the different ways you can upload OCR metadata to an asset.
You can upload this metadata when creating assets:
# `kili` is assumed to be an authenticated Kili client created beforehand.
json_metadata = {
    "fullTextAnnotation": { "pages": [{ "height": 914, "width": 813 }] },
    "textAnnotations": [
        {
            "description": "7SB75",
            "boundingPoly": {
                "vertices": [
                    { "x": 536, "y": 259 },
                    { "x": 529, "y": 514 },
                    { "x": 449, "y": 512 },
                    { "x": 456, "y": 257 }
                ]
            }
        },
        ...
    ]
}
kili.append_many_to_dataset(
    project_id='xxx',
    content_array=['url'],
    external_id_array=['A document'],
    json_metadata_array=[json_metadata]
)
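Before uploading, it can help to sanity-check the payload locally. The following is a minimal sketch of such a check. The function name `find_ocr_metadata_problems` is hypothetical, and the rules it applies (page size present, every annotation has a description, every vertex lies inside the page) are assumptions about what a sensible payload looks like, not validation rules documented by Kili.

```python
from typing import Dict, List


def find_ocr_metadata_problems(metadata: Dict) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    pages = metadata.get("fullTextAnnotation", {}).get("pages", [])
    if not pages:
        return ["fullTextAnnotation.pages is missing or empty"]
    width, height = pages[0].get("width"), pages[0].get("height")
    if width is None or height is None:
        return ["first page has no width or height"]

    problems = []
    for index, annotation in enumerate(metadata.get("textAnnotations", [])):
        if "description" not in annotation:
            problems.append(f"annotation {index}: no description")
        vertices = annotation.get("boundingPoly", {}).get("vertices", [])
        if not vertices:
            problems.append(f"annotation {index}: no vertices")
        for vertex in vertices:
            x, y = vertex.get("x"), vertex.get("y")
            if x is None or y is None:
                problems.append(f"annotation {index}: vertex without x or y")
            elif not (0 <= x <= width and 0 <= y <= height):
                problems.append(
                    f"annotation {index}: vertex ({x}, {y}) is outside the page"
                )
    return problems


sample = {
    "fullTextAnnotation": {"pages": [{"height": 914, "width": 813}]},
    "textAnnotations": [
        {
            "description": "7SB75",
            "boundingPoly": {
                "vertices": [
                    {"x": 536, "y": 259},
                    {"x": 529, "y": 514},
                    {"x": 449, "y": 512},
                    {"x": 456, "y": 257},
                ]
            },
        }
    ],
}
problems = find_ocr_metadata_problems(sample)
```

An empty `problems` list means the payload passed these illustrative checks; it does not guarantee that the platform will accept it.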
Importing OCR metadata through API when updating image assets
json_metadata = {
    "fullTextAnnotation": { "pages": [{ "height": 914, "width": 813 }] },
    "textAnnotations": [
        {
            "description": "7SB75",
            "boundingPoly": {
                "vertices": [
                    { "x": 536, "y": 259 },
                    { "x": 529, "y": 514 },
                    { "x": 449, "y": 512 },
                    { "x": 456, "y": 257 }
                ]
            }
        },
        ...
    ]
}
# OR
json_metadata = {
    "ocrMetadata": "url_to_json_metadata_object_with_keys_fullTextAnnotation_and_textAnnotations",
    "key": "value",  # Other metadata fields
    "key2": "value3"  # Other metadata fields
}
# OR
json_metadata = "url_to_json_metadata_object_with_keys_fullTextAnnotation_and_textAnnotations"
kili.update_properties_in_assets(
    asset_ids=['asset_id'],
    json_metadatas=[json_metadata]
)
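The second form above combines a URL to the OCR metadata with other metadata fields. As a small sketch of how that dictionary could be assembled safely, the helper below merges the two and refuses to let another field overwrite the `ocrMetadata` key. The function name `build_ocr_metadata_reference`, the example URL, and the `source` field are all hypothetical and only serve the illustration.

```python
from typing import Dict, Optional


def build_ocr_metadata_reference(
    ocr_url: str, other_fields: Optional[Dict] = None
) -> Dict:
    """Combine a hosted OCR metadata URL with other metadata fields."""
    other_fields = dict(other_fields or {})  # copy so the caller's dict is untouched
    if "ocrMetadata" in other_fields:
        raise ValueError("'ocrMetadata' is reserved for the OCR metadata URL")
    return {"ocrMetadata": ocr_url, **other_fields}


json_metadata = build_ocr_metadata_reference(
    "https://example.com/ocr/page-1.json",  # placeholder URL
    {"source": "scanner-3"},  # placeholder metadata field
)
```

The file behind the URL is expected to contain a JSON object with the keys fullTextAnnotation and textAnnotations, as in the first form.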
Importing OCR metadata for PDF documents
The metadata format for uploading OCR metadata to PDF documents is similar to the one used for images. Here is an example:
json_metadata = {
    "fullTextAnnotation": {
        "0": { "pages": [{ "height": 914, "width": 813 }] },
        "1": { "pages": [{ "height": 914, "width": 813 }] }
    },
    "textAnnotations": {
        "0": [
            {
                "description": "7SB75",
                "boundingPoly": {
                    "vertices": [
                        { "x": 536, "y": 259 },
                        { "x": 529, "y": 514 },
                        { "x": 449, "y": 512 },
                        { "x": 456, "y": 257 }
                    ]
                }
            }
        ],
        "1": [
            {
                "description": "XHE",
                "boundingPoly": {
                    "vertices": [
                        { "x": 536, "y": 259 },
                        { "x": 529, "y": 514 },
                        { "x": 449, "y": 512 },
                        { "x": 456, "y": 257 }
                    ]
                }
            }
        ]
    }
}
kili.update_properties_in_assets(
    asset_ids=['asset_id'],
    json_metadatas=[json_metadata]
)
As you can see, fullTextAnnotation and textAnnotations are now objects whose keys are the index of the page where the metadata should apply ("0" for the first page, "1" for the second page, and so on).
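If you already have image-style OCR metadata for each page, the page-indexed structure can be derived from it. The sketch below shows one way to do this. The function name `merge_pages_into_pdf_metadata` is hypothetical, and it assumes that each list item follows the image format shown earlier and that the list is ordered by page.

```python
from typing import Dict, List


def merge_pages_into_pdf_metadata(page_metadatas: List[Dict]) -> Dict:
    """Key each page's image-style metadata by its zero-based page index."""
    pdf_metadata: Dict = {"fullTextAnnotation": {}, "textAnnotations": {}}
    for page_index, page_metadata in enumerate(page_metadatas):
        key = str(page_index)  # "0" for the first page, "1" for the second
        pdf_metadata["fullTextAnnotation"][key] = page_metadata["fullTextAnnotation"]
        pdf_metadata["textAnnotations"][key] = page_metadata["textAnnotations"]
    return pdf_metadata


# Sample values copied from the example above.
box = {
    "vertices": [
        {"x": 536, "y": 259},
        {"x": 529, "y": 514},
        {"x": 449, "y": 512},
        {"x": 456, "y": 257},
    ]
}
page_size = {"pages": [{"height": 914, "width": 813}]}
pages = [
    {
        "fullTextAnnotation": page_size,
        "textAnnotations": [{"description": "7SB75", "boundingPoly": box}],
    },
    {
        "fullTextAnnotation": page_size,
        "textAnnotations": [{"description": "XHE", "boundingPoly": box}],
    },
]
pdf_metadata = merge_pages_into_pdf_metadata(pages)
```

Here `pdf_metadata` has the same shape as the `json_metadata` dictionary in the PDF example, with one entry per page under each top-level key.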