
Adding asset metadata

In Kili, you can attach extra information to an asset with asset metadata. Examples include the document's language, custom quality metrics, and agreement metrics. You can then use this metadata in Kili's advanced filters, or use it to supply Optical Character Recognition (OCR) data for an asset.

Adding metadata to assets

You can add metadata with our Python SDK by passing it in the json_metadata_array parameter of the append_many_to_dataset method. For details, see our Python API documentation.
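The helper below is a minimal sketch. It assumes you keep each asset as a (content URL, external ID, metadata dict) tuple. It splits those tuples into the parallel content_array, external_id_array and json_metadata_array lists that append_many_to_dataset takes. The function name and the tuple layout are hypothetical. The Kili client is passed in as an argument, so the helper itself does not import the kili package.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical input layout: (content URL, external ID, metadata dict).
AssetRecord = Tuple[str, str, Dict[str, Any]]


def import_assets_with_metadata(
    kili: Any, project_id: str, records: List[AssetRecord]
) -> Dict[str, list]:
    """Split records into parallel arrays and pass them to append_many_to_dataset.

    `kili` is expected to be a Kili client (or any object exposing the same method).
    Returns the arrays that were sent, so the payload is easy to inspect.
    """
    payload = {
        "content_array": [content for content, _, _ in records],
        "external_id_array": [external_id for _, external_id, _ in records],
        "json_metadata_array": [metadata for _, _, metadata in records],
    }
    kili.append_many_to_dataset(project_id=project_id, **payload)
    return payload


# Usage (requires the kili package and a KILI_API_KEY environment variable):
#   from kili.client import Kili
#   import_assets_with_metadata(
#       Kili(),
#       "your_project_id",
#       [("https://example.com/doc1.png", "doc1", {"language": "en", "quality_score": "0.92"})],
#   )
```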


By default, all newly added metadata is treated as strings. In some cases, such as making filtering by metadata easier, you can change a metadata field's type to number with the update_properties_in_project method. The values of a field typed as number are converted to floats.
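The sketch below decides which keys to type as numbers. It assumes that your SDK version's update_properties_in_project accepts a metadata_types dictionary mapping each metadata key to "string" or "number"; check the Python API reference for your version. The helper names are hypothetical. infer_metadata_types marks a key as "number" only if every sample value for that key can be converted to a float, mirroring the float conversion described above.

```python
from typing import Any, Dict, Iterable


def _is_number(value: Any) -> bool:
    """Return True if the value can be converted to a float."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def infer_metadata_types(json_metadatas: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Map each key to "number" if all of its values parse as floats, else "string"."""
    types: Dict[str, str] = {}
    for metadata in json_metadatas:
        for key, value in metadata.items():
            if _is_number(value):
                # Keep "string" if an earlier value for this key was not numeric.
                types.setdefault(key, "number")
            else:
                types[key] = "string"
    return types


def set_metadata_types(kili: Any, project_id: str, metadata_types: Dict[str, str]) -> None:
    """Send the types to Kili (the metadata_types parameter is an assumption to verify)."""
    kili.update_properties_in_project(project_id=project_id, metadata_types=metadata_types)
```

Lexicographic string comparison places "10" before "9". This is one reason numeric metadata is easier to filter once it is typed as a number.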

Asset metadata can be a powerful tool to use when filtering assets. For more information on how to use Kili's advanced filters, refer to Filtering assets.
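Advanced filters are applied in the Kili interface. To reproduce a similar numeric filter in a script, one approach is to fetch assets together with their jsonMetadata field and compare the values in Python. This is a minimal client-side sketch, not a Kili filtering API. The helper name is hypothetical. It also assumes that kili.assets returns a list of dicts when called with the fields shown, which you should confirm in the Python API reference.

```python
import json
from typing import Any, Dict, List, Optional


def filter_assets_by_metadata(
    assets: List[Dict[str, Any]],
    key: str,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Keep assets whose jsonMetadata[key] is numeric and within [min_value, max_value]."""
    selected = []
    for asset in assets:
        metadata = asset.get("jsonMetadata") or {}
        if isinstance(metadata, str):  # in case metadata arrives as a JSON string
            metadata = json.loads(metadata)
        try:
            value = float(metadata.get(key))
        except (TypeError, ValueError):
            continue  # missing or non-numeric metadata is skipped
        if min_value is not None and value < min_value:
            continue
        if max_value is not None and value > max_value:
            continue
        selected.append(asset)
    return selected


# Usage (requires the kili package; query arguments are assumptions to verify):
#   from kili.client import Kili
#   assets = Kili().assets(project_id="your_project_id", fields=["id", "externalId", "jsonMetadata"])
#   high_quality = filter_assets_by_metadata(assets, "quality_score", min_value=0.9)
```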

Figure: Filtering by asset metadata

Asset metadata visible to labelers

Three specific metadata keys are shown to labelers as additional information:

  • imageUrl
  • text
  • url


If you host the images referenced in asset metadata on a cloud storage service, check that its CORS settings are configured properly. If they are misconfigured, the images will not be displayed.
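The following sketch covers images stored in an Amazon S3 bucket. It assumes S3, GET-only access, and boto3's put_bucket_cors. The origin you pass should be the address you use to open Kili. For Kili's hosted app this is presumably https://cloud.kili-technology.com, but confirm it for your deployment. The helper names are hypothetical, and other cloud providers use different CORS formats.

```python
from typing import Any, Dict, List


def build_cors_configuration(allowed_origins: List[str]) -> Dict[str, Any]:
    """Build an S3 CORS configuration that lets the given origins GET objects."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": allowed_origins,
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


def allow_kili_origin(s3_client: Any, bucket: str, kili_origin: str) -> None:
    """Apply the CORS rule to the bucket (this replaces any existing CORS configuration)."""
    s3_client.put_bucket_cors(
        Bucket=bucket,
        CORSConfiguration=build_cors_configuration([kili_origin]),
    )


# Usage (requires boto3 and AWS credentials; the origin is an assumption to verify):
#   import boto3
#   allow_kili_origin(boto3.client("s3"), "my-bucket", "https://cloud.kili-technology.com")
```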

Refer to this example code:
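Here is a minimal sketch that assumes the imageUrl, text and url keys are set in the json_metadata_array passed to append_many_to_dataset. The helper name, URLs and text are placeholders. The helper leaves out any of the three keys you do not provide, so labelers see only the information you set. Any other keyword arguments are kept as ordinary metadata fields.

```python
from typing import Any, Dict, Optional


def labeler_metadata(
    image_url: Optional[str] = None,
    text: Optional[str] = None,
    url: Optional[str] = None,
    **other: Any,
) -> Dict[str, Any]:
    """Build asset metadata using the imageUrl, text and url keys shown to labelers."""
    shown = {"imageUrl": image_url, "text": text, "url": url}
    metadata = {key: value for key, value in shown.items() if value is not None}
    metadata.update(other)  # extra fields, e.g. language="en"
    return metadata


# Usage (requires the kili package and a KILI_API_KEY environment variable):
#   from kili.client import Kili
#   Kili().append_many_to_dataset(
#       project_id="your_project_id",
#       content_array=["https://example.com/asset.png"],
#       external_id_array=["asset-1"],
#       json_metadata_array=[labeler_metadata(
#           image_url="https://example.com/reference.png",
#           text="Label every vehicle, including partially hidden ones.",
#           url="https://example.com/guidelines",
#       )],
#   )
```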

Adding OCR metadata to assets

For examples of how to import OCR metadata, refer to:

Importing OCR metadata through Kili SDK when creating image assets

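from kili.client import Kili

kili = Kili()  # reads the API key from the KILI_API_KEY environment variable

json_metadata = {
  "fullTextAnnotation": { "pages": [{ "height": 914, "width": 813 }] },
  "textAnnotations": [
    {
      "description": "7SB75",
      "boundingPoly": {
        "vertices": [
          { "x": 536, "y": 259 },
          { "x": 529, "y": 514 },
          { "x": 449, "y": 512 },
          { "x": 456, "y": 257 }
        ]
      }
    },
    # ... more text annotations
  ]
}
kili.append_many_to_dataset(
  project_id='xxx',  # replace with your project ID
  content_array=['url'],  # replace with the image URL or local file path
  external_id_array=['A document'],
  json_metadata_array=[json_metadata]
)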
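If your OCR engine returns axis-aligned word boxes rather than polygons, a helper like the one below can build the fullTextAnnotation/textAnnotations structure shown above. This is a hypothetical sketch. It assumes boxes given as (x_min, y_min, x_max, y_max) in pixels, and it lists the four vertices clockwise starting from the top-left corner.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical OCR output: (text, (x_min, y_min, x_max, y_max)) in pixels.
WordBox = Tuple[str, Tuple[int, int, int, int]]


def boxes_to_ocr_metadata(words: List[WordBox], height: int, width: int) -> Dict[str, Any]:
    """Build image OCR metadata with one textAnnotations entry per word box."""
    annotations = []
    for text, (x_min, y_min, x_max, y_max) in words:
        annotations.append({
            "description": text,
            "boundingPoly": {
                "vertices": [
                    {"x": x_min, "y": y_min},  # top-left
                    {"x": x_max, "y": y_min},  # top-right
                    {"x": x_max, "y": y_max},  # bottom-right
                    {"x": x_min, "y": y_max},  # bottom-left
                ]
            },
        })
    return {
        "fullTextAnnotation": {"pages": [{"height": height, "width": width}]},
        "textAnnotations": annotations,
    }


# Usage: pass the result as one element of json_metadata_array.
#   json_metadata = boxes_to_ocr_metadata([("7SB75", (449, 257, 536, 514))], height=914, width=813)
```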

Importing OCR metadata through the Kili SDK when updating image assets

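from kili.client import Kili

kili = Kili()  # reads the API key from the KILI_API_KEY environment variable

# Option 1: pass the OCR metadata inline
json_metadata = {
  "fullTextAnnotation": { "pages": [{ "height": 914, "width": 813 }] },
  "textAnnotations": [
    {
      "description": "7SB75",
      "boundingPoly": {
        "vertices": [
          { "x": 536, "y": 259 },
          { "x": 529, "y": 514 },
          { "x": 449, "y": 512 },
          { "x": 456, "y": 257 }
        ]
      }
    },
    # ... more text annotations
  ]
}
# Option 2: point to a JSON object with the fullTextAnnotation and
# textAnnotations keys, and keep other metadata fields alongside it
json_metadata = {
  "ocrMetadata": "url_to_json_metadata_object_with_keys_fullTextAnnotation_and_textAnnotations",
  "key": "value",  # other metadata fields
  "key2": "value2"  # other metadata fields
}
# Option 3: pass only the URL of that JSON object
json_metadata = "url_to_json_metadata_object_with_keys_fullTextAnnotation_and_textAnnotations"

kili.update_properties_in_assets(
  asset_ids=['asset_id'],  # replace with the ID of the asset to update
  json_metadatas=[json_metadata]
)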
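To update several assets in one call, pass asset_ids and json_metadatas as parallel lists, so that entry i of json_metadatas applies to asset_ids[i]. The sketch below assumes the new metadata is held in a dict keyed by asset ID. Each value can be a dict or a URL string, as in the options above. The helper name and the default batch size of 100 are arbitrary choices, not documented limits.

```python
from typing import Any, Dict, Union

JsonMetadata = Union[Dict[str, Any], str]  # inline dict, or a URL string


def update_ocr_metadata(
    kili: Any,
    metadata_by_asset_id: Dict[str, JsonMetadata],
    batch_size: int = 100,
) -> int:
    """Send json_metadatas in batches of parallel lists; return the number of calls made."""
    items = list(metadata_by_asset_id.items())
    calls = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        kili.update_properties_in_assets(
            asset_ids=[asset_id for asset_id, _ in batch],
            json_metadatas=[metadata for _, metadata in batch],
        )
        calls += 1
    return calls
```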

Importing OCR metadata for PDF documents

The metadata format for uploading OCR data for PDF documents is similar to the one for images. Here is an example:

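from kili.client import Kili

kili = Kili()  # reads the API key from the KILI_API_KEY environment variable

json_metadata = {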
  "fullTextAnnotation": {
    "0": { "pages": [{ "height": 914, "width": 813 }] },
    "1": { "pages": [{ "height": 914, "width": 813 }] }
  },
  "textAnnotations": {
    "0": [
      {
        "description": "7SB75",
        "boundingPoly": {
          "vertices": [
            { "x": 536, "y": 259 },
            { "x": 529, "y": 514 },
            { "x": 449, "y": 512 },
            { "x": 456, "y": 257 }
          ]
        }
      }
    ],
    "1": [
      {
        "description": "XHE",
        "boundingPoly": {
          "vertices": [
            { "x": 536, "y": 259 },
            { "x": 529, "y": 514 },
            { "x": 449, "y": 512 },
            { "x": 456, "y": 257 }
          ]
        }
      }
    ]
  }
}

kili.update_properties_in_assets(
  asset_ids=['asset_id'],
  json_metadatas=[json_metadata]
)

For PDF documents, fullTextAnnotation and textAnnotations are objects whose keys are the zero-based index of the page that the metadata applies to, given as a string ("0" for the first page, "1" for the second page, and so on).
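The helper below builds that page-indexed structure from per-page OCR results. It is a hypothetical sketch that assumes each page is given as (height, width, annotations). Here, annotations is a list with the same shape as textAnnotations for an image.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical per-page OCR result: (height, width, textAnnotations list for that page).
PageOcr = Tuple[int, int, List[Dict[str, Any]]]


def pages_to_pdf_ocr_metadata(pages: List[PageOcr]) -> Dict[str, Any]:
    """Key fullTextAnnotation and textAnnotations by zero-based page index as strings."""
    full_text: Dict[str, Any] = {}
    text_annotations: Dict[str, Any] = {}
    for index, (height, width, annotations) in enumerate(pages):
        page_key = str(index)  # "0" is the first page, "1" the second, ...
        full_text[page_key] = {"pages": [{"height": height, "width": width}]}
        text_annotations[page_key] = annotations
    return {"fullTextAnnotation": full_text, "textAnnotations": text_annotations}
```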

Learn more

For an end-to-end example of how to programmatically add assets, asset metadata, and asset pre-annotations to a project using Kili's Python SDK, refer to our Importing assets and labels tutorial.