
Kili AutoML repository

Kili AutoML is a lightweight library designed to train ML models in a Data-centric AI way using the Kili platform. The standard Kili AutoML workflow is as follows:

  1. Label your assets in the Kili App
  2. Train a model with AutoML and evaluate its performance in one line of code
  3. Push predictions to Kili to accelerate the labeling in one line of code
  4. Prioritize labeling in Kili to label the data that will improve your model the most first

Iterate this workflow until you are satisfied with the performance.
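The four steps above form a loop that repeats until the model is good enough. As a minimal sketch, the loop can be simulated in Python with hypothetical stub functions; none of these function names are part of the Kili AutoML API — they only stand in for labeling in the app and the train/predict/prioritize steps:

```python
# Sketch of the Kili AutoML iteration loop.
# All functions here are hypothetical stubs, NOT part of the Kili AutoML API;
# they stand in for labeling in the Kili App and the AutoML steps.

def label_assets(assets, priorities):
    """Simulate annotators labeling the highest-priority unlabeled asset."""
    for asset in sorted(assets, key=lambda a: -priorities.get(a["id"], 0)):
        if asset["label"] is None:
            asset["label"] = "some_label"  # placeholder annotation
            break

def train_and_evaluate(assets):
    """Simulate training: accuracy grows with the number of labeled assets."""
    labeled = sum(1 for a in assets if a["label"] is not None)
    return min(1.0, labeled / len(assets))

def prioritize(assets):
    """Simulate AutoML prioritization: every unlabeled asset gets a score."""
    return {a["id"]: 1.0 for a in assets if a["label"] is None}

assets = [{"id": i, "label": None} for i in range(5)]
accuracy, target = 0.0, 0.8
while accuracy < target:
    priorities = prioritize(assets)        # step 4: prioritize labeling
    label_assets(assets, priorities)       # step 1: label in the Kili App
    accuracy = train_and_evaluate(assets)  # step 2: train and evaluate
    # step 3 (pushing predictions to speed up labeling) is omitted here

print(f"reached accuracy {accuracy:.1f}")  # → reached accuracy 0.8
```

The point of the sketch is the stopping condition: you keep cycling through prioritize → label → train until the evaluation metric reaches your target.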

📘

  • Kili AutoML only works on Linux and Mac OS X.
  • For more detailed information and specific code snippets, refer to the README file located in the Kili AutoML GitHub repository.
  • For Kili AutoML usage samples, refer to AutoML usage samples.

ML Tasks supported by AutoML

AutoML currently supports the following tasks:

  • Natural Language Processing (NLP)
    • Named Entity Recognition
    • Text Classification
  • Image
    • Object detection
    • Image Classification
    • Semantic Segmentation

ML Backends used by Kili AutoML

Here are the supported ML backends and the tasks they are used for:

  • Hugging Face (NER, Text Classification)
  • YOLOv5 (Object Detection)
  • Detectron2 (Semantic Segmentation)

📘

  • For a full list of supported tasks, refer to ML Tasks supported by AutoML.
  • For NLP tasks like NER or Text Classification, you can use any Fill-Mask model from the Hugging Face Hub. Some models require installing additional components.

Training a model with Kili AutoML

By default, the Kili AutoML library uses Weights and Biases to track the training and the quality of the predictions.

In the training phase, Kili AutoML does the following:

  • Selects the models related to the tasks declared in the project ontology
  • Retrieves asset data from Kili and converts it into the proper input format for each model
  • Fine-tunes the model on the input data
  • Outputs the model metrics

For a list of supported ML backends and the tasks they are used for, refer to ML Backends used by Kili AutoML.
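To illustrate the conversion step, here is a minimal sketch of turning labeled assets into (text, label) pairs suitable for fine-tuning a text classifier. The dict layout below is an assumption for illustration only — check the actual Kili export format for your project:

```python
# Minimal sketch: convert Kili-style labeled assets into (text, label) pairs.
# The dict layout below (content / latestLabel / jsonResponse / categories)
# is assumed for illustration; it may differ from the real export schema.

assets = [
    {
        "content": "The product arrived broken.",
        "latestLabel": {
            "jsonResponse": {
                "CLASSIFICATION_JOB": {"categories": [{"name": "NEGATIVE"}]}
            }
        },
    },
    {
        "content": "Great service, fast delivery!",
        "latestLabel": {
            "jsonResponse": {
                "CLASSIFICATION_JOB": {"categories": [{"name": "POSITIVE"}]}
            }
        },
    },
]

def to_training_pairs(assets, job_name="CLASSIFICATION_JOB"):
    """Keep only labeled assets and extract (text, category) pairs."""
    pairs = []
    for asset in assets:
        label = asset.get("latestLabel")
        if not label:  # skip assets that have not been labeled yet
            continue
        categories = label["jsonResponse"][job_name]["categories"]
        pairs.append((asset["content"], categories[0]["name"]))
    return pairs

print(to_training_pairs(assets))
```

Pairs in this shape can then be fed to whichever backend handles the task (for text classification, a Hugging Face model).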

📘

  • Periodically compute model loss to infer when you can stop labeling.
  • For more detailed information and specific code snippets, refer to the Model training section of the Kili AutoML GitHub repository README.

Pushing predictions to Kili

Pre-trained models are used to predict labels and add pre-annotations to the assets that have not yet been labeled by the annotators. The annotators can then validate or correct these pre-annotations in the Kili App user interface.

Using trained models to push pre-annotations onto unlabeled assets typically speeds up labeling by 10%.

📘

  • You can reuse a model from another project, provided both projects have the same ontology.
  • For more detailed information and specific code snippets, refer to the Pushing predictions section of the Kili AutoML GitHub repository README.

Prioritizing assets based on Kili AutoML recommendations

After roughly 100 assets in a project have been labeled, you can prioritize project assets that remain to be labeled in a way that will best improve the performance of the model.

To do this, AutoML uses a mix of diversity sampling and uncertainty sampling.
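As a toy sketch of how the two signals can be combined: uncertainty sampling favors assets the model is unsure about (here, high prediction entropy), while diversity sampling favors assets far from those already selected. The equal 50/50 weighting and the one-dimensional feature are illustrative assumptions, not the library's actual algorithm:

```python
import math

# Toy sketch of mixed active-learning scoring:
# - uncertainty: entropy of the model's predicted class probabilities
# - diversity: distance to the nearest already-selected asset (1-D feature)
# The equal weighting of the two terms is an illustrative assumption.

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def diversity(feature, selected_features):
    if not selected_features:
        return 1.0  # nothing selected yet: everything is equally "diverse"
    return min(abs(feature - s) for s in selected_features)

unlabeled = [
    {"id": "a1", "probs": [0.5, 0.5], "feature": 0.10},  # very uncertain
    {"id": "a2", "probs": [0.9, 0.1], "feature": 0.90},  # confident but far away
    {"id": "a3", "probs": [0.8, 0.2], "feature": 0.15},  # near a1, mildly uncertain
]

selected, order = [], []
for _ in range(2):  # pick the 2 highest-priority assets
    best = max(
        unlabeled,
        key=lambda a: 0.5 * entropy(a["probs"])
        + 0.5 * diversity(a["feature"], selected),
    )
    unlabeled.remove(best)
    selected.append(best["feature"])
    order.append(best["id"])

print("labeling order:", order)  # → ['a1', 'a2']
```

Note how "a2" outranks "a3" in the second round even though "a3" is more uncertain: "a3" sits right next to the already-selected "a1", so the diversity term pushes it down.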

📘

For more detailed information and specific code snippets, refer to the Prioritizing assets section of the Kili AutoML GitHub repository README.

Kili AutoML usage samples

You can test the features of AutoML with these notebooks: