Kili AutoML repository
Kili AutoML is a lightweight library for training ML models on the Kili platform following a data-centric AI approach. The standard Kili AutoML workflow is as follows:
- Label your assets in the Kili App
- Train a model with AutoML and evaluate its performance in one line of code
- Push predictions to Kili to accelerate the labeling in one line of code
- Prioritize labeling in Kili so that the assets most likely to improve your model are labeled first
Iterate this workflow until you are satisfied with the model's performance.
- Kili AutoML only works on Linux and macOS.
- For more detailed information and specific code snippets, refer to the README file in the Kili AutoML GitHub repository.
- For Kili AutoML usage samples, refer to AutoML usage samples.
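As a quick-start sketch, you can install the library and set up the credentials used in the command examples below. This assumes the package is published on PyPI as kiliautoml; refer to the README for the authoritative installation steps.

```bash
# Sketch: install Kili AutoML. Assumes the package is published on PyPI as
# `kiliautoml`; see the repository README for the authoritative steps.
pip install kiliautoml

# The command sketches in the sections below reference these shell variables:
export KILI_API_KEY=<your Kili API key>
export KILI_PROJECT_ID=<your Kili project ID>
```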
ML Tasks supported by AutoML
AutoML currently supports the following tasks:
- Natural Language Processing (NLP)
- Named Entity Recognition
- Text Classification
- Image
- Object detection
- Image Classification
- Semantic Segmentation
ML Backends used by Kili AutoML
Here are the supported ML backends and the tasks they are used for:
- Hugging Face (NER, Text Classification)
- YOLOv5 (Object Detection)
- Detectron2 (Semantic Segmentation)
- For a full list of supported tasks, refer to ML Tasks supported by AutoML.
- For NLP tasks such as NER or Text Classification, you can use any Fill-Mask model from the Hugging Face Hub. Some models require installing additional dependencies.
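For example, fine-tuning a specific Hub model could look like the sketch below. The `--model-name` option and the model choice are assumptions based on the repository README; check `kiliautoml train --help` for the exact option names in your version.

```bash
# Sketch: fine-tune a specific Hugging Face Hub model for NER.
# The --model-name flag is an assumption based on the repository README.
kiliautoml train \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID \
    --model-name bert-base-multilingual-cased
```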
Training a model with Kili AutoML
By default, the Kili AutoML library uses Weights and Biases to track the training and the quality of the predictions.
In the training phase, Kili AutoML does the following:
- Selects the models related to the tasks declared in the project ontology
- Retrieves asset data from Kili and converts it into the proper input format for each model
- Fine-tunes the model on the input data
- Outputs the model metrics
For a list of supported ML backends and the tasks they are used for, refer to ML Backends used by Kili AutoML.
- Compute the model loss periodically to infer when you can stop labeling: once the loss plateaus, additional labels bring little further improvement.
- For more detailed information and specific code snippets, refer to the Model training section of the Kili AutoML GitHub repository readme.
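A minimal training invocation looks like the following sketch, based on the repository README (check `kiliautoml train --help` for the exact options in your version):

```bash
# Sketch: fine-tune a model on the labeled assets of a Kili project.
# Model selection, data retrieval, and format conversion happen automatically,
# and the resulting metrics are tracked in Weights and Biases by default.
kiliautoml train \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID
```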
Pushing predictions to Kili
The trained models are used to predict labels and add pre-annotations to the assets that have not yet been labeled. Annotators can then validate or correct these pre-annotations in the Kili App user interface.
Pushing pre-annotations onto unlabeled assets in this way typically speeds up labeling by about 10%.
- You can reuse a model from another project, provided both projects have the same ontology.
- For more detailed information and specific code snippets, refer to the Pushing predictions section of the Kili AutoML GitHub repository readme.
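Pushing predictions can look like the following sketch, based on the repository README (check `kiliautoml predict --help` for the exact options in your version):

```bash
# Sketch: push model predictions to Kili as pre-annotations on the assets
# that have not yet been labeled.
kiliautoml predict \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID
```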
Prioritizing assets based on Kili AutoML recommendations
After roughly 100 assets in a project have been labeled, you can prioritize the remaining unlabeled assets so that the ones most likely to improve the model are labeled first.
To do this, AutoML uses a mix of diversity sampling and uncertainty sampling: uncertainty sampling favors assets the current model is least confident about, while diversity sampling favors assets that cover parts of the data distribution not yet represented in the training set.
For more detailed information and specific code snippets, refer to the Prioritizing assets section of the Kili AutoML GitHub repository readme.
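Prioritization can look like the following sketch. The `--diversity-sampling` ratio shown here is an assumption based on the repository README; check `kiliautoml prioritize --help` for the exact options in your version.

```bash
# Sketch: recompute labeling priorities for the remaining unlabeled assets.
# The --diversity-sampling ratio (share of diversity vs. uncertainty sampling)
# is an assumption; verify the option against your version.
kiliautoml prioritize \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID \
    --diversity-sampling 0.4
```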
Kili AutoML usage samples
You can test the features of AutoML with these notebooks:
- Natural Language Processing (NLP)
- Image