Python SDK tutorials

To help you with your labeling tasks, Kili offers advanced tutorials prepared in Jupyter notebook format and hosted on GitHub.
The ones we selected to showcase here range from basics through more advanced concepts to specialized cases.
For a full list, visit the kili-python-sdk repository.
If, rather than extensive tutorials, you're looking for simple recipes to copy-paste into your projects, refer to our recipes list.


Kili basics

Creating a project

Interested in wine? Then you probably know that in wine production, quality control is the primary concern. We don't want to spoil the tasty beverage by using grapes that are too young or rotten. Let's build a project to classify our grapes based on their quality.

In this tutorial, we will show how to create a mock project through Kili's API, interacting directly with the database. Every object you create is reflected directly in the app, so you can check that everything has been executed properly.
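
As a preview of what the notebook walks through, here is a minimal sketch of creating such a classification project with the Python SDK; the job name and categories below are illustrative placeholders:

```python
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")  # or set the KILI_API_KEY environment variable

# A minimal classification interface for grading grapes (placeholder job name and categories)
json_interface = {
    "jobs": {
        "CLASSIFICATION_JOB": {
            "mlTask": "CLASSIFICATION",
            "required": 1,
            "content": {
                "categories": {
                    "GOOD": {"name": "Good"},
                    "TOO_YOUNG": {"name": "Too young"},
                    "ROTTEN": {"name": "Rotten"},
                },
                "input": "radio",
            },
        }
    }
}

project = kili.create_project(
    title="Grape quality control",
    description="Classify grapes based on their quality",
    input_type="IMAGE",
    json_interface=json_interface,
)
print(project["id"])  # keep the project id for the next steps
```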

To access this tutorial, click here: Creating a project

Importing assets

In this tutorial, we will walk through the process of using Kili to import assets. The goal of this tutorial is to illustrate some basic components and concepts of Kili in a simple way.
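
For a quick taste, here is a hedged sketch of the core call used in the notebook; the URL, file path, and external IDs are placeholders:

```python
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

# Assets can be remote URLs or local file paths; external IDs let you retrieve them later.
kili.append_many_to_dataset(
    project_id="YOUR_PROJECT_ID",
    content_array=[
        "https://example.com/images/grape_1.jpg",  # hosted asset (placeholder URL)
        "./local_images/grape_2.jpg",              # local file, uploaded by the SDK
    ],
    external_id_array=["grape_1", "grape_2"],
)
```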

To access this tutorial, click here: Importing assets

Importing predictions

In this tutorial, we will show how to import predictions (pre-annotations) into Kili to help annotators and accelerate the whole annotation process. The goal of this tutorial is to illustrate some basic components and concepts of Kili in a simple way, but also to dive into the actual process of iteratively developing real applications in Kili.
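
As a sketch of the idea, assuming a classification project with a job named CLASSIFICATION_JOB (a placeholder), predictions are pushed with the same json response format as labels. Depending on your SDK version, the model name is passed as model_name or model_name_array:

```python
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

# One json_response per asset, in the same format as human labels
json_response = {
    "CLASSIFICATION_JOB": {
        "categories": [{"name": "GOOD", "confidence": 87}]
    }
}

kili.create_predictions(
    project_id="YOUR_PROJECT_ID",
    external_id_array=["grape_1"],
    json_response_array=[json_response],
    model_name="baseline-model-v0",  # may be model_name_array=["..."] on older SDK versions
)
```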

To access this tutorial, click here: Importing predictions

Exporting a training set

In this tutorial, we will walk through the process of using Kili to export a training set. The goal of this tutorial is to illustrate some basic components and concepts of Kili in a simple way, but also to dive into the actual process of iteratively developing real applications in Kili.
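
A minimal sketch of the idea, assuming you want human labels together with the asset external IDs (the exact fields to request are listed in the notebook):

```python
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

# Fetch labels along with the asset they belong to
labels = kili.labels(
    project_id="YOUR_PROJECT_ID",
    fields=["jsonResponse", "labelType", "labelOf.externalId", "author.email"],
)

# Keep only human labels (default and review) for the training set
training_set = [
    {"external_id": label["labelOf"]["externalId"], "annotation": label["jsonResponse"]}
    for label in labels
    if label["labelType"] in ("DEFAULT", "REVIEW")
]
print(f"{len(training_set)} labeled examples exported")
```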

To access this tutorial, click here: Exporting a training set


More advanced concepts

Importing rich-text assets

When dealing with textual data, style can convey a lot of meaning. If you annotate a long list or a legal text, displaying structured text instead of plain boring text allows your annotator to rapidly grasp patterns within the document. Our tutorial will show you how to do that.
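
In short, rich-text assets are imported through the json_content_array parameter instead of content_array. The node schema below is only an illustrative assumption to be checked against the notebook, which documents the full list of node types and styling attributes:

```python
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

# Illustrative structured content: treat these node types and attributes as assumptions
# to verify against the notebook.
json_content = [
    {"type": "h1", "children": [{"text": "Employment agreement", "bold": True}]},
    {"type": "p", "children": [{"text": "This agreement is made between..."}]},
]

kili.append_many_to_dataset(
    project_id="YOUR_PROJECT_ID",
    content_array=[""],                  # left empty: the text is provided as json content
    json_content_array=[json_content],   # rich text goes here instead of content_array
    external_id_array=["contract_001"],
)
```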

To access this tutorial, click here: Importing rich-text assets

Importing/exporting pixel-level masks

In this tutorial, we will show you how to import/export pixel-level masks when doing semantic annotation in Kili Technology. Such projects allow you to annotate image data at pixel level.

The data we use comes from the COCO dataset.
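
To give an idea of the import direction, here is a hedged sketch of turning a binary mask into the normalized polygon vertices used by a Kili semantic job; JOB_0 and CAT are placeholders for your own job name and category, and the resulting json_response can then be pushed with the prediction or label import methods shown in the notebook:

```python
import cv2  # opencv-python: used to extract polygons from the mask

# Load a binary mask (pixels at 255 belong to the object) and extract its outer contours
mask = cv2.imread("cat_mask.png", cv2.IMREAD_GRAYSCALE)
height, width = mask.shape
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Convert each contour to normalized vertices, the format used by Kili semantic annotations
annotations = []
for contour in contours:
    vertices = [
        {"x": float(point[0][0]) / width, "y": float(point[0][1]) / height}
        for point in contour
    ]
    annotations.append(
        {
            "boundingPoly": [{"normalizedVertices": vertices}],
            "categories": [{"name": "CAT"}],  # placeholder category
            "type": "semantic",
        }
    )

# Placeholder job name; import this json_response as a prediction or a label
json_response = {"JOB_0": {"annotations": annotations}}
```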

To access this tutorial, click here: Import/export pixel-level masks

Importing OCR pre-annotations

In this tutorial, we will see how to import OCR pre-annotations into Kili using the Google Vision API. Pre-annotating your data will save you a significant amount of time when performing OCR with Kili.

The data we use comes from The Street View Text Dataset.
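
As a rough sketch of the pipeline, assuming you have Google Cloud credentials configured: run Vision text detection, then attach the detected words and boxes to the asset's json_metadata. The exact metadata schema Kili expects is detailed in the notebook, so treat the keys below as assumptions to verify there:

```python
from google.cloud import vision  # pip install google-cloud-vision
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")
client = vision.ImageAnnotatorClient()

# Run Google Vision text detection on a local image
with open("street_sign.jpg", "rb") as image_file:
    content = image_file.read()
response = client.text_detection(image=vision.Image(content=content))

# Map the detected words and bounding boxes to the asset's json_metadata
# (key names to be checked against the notebook)
ocr_metadata = {
    "fullTextAnnotation": {"pages": [{"width": 1280, "height": 720}]},  # replace with the real image size
    "textAnnotations": [
        {
            "description": annotation.description,
            "boundingPoly": {
                "vertices": [{"x": vertex.x, "y": vertex.y} for vertex in annotation.bounding_poly.vertices]
            },
        }
        for annotation in response.text_annotations[1:]  # index 0 is the whole text block
    ],
}

kili.append_many_to_dataset(
    project_id="YOUR_PROJECT_ID",
    content_array=["./street_sign.jpg"],
    external_id_array=["street_sign"],
    json_metadata_array=[ocr_metadata],
)
```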

To access this tutorial, click here: Importing OCR pre-annotations

Querying useful information using Kili API

In this tutorial, we will show you how to query useful information through Kili's API, interacting directly with the database.

There are six different types of data you could be interested in querying, all of them highly customizable:

  • Information about your organization
  • Information about the users in your organization
  • KPIs and labeling data for different project users
  • The whole project or its selected parts
  • Project assets
  • Last but obviously not least, the labels
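
As a quick preview of what this looks like with the SDK, here is a minimal sketch; the fields requested below are a small illustrative subset of what each query can return:

```python
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")
project_id = "YOUR_PROJECT_ID"

# Every query method takes a `fields` argument to select exactly the data you need
organizations = kili.organizations(fields=["id", "name"])
users = kili.users(organization_id=organizations[0]["id"], fields=["email", "firstname", "lastname"])
project_users = kili.project_users(project_id=project_id, fields=["user.email", "role"])
projects = kili.projects(project_id=project_id, fields=["title", "numberOfAssets"])
assets = kili.assets(project_id=project_id, fields=["externalId", "status"], first=10)
labels = kili.labels(project_id=project_id, fields=["jsonResponse", "labelType"], first=10)
```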

To access this tutorial, click here: Querying useful information using Kili API

Using webhooks

In this tutorial, we will show how to use webhooks to monitor actions in Kili, such as a label creation. The goal of this tutorial is to illustrate some basic components and concepts of Kili in a simple way, but also to dive into the actual process of iteratively developing real applications in Kili.
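
In essence, and with the caveat that the exact helper and callback signature should be checked in the notebook, the pattern looks like this (we assume the SDK's label_created_or_updated subscription used in the tutorial):

```python
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

# Callback executed every time a label is created or updated in the project
def on_label_event(id, data):
    print(f"Label event received: {data}")

# Assumed subscription helper from the tutorial; it keeps listening until interrupted
kili.label_created_or_updated(
    project_id="YOUR_PROJECT_ID",
    callback=on_label_event,
)
```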

To access this tutorial, click here: Using webhooks


Specialized cases

Leveraging counterfactually augmented data to have a more robust model

This recipe is inspired by the paper Learning the Difference that Makes a Difference with Counterfactually-Augmented Data.

In this study, the authors point out how difficult it is for Machine Learning models to generalize the classification rules they learn, because their decision rules, described as 'spurious patterns', often miss the key elements that most affect the class of a text. The authors therefore decided to remove what can be considered a confusion factor, by changing the label of an asset while changing only a minimal number of words, so that those keywords would be much easier for the model to spot.

In this tutorial, we'll see:

  • How to create a project in Kili for both the IMDB and SNLI datasets, to reproduce this data-augmentation task, improve our model, and decrease its variance when it faces unseen data in production.
  • How to reproduce the results of the paper with similar models, to show how such a technique can be of key interest when working on a text-classification task.

To access this tutorial, click here: Leveraging counterfactually augmented data to have a more robust model

Performing efficient data augmentation for production-ready NLP tasks

As algorithms grow more complex, they also grow hungry for more data in order to precisely learn the meaning of a sentence. In text-classification tasks, for example, the variety of words encountered makes for a much more resilient algorithm, especially once it is in production and exposed to real-world data.

In Computer Vision, biases may play a significant role; for example, detecting a seagull can be more correlated with the presence of a beach than with the bird itself. Similarly, in NLU, words can be incorrectly associated with a certain class, which leads to problems in real-world use as examples get harder to discriminate. If automatic generation of new data can help simply by increasing the number of training examples, how does it perform compared with using more training data from the same dataset, and are there ways to generate new data efficiently?

This article is inspired by the paper entitled Learning the Difference that Makes a Difference with Counterfactually-Augmented Data.

In this study, the authors point out how difficult it is for Machine Learning models to generalize the classification rules they learn, because their decision rules, described as 'spurious patterns', often miss the key elements that most affect the sentiment of a text. The authors therefore decided to remove this confusion factor, by changing the label of an asset while changing only a minimal number of words, so that those key words would be much easier for the model to spot.

We'll go through the details of the paper for a text-classification task, and:

  • Study the impact of counterfactually-augmented data
  • Compare the efficiency and cost of such a data-generation technique

We'll use the IMDB sentiment analysis dataset as our data source. The dataset consists of 50k movie reviews, and the task is to classify those reviews as positive or negative.

To access this tutorial, click here: Performing efficient data augmentation for production-ready NLP tasks

Reading and uploading DICOM image data

In this tutorial, we will show you how to upload medical images to Kili. We will use pydicom, a Python package, to read medical data in the DICOM format.

Data used in this tutorial comes from the RSNA Pneumonia Detection Challenge hosted on Kaggle in 2018.
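
As a short sketch of the approach, assuming an 8-bit PNG is an acceptable display format for your study: read the DICOM pixel data with pydicom, rescale it, and upload the converted image:

```python
import numpy as np
import pydicom
from PIL import Image
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

# Read the DICOM file and rescale its pixel data to an 8-bit image Kili can display
ds = pydicom.dcmread("chest_xray_001.dcm")
pixels = ds.pixel_array.astype(float)
pixels = (255 * (pixels - pixels.min()) / (pixels.max() - pixels.min())).astype(np.uint8)
Image.fromarray(pixels).save("chest_xray_001.png")

kili.append_many_to_dataset(
    project_id="YOUR_PROJECT_ID",
    content_array=["./chest_xray_001.png"],
    external_id_array=["chest_xray_001"],
)
```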

To access this tutorial, click here: Reading and uploading DICOM image data

