You can directly import existing labels so that annotators start working on pre-annotated assets. This reduces their workload: instead of starting from scratch, they only need to validate the pre-annotations, correct a few of them, and complete the annotation process.
Refer to the example recipe.
You can also use this feature to run quality checks. For example, you can upload ground truth labels to review the annotators' work (refer to Honeypot overview).
Imported labels can be:
- Predictions from a custom model
- Predictions from a weakly-supervised learning framework
- Human-labeled data from previous projects or other sources
If you have a custom, in-house model that already detects or adds labels to your assets and the inference phase is done on your dataset, you can upload predictions to your project. For examples, refer to the Importing predictions tutorial.
If you have multiple models, you can still tag your predictions with the source model: simply fill in the `modelName` field in the GraphQL API. You'll then be able to filter by model when working with assets and labels.
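Before uploading, each prediction must be expressed as a `jsonResponse` payload. The sketch below builds such payloads for a classification job; the job name `"JOB_0"`, the category names, and the helper function are illustrative assumptions, so match them to your project's actual interface schema.

```python
# Sketch: building prediction payloads in a Kili-style jsonResponse format
# before upload. "JOB_0" and the category names are assumptions; match them
# to your project's interface schema.

def make_classification_prediction(job_name, category, confidence):
    """Build one jsonResponse payload for a classification job."""
    return {job_name: {"categories": [{"name": category, "confidence": confidence}]}}

predictions = [
    ("asset-1", make_classification_prediction("JOB_0", "CAT_A", 92)),
    ("asset-2", make_classification_prediction("JOB_0", "CAT_B", 71)),
]

# Each tuple pairs an asset externalId with its payload. When uploading
# (through the GraphQL API or a client library), also pass a model name
# such as "my-model-v1" in the modelName field so you can filter by model.
for external_id, payload in predictions:
    print(external_id, payload)
```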
Weak supervision is the ability to combine weak predictors to build a more robust prediction.
Here are some examples of weak predictions:
- Hard-coded heuristics: usually regular expressions (regexes)
- Syntactic parsers: for example, spaCy dependency trees
- Distant supervision: external knowledge bases
- Noisy manual labels: crowdsourcing
- External models: other models with useful signals
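The simplest way to combine weak predictors like these is a majority vote over the labeling functions that do not abstain. The heuristics and label names below are illustrative, not part of any particular framework:

```python
# Sketch: combining weak predictors by majority vote, one simple form of
# weak supervision. The regex heuristics and labels are illustrative.
import re
from collections import Counter

ABSTAIN = None

def lf_keyword_spam(text):
    """Hard-coded heuristic: flag obvious spam keywords."""
    return "SPAM" if re.search(r"\b(free|winner|prize)\b", text, re.I) else ABSTAIN

def lf_has_url(text):
    """Another heuristic: messages containing URLs are often spam."""
    return "SPAM" if "http://" in text or "https://" in text else ABSTAIN

def lf_greeting(text):
    """Weak signal in the other direction: greetings suggest a normal message."""
    return "HAM" if re.match(r"(hi|hello|hey)\b", text, re.I) else ABSTAIN

def majority_vote(text, lfs):
    """Return the most common non-abstain vote, or ABSTAIN if none voted."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

lfs = [lf_keyword_spam, lf_has_url, lf_greeting]
print(majority_vote("You are a winner! Claim your free prize", lfs))  # SPAM
print(majority_vote("hello, lunch at noon?", lfs))  # HAM
```

Frameworks such as Snorkel replace the naive majority vote with a generative model that estimates each labeling function's accuracy, which is usually more robust.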
Weakly-supervised learning maturity depends on your task complexity. Our experience shows that it can be extremely powerful for text annotation tasks such as classification and named entity recognition (NER).
We commonly work with Snorkel, a framework created at Stanford.
With Snorkel, after defining your own pre-annotation functions, you can upload your predictions to Kili.
For examples, refer to the Weak supervision with Snorkel tutorial.
To learn more about weak supervision, refer to http://ai.stanford.edu/blog/weak-supervision.
There are many reasons why you may need to review or re-annotate human-labeled data, for example:
- Reviewing or re-annotating an annotated dataset sourced externally
- Performing a quality check on pre-annotated datasets
- Labeling human-generated logs from a chatbot framework
In such cases, the import process does not change: you can still upload your predictions, assets, and existing labels into Kili.
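In practice, human labels exported from another tool often arrive in a tabular format and must be converted to `jsonResponse` payloads before import. This sketch converts a simple CSV export; the column names, the `"JOB_0"` job, and the categories are assumptions for illustration:

```python
# Sketch: converting human labels exported from another tool (here a simple
# CSV) into Kili-style jsonResponse payloads before import. Column names,
# the "JOB_0" job, and the categories are assumptions for illustration.
import csv
import io

exported = """external_id,category
doc-1,CONTRACT
doc-2,INVOICE
"""

payloads = []
for row in csv.DictReader(io.StringIO(exported)):
    payloads.append({
        "externalId": row["external_id"],
        "jsonResponse": {"JOB_0": {"categories": [{"name": row["category"]}]}},
    })

# `payloads` can then be passed to your import call, exactly as you would
# upload model predictions.
print(payloads[0]["externalId"])  # doc-1
```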