Annotation methodology
Cogite applies a structured annotation methodology built on four pillars: precise guidelines, double annotation, statistical quality control, and continuous team calibration. This page documents our standards for clients who want an in-depth understanding of how we work.
1. Annotation guidelines
Every project starts with a guidelines document drafted jointly with your ML team. This reference document covers:
- The taxonomy of classes/labels to use
- Precise definition of each class with positive and negative examples
- Documented edge cases (boundary cases, ambiguities)
- Format conventions (JSON, COCO, Pascal VOC, JSONL)
- Quantitative quality criteria (minimum IoU, target F1-score, etc.)
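To illustrate how these elements fit together, a guidelines document can be mirrored in a small machine-readable spec that tooling checks labels against. This is a hypothetical sketch: the `Guidelines` and `ClassDefinition` classes, the example classes, the edge case and the thresholds are illustrative values, not an actual project configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDefinition:
    """One entry of the label taxonomy, with positive and negative examples."""
    name: str
    definition: str
    positive_examples: list[str] = field(default_factory=list)
    negative_examples: list[str] = field(default_factory=list)


@dataclass
class Guidelines:
    """Hypothetical machine-readable mirror of a guidelines document."""
    classes: dict[str, ClassDefinition]
    edge_cases: list[str]
    export_format: str  # e.g. "COCO", "Pascal VOC", "JSONL"
    min_iou: float      # minimum IoU for a box to be accepted
    target_f1: float    # target F1-score on classifications

    def check_label(self, label: str) -> None:
        """Raise if a label is not part of the agreed taxonomy."""
        if label not in self.classes:
            raise ValueError(f"unknown label {label!r}; allowed: {sorted(self.classes)}")


# Illustrative values only.
guidelines = Guidelines(
    classes={
        "car": ClassDefinition("car", "Passenger vehicle",
                               positive_examples=["sedan", "hatchback"],
                               negative_examples=["bus", "truck"]),
        "pedestrian": ClassDefinition("pedestrian", "Person on foot"),
    },
    edge_cases=["Car partially occluded by a tree: annotate the visible part only"],
    export_format="COCO",
    min_iou=0.7,
    target_f1=0.9,
)
guidelines.check_label("car")  # passes silently; "bus" would raise ValueError
```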
2. Systematic double annotation
On all our Production-plan projects, every sample is annotated independently by two annotators. Divergent annotations are arbitrated by a third, senior annotator. This redundancy is essential to final quality and makes it possible to measure inter-annotator agreement.
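The double-annotation flow can be sketched as follows for classification labels. This is a minimal sketch assuming one label per sample and exact-match agreement; the function name `merge_double_annotation` and the `arbitrate` callback (standing in for the senior annotator) are hypothetical. For bounding boxes, agreement would instead be tested against an IoU threshold.

```python
from typing import Callable, Hashable, Sequence


def merge_double_annotation(
    labels_a: Sequence[Hashable],
    labels_b: Sequence[Hashable],
    arbitrate: Callable[[int, Hashable, Hashable], Hashable],
) -> tuple[list[Hashable], list[int]]:
    """Combine two independent annotation passes over the same samples.

    Samples on which both annotators agree keep the shared label; every
    divergent sample is sent to `arbitrate` (the senior annotator).
    Returns the final labels and the indices that needed arbitration.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same samples")
    final: list[Hashable] = []
    arbitrated: list[int] = []
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a == b:
            final.append(a)
        else:
            final.append(arbitrate(i, a, b))
            arbitrated.append(i)
    return final, arbitrated


# Usage with a stand-in arbiter that always picks the second annotator's label.
final, disputed = merge_double_annotation(
    ["cat", "dog", "cat"], ["cat", "cat", "cat"], lambda i, a, b: b
)
# final == ["cat", "cat", "cat"], disputed == [1]
```

The share of disputed samples is the raw disagreement rate; Cohen's kappa refines it by discounting the agreement two annotators would reach by chance.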
3. Quality metrics
We systematically measure:
- Inter-Annotator Agreement (IAA) via Cohen's kappa coefficient (target: κ > 0.85)
- Average IoU on bounding boxes and segmentation masks
- Precision/recall on classifications
- Average time per annotation (efficiency)
- Error rate found during QA review
These metrics are communicated in a weekly report sent to your ML team.
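The first three metrics follow standard definitions. Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_e sums, over all labels, the product of each annotator's frequency for that label. The plain-Python sketch below assumes boxes given as (xmin, ymin, xmax, ymax) in pixels and computes precision/recall for one class by comparing an annotator's labels with the arbitrated reference; the function names are hypothetical, and in practice a library routine such as scikit-learn's `cohen_kappa_score` would do the same job for kappa.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two annotators labelling the same items."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty label sequences of equal length")
    p_o = sum(x == y for x, y in zip(a, b)) / n     # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    if p_e == 1.0:
        return float("nan")  # undefined: both used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def box_iou(box1: Sequence[float], box2: Sequence[float]) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred: Sequence[Hashable], gold: Sequence[Hashable],
                     positive: Hashable) -> tuple[float, float]:
    """Precision and recall of `pred` against `gold` for one class."""
    pairs = list(zip(pred, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# p_o = 0.75, p_e = 0.5, so kappa = 0.5
print(cohens_kappa(["cat", "cat", "dog", "dog"], ["cat", "dog", "dog", "dog"]))
# overlap 8 px^2, union 24 px^2, so IoU = 1/3
print(box_iou((0, 0, 4, 4), (2, 0, 6, 4)))
# 2 true positives, 1 false positive, 0 false negatives: (0.667, 1.0)
print(precision_recall(["cat", "cat", "dog", "cat"], ["cat", "dog", "dog", "cat"], "cat"))
```

With a target of kappa > 0.85, the toy pair above (kappa = 0.5) would be flagged for review even though the two annotators agree on 75% of items, which is exactly why chance correction matters.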
4. Continuous calibration
Every week, our annotators take part in calibration sessions where they discuss the difficult cases they have encountered. These sessions are led by the AI project manager and help refine the team's shared understanding of the guidelines. They are also an opportunity to flag any ambiguity that requires clarification from the client.
Tools used
We work with the leading annotation tools on the market:
- Label Studio (open source) — our default tool
- CVAT — for video and 3D segmentation
- Labelbox, V7, Encord — on client request
- Client proprietary tools — we adapt to your platform
Delivery formats
We deliver in your preferred format: JSON, COCO, Pascal VOC, JSONL, YOLO, MS COCO Keypoints, CSV, or any proprietary format specified in the brief. Each delivery includes a dataset card documenting the dataset's composition, its quality statistics, and the conventions used.
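For bounding boxes, these formats differ mainly in how coordinates are encoded: COCO stores [x_min, y_min, width, height] in pixels, Pascal VOC stores corner coordinates (xmin, ymin, xmax, ymax) in pixels, and YOLO stores one "class_id x_center y_center width height" line per object, normalized by image size. The sketch below converts between them using VOC-style corners as the pivot. It is illustrative only: the function names are hypothetical, and it ignores the 1-based pixel indexing of the original Pascal VOC devkit as well as integer rounding.

```python
Box = tuple[float, float, float, float]


def coco_to_voc(box: Box) -> Box:
    """COCO (x_min, y_min, w, h) -> VOC-style (xmin, ymin, xmax, ymax)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def voc_to_coco(box: Box) -> Box:
    """VOC-style (xmin, ymin, xmax, ymax) -> COCO (x_min, y_min, w, h)."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def voc_to_yolo(box: Box, img_w: int, img_h: int) -> Box:
    """Pixel corners -> YOLO (cx, cy, w, h), normalized to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w, (ymax - ymin) / img_h)


def yolo_to_voc(box: Box, img_w: int, img_h: int) -> Box:
    """YOLO normalized (cx, cy, w, h) -> pixel corners."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)


def yolo_line(class_id: int, box: Box) -> str:
    """One line of a YOLO label file."""
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in box)


# A 200x100 box at (100, 50) in a 640x480 image.
voc = coco_to_voc((100, 50, 200, 100))  # (100, 50, 300, 150)
yolo = voc_to_yolo(voc, 640, 480)
print(yolo_line(0, yolo))               # 0 0.312500 0.208333 0.312500 0.208333
```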